find and count partially identical sublist

*To*: mathgroup at smc.vnet.net
*Subject*: [mg85740] find and count partially identical sublist
*From*: markus.roellig at googlemail.com
*Date*: Wed, 20 Feb 2008 07:04:35 -0500 (EST)

Hello group,

I am trying to find and count sublists that are partially identical to each other and then modify parts of these sublists according to their multiplicity. It's easier to understand if I give an example. Say I have an array (strings and numbers mixed) like:

    {{"B", "A", 0, 1}, {"A", "B", 6, 1}, {"B", "A", 4, 1}, {"B", "A", 4, 1},
     {"A", "B", 1, 1}, {"B", "A", 5, 1}, {"B", "A", 2, 1}, {"A", "B", 10, 1}}

I need to find successive sublists which have the same first two elements (here positions {3,4} and {6,7}). I then want to divide the 4th element of each such sublist by the number of repetitions. In the example the result would be:

    {{"B", "A", 0, 1}, {"A", "B", 6, 1}, {"B", "A", 4, 1/2}, {"B", "A", 4, 1/2},
     {"A", "B", 1, 1}, {"B", "A", 5, 1/2}, {"B", "A", 2, 1/2}, {"A", "B", 10, 1}}

The code I came up with is:

    tst = Table[{RandomChoice[{"A", "B"}], RandomChoice[{"A", "B"}], RandomInteger[{0, 10}], 1}, {i, 1, 30}];
    tstSplt = Split[tst, #1[[1 ;; 2]] === #2[[1 ;; 2]] &] // MatrixForm
    tab = Table[tstSplt[[1, i]] // Length, {i, 1, Length[tstSplt[[1]]]}]
    rpl = MapThread[#1[[All, 4]]/#2 &, {tstSplt[[1, All]], tab}] // Flatten
    tst[[All, 4]] = tst[[#, 4]] & @@@ rpl;
    tst

This works, but I am somewhat concerned about run speed (my actual array is much larger, roughly 50000x20), and I have the feeling that I am wasting too much memory.

One additional comment: the above code only finds successive duplicates. How would I have to modify it to find all occurrences?

Best regards
Markus Roellig

I. Physikalisches Institut der Universität zu Köln
Zülpicher Strasse 77
D-50937 Köln
Tel.: +49-221-470-3547
Fax : +49-221-470-5162
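For illustration, both variants asked about above (successive duplicates and all occurrences) could be sketched roughly as follows. This is only a minimal sketch, not a tuned solution: the names `data`, `divideRuns` and `divideAll` are hypothetical, it assumes a non-empty array in which every row has at least four elements, and it has not been timed on a 50000x20 array.

```mathematica
(* Example array from the post; the name "data" is hypothetical *)
data = {{"B", "A", 0, 1}, {"A", "B", 6, 1}, {"B", "A", 4, 1}, {"B", "A", 4, 1},
   {"A", "B", 1, 1}, {"B", "A", 5, 1}, {"B", "A", 2, 1}, {"A", "B", 10, 1}};

(* Successive duplicates: split the first-two-element keys into runs,
   expand each run length to one count per row, divide column 4 by it *)
divideRuns[rows_List] := Module[{runs, counts, out = rows},
  runs = Split[rows[[All, {1, 2}]]];
  counts = Flatten[ConstantArray[Length[#], Length[#]] & /@ runs];
  out[[All, 4]] = out[[All, 4]]/counts;
  out]

(* All occurrences: count every key once with Tally, then look up
   the count for each row's key at level 1 *)
divideAll[rows_List] := Module[{keys, counts, out = rows},
  keys = rows[[All, {1, 2}]];
  counts = Replace[keys, Rule @@@ Tally[keys], {1}];
  out[[All, 4]] = out[[All, 4]]/counts;
  out]

divideRuns[data]
divideAll[data]
```

On the example array, `divideRuns[data]` should reproduce the result listed above. Only the two key columns are split or tallied, so the full rows are not copied into nested lists. If there are many distinct keys, wrapping the rule list in `Dispatch` may speed up the lookup in `divideAll`.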