Re: find and count partially identical sublist
- To: mathgroup at smc.vnet.net
- Subject: [mg85786] Re: find and count partially identical sublist
- From: dh <dh at metrohm.ch>
- Date: Fri, 22 Feb 2008 05:03:05 -0500 (EST)
- References: <fphd98$8mq$1@smc.vnet.net>
Hi Markus,

try this:

dat = {{"B","A",0,1},{"A","B",6,1},{"B","A",4,1},{"B","A",4,1},
       {"A","B",1,1},{"B","A",5,1},{"B","A",2,1},{"A","B",10,1}};

Flatten[
  (# /. x:{x1_, x2_, x3_, x4_?NumericQ} -> {x1, x2, x3, x4/Length[#]}) & /@
    Split[dat, #1[[1]] == #2[[1]] && #1[[2]] == #2[[2]] &],
  1]

hope this helps,
Daniel

markus.roellig at googlemail.com wrote:
> Hello group,
>
> I am trying to find and count sublists that are partially identical to
> each other and then modify parts of this sublist with the
> multiplicity. It's easier to understand if I give an example.
>
> Say I have an array (strings and numbers mixed) like:
>
> {{"B", "A", 0, 1}, {"A", "B", 6, 1}, {"B", "A", 4, 1},
>  {"B", "A", 4, 1}, {"A", "B", 1, 1}, {"B", "A", 5, 1},
>  {"B", "A", 2, 1}, {"A", "B", 10, 1}}
>
> I need to find successive sublists which have the same first two
> elements (here {3,4} and {7,6}). Depending on
> how many repetitions occur I want to divide the 4th element of each
> sublist by the number of repetitions. In the example the result would
> be:
>
> {{"B", "A", 0, 1}, {"A", "B", 6, 1}, {"B", "A", 4, 1/2},
>  {"B", "A", 4, 1/2}, {"A", "B", 1, 1}, {"B", "A", 5, 1/2},
>  {"B", "A", 2, 1/2}, {"A", "B", 10, 1}}
>
> The code I came up with is:
>
>
> tst = Table[{RandomChoice[{"A", "B"}], RandomChoice[{"A", "B"}],
>     RandomInteger[{0, 10}], 1}, {i, 1, 30}];
> tstSplt = Split[tst, #1[[1 ;; 2]] === #2[[1 ;; 2]] &] // MatrixForm
> tab = Table[tstSplt[[1, i]] // Length, {i, 1, Length[tstSplt[[1]]]}]
> rpl = MapThread[#1[[All, 4]]/#2 &, {tstSplt[[1, All]], tab}] //
>   Flatten
> tst[[All, 4]] = tst[[#, 4]] & @@@ rpl;
> tst
>
>
> This works, but I am somewhat concerned with run speed (my actual
> array is much larger, roughly 50000x20). And I have the feeling that I
> am wasting too much memory.
>
>
> One additional comment: The above code only finds successive
> duplicates. How would I have to modify it to find all occurrences ?
>
> Best regards
>
>
> Markus Roellig
>
> I.Physikalisches Institut der
> Universität zu Köln
> Zülpicher Strasse 77
> D-50937 Köln
> Tel.: +49-221-470-3547
> Fax : +49-221-470-5162
>
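
For the last question in the quoted message (counting all occurrences, not only successive ones), one possible approach is to count each distinct pair of leading elements with Tally and divide by that count. This is only a sketch and not part of the reply above; the names counts, n and res are made up for illustration, and it assumes the key is always the first two elements and the 4th element is numeric.

```mathematica
(* Sketch: divide the 4th element by the number of rows that share the
   same first two elements anywhere in the list, not just in a run. *)
dat = {{"B","A",0,1},{"A","B",6,1},{"B","A",4,1},{"B","A",4,1},
       {"A","B",1,1},{"B","A",5,1},{"B","A",2,1},{"A","B",10,1}};

(* Tally gives {key, count} pairs; turn them into key -> count rules.
   Dispatch speeds up the lookup when there are many distinct keys. *)
counts = Dispatch[Rule @@@ Tally[dat[[All, {1, 2}]]]];

(* Replace each row's key by its count, giving one count per row *)
n = dat[[All, {1, 2}]] /. counts;

(* Work on a copy and divide the 4th column elementwise *)
res = dat;
res[[All, 4]] = dat[[All, 4]]/n;
res
```

With the example data, {"B","A"} occurs five times and {"A","B"} three times, so the 4th elements become 1/5 and 1/3 respectively. Since this works on whole columns rather than pattern-matching each row, it may also scale better to a 50000x20 array, though that has not been timed here.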