Re: find and count partially identical sublist
- To: mathgroup at smc.vnet.net
- Subject: [mg85758] Re: find and count partially identical sublist
- From: "Steve Luttrell" <steve at _removemefirst_luttrell.org.uk>
- Date: Thu, 21 Feb 2008 17:59:24 -0500 (EST)
- References: <fphd98$8mq$1@smc.vnet.net>
Define the list to be processed.

In[1]:= data={{"B","A",0,1},{"A","B",6,1},{"B","A",4,1},{"B","A",4,1},{"A","B",1,1},{"B","A",5,1},{"B","A",2,1},{"A","B",10,1}}

Out[1]= {{B,A,0,1},{A,B,6,1},{B,A,4,1},{B,A,4,1},{A,B,1,1},{B,A,5,1},{B,A,2,1},{A,B,10,1}}

Split the list into runs of successive sublists whose first two elements agree (the criterion stated in the question).

In[2]:= data2=Split[data,Take[#1,2]==Take[#2,2]&]

Out[2]= {{{B,A,0,1}},{{A,B,6,1}},{{B,A,4,1},{B,A,4,1}},{{A,B,1,1}},{{B,A,5,1},{B,A,2,1}},{{A,B,10,1}}}

Determine the length of each run.

In[3]:= length2=Map[Length,data2]

Out[3]= {1,1,2,1,2,1}

Divide the fourth element of each sublist by the length of the run it belongs to. The pattern {x1_,x2_,x3_,y_}?VectorQ matches a list of length 4 none of whose elements is itself a list, so it matches the individual sublists but never a run that happens to contain four of them.

In[4]:= data3=MapThread[#1/.{x1_,x2_,x3_,y_}?VectorQ->{x1,x2,x3,y/#2}&,{data2,length2}]

Out[4]= {{{B,A,0,1}},{{A,B,6,1}},{{B,A,4,1/2},{B,A,4,1/2}},{{A,B,1,1}},{{B,A,5,1/2},{B,A,2,1/2}},{{A,B,10,1}}}

Stephen Luttrell
West Malvern, UK

<markus.roellig at googlemail.com> wrote in message news:fphd98$8mq$1 at smc.vnet.net...
> Hello group,
>
> I am trying to find and count sublists that are partially identical to
> each other and then modify parts of these sublists using the
> multiplicity. It's easier to understand if I give an example.
>
> Say I have an array (strings and numbers mixed) like:
>
> {{"B", "A", 0, 1}, {"A", "B", 6, 1}, {"B", "A", 4, 1}, {"B", "A", 4,
> 1}, {"A", "B", 1, 1}, {"B", "A", 5, 1}, {"B", "A", 2, 1}, {"A", "B",
> 10, 1}}
>
> I need to find successive sublists which have the same first two
> elements (here {3,4} and {6,7}). Depending on
> how many repetitions occur I want to divide the 4th element of each
> sublist by the number of repetitions. In the example the result would
> be:
>
> {{"B", "A", 0, 1}, {"A", "B", 6, 1}, {"B", "A", 4, 1/2}, {"B", "A", 4,
> 1/2}, {"A", "B", 1, 1}, {"B", "A", 5, 1/2}, {"B", "A", 2, 1/2},
> {"A", "B", 10, 1}}
>
> The code I came up with is:
>
> tst = Table[{RandomChoice[{"A", "B"}], RandomChoice[{"A", "B"}],
>    RandomInteger[{0, 10}], 1}, {i, 1, 30}];
> tstSplt = Split[tst, #1[[1 ;; 2]] === #2[[1 ;; 2]] &] // MatrixForm
> tab = Table[tstSplt[[1, i]] // Length, {i, 1, Length[tstSplt[[1]]]}]
> rpl = MapThread[#1[[All, 4]]/#2 &, {tstSplt[[1, All]], tab}] //
>   Flatten
> tst[[All, 4]] = tst[[#, 4]] & @@@ rpl;
> tst
>
> This works, but I am somewhat concerned with run speed (my actual
> array is much larger, roughly 50000x20). And I have the feeling that I
> am wasting too much memory.
>
> One additional comment: The above code only finds successive
> duplicates. How would I have to modify it to find all occurrences?
>
> Best regards
>
> Markus Roellig
>
> I.Physikalisches Institut der
> Universität zu Köln
> Zülpicher Strasse 77
> D-50937 Köln
> Tel.: +49-221-470-3547
> Fax : +49-221-470-5162
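
Two points in the quoted question go beyond In[4] above, so what follows is only a sketch, not part of the original reply. It reuses the names data, data2, length2 and data3 from the reply; keys, mult and result are hypothetical names introduced here. First, data3 is still grouped by run, and flattening it by one level should restore the flat layout the question asks for. Second, for the closing question about counting all occurrences rather than only successive ones, one possible approach (an assumption, other approaches exist) is to count the first two columns with Tally and divide the fourth column by those counts in a single Part assignment.

```mathematica
(* Example list from the reply, repeated so that this block is self-contained *)
data = {{"B","A",0,1},{"A","B",6,1},{"B","A",4,1},{"B","A",4,1},
        {"A","B",1,1},{"B","A",5,1},{"B","A",2,1},{"A","B",10,1}};

(* Steps In[2] to In[4] of the reply, unchanged *)
data2 = Split[data, Take[#1,2] == Take[#2,2] &];
length2 = Map[Length, data2];
data3 = MapThread[#1 /. {x1_,x2_,x3_,y_}?VectorQ -> {x1,x2,x3,y/#2} &,
                  {data2, length2}];

(* 1. Remove the run grouping again to get one flat list of rows *)
Flatten[data3, 1]
(* {{B,A,0,1},{A,B,6,1},{B,A,4,1/2},{B,A,4,1/2},
    {A,B,1,1},{B,A,5,1/2},{B,A,2,1/2},{A,B,10,1}} *)

(* 2. Sketch for ALL occurrences, not only successive ones *)
keys = data[[All, {1, 2}]];                       (* first two columns of every row *)
mult = Replace[keys, Rule @@@ Tally[keys], {1}];  (* each key -> its total count *)
result = data;
result[[All, 4]] = data[[All, 4]]/mult;           (* divide column 4 by the count *)
result
```

For the example list, {"B","A"} occurs 5 times and {"A","B"} 3 times, so the fourth column of result becomes 1/5 and 1/3 respectively. The second part touches only one column through a single Part assignment, so for wider rows only the column index 4 should need to change.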