Re: RE: final results: creating adjacency matrices
- To: mathgroup at smc.vnet.net
- Subject: [mg36968] Re: [mg36934] RE: [mg36584] final results: creating adjacency matrices
- From: Daniel Lichtblau <danl at wolfram.com>
- Date: Thu, 3 Oct 2002 00:17:35 -0400 (EDT)
- References: <200210020732.DAA20749@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
"Moliterno, Thomas" wrote: > > First thanks to all, and in particular Bobby Treat, for your help with > this question. > > The best solution was as follows: > > lst = ReadList["c:\\data.txt", {Number, Number}] > adjacenceMatrix[ > x:{{_, _}..}] := Module[{actors, events}, > {actors, events} = Union /@ Transpose[x]; > Array[If[MemberQ[x, {actors[[#1]], events[[#2]]}], 1, 0] & , > {Length[actors], Length[events]}]] > > a = adjacenceMatrix[lst]; > b = a . Transpose[a]; > c = b (1 - IdentityMatrix[Length[b]]) > > C is the desired symmetric matrix with off diagonal values of >=0, > indicating the number of times two actors participate in the same event. > The diagonal is set to 0. > > A few items in response to Bobby's message, below. While c is, in fact, > a huge matrix with lots of cells equal to zero, that is exactly how we > need it structured for our analysis and research question (not relevant > to the list, but I'd be happy to discuss off list). Processing time is > actually not too bad!! I'm running a PIII 900 with 512 SDRAM, and the > code ran a 177 x 3669 matrix in under 90 seconds. MatrixForm [c] > presented no problems in viewing in the front end, but then it's only > 177 x 177. > > Thanks again to all, > > Tom > > ********************************************** > Thomas P. Moliterno > Graduate School of Management > University of California, Irvine > tmoliter at uci.edu > ********************************************** > [...] There are several ways to go about this and which is best will vary based on relative number of events vs. number of actors. Below I show three variations. The first is a minor recoding of the one above. The second iterates over all pairs of actors. The third looks at all events for common actors. I then show three examples. The first two methods have the advantage that they do not require that events be positive integers. With some extra work the third method could also get around this restriction. toAdjacency0[data:{{_, _}..}] := Module[ {actors, events, mat1, mat2}, {actors, events} = Union /@ Transpose[data]; mat1 = Array[If[MemberQ[data, {actors[[#1]], events[[#2]]}], 1, 0] & , {Length[actors], Length[events]}]; mat2 = mat1 . Transpose[mat1]; mat2 * (1-IdentityMatrix[Length[mat2]]) ] toAdjacency1[origdata_] := Module[ {data=Union[origdata], mat}, data = Map[Last, Split[data,#1[[1]]===#2[[1]]&], {2}]; mat = Table [If [j>k, Length[Intersection[data[[j]],data[[k]]]], 0], {j,Length[data]}, {k,Length[data]}]; mat+Transpose[mat] ] toAdjacency2[origdata_] := Module[ {data=Sort[Map[Reverse,Union[origdata]]], mat, len, event}, data = Map[Last, Split[data,#1[[1]]===#2[[1]]&], {2}]; dim = Length[Union[Flatten[data]]]; len = Length[data]; mat = Table[0, {dim}, {dim}]; Do [ event = data[[j]]; Do [ Do [ mat[[event[[m]],event[[k]]]] += 1; mat[[event[[k]],event[[m]]]] += 1, {m,k-1}], {k,Length[event]}], {j,len}]; mat ] data1 = Table[{Random[Integer,{1,1000}], Random[Integer,50]}, {10000}]; data2 = Table[{Random[Integer,{1,1000}], Random[Integer,100]}, {10000}]; data3 = Table[{Random[Integer,{1,1000}], Random[Integer,200]}, {10000}]; Timings are on a 1.5 GHz machine running the Mathematica 4.2 kernel under Linux. In[107]:= Timing[m0 = toAdjacency0[data1];] Out[107]= {5.44 Second, Null} In[108]:= Timing[m1 = toAdjacency1[data1];] Out[108]= {10.5 Second, Null} In[109]:= Timing[m2 = toAdjacency2[data1];] Out[109]= {16.24 Second, Null} In[110]:= m0 === m1 === m2 Out[110]= True Note that for this example the result is not terrible sparse (less than 20%). In[112]:= Count[Flatten[m0], 0] Out[112]= 191374 In[115]:= Timing[m0 = toAdjacency0[data2];] Out[115]= {11.51 Second, Null} In[116]:= Timing[m1 = toAdjacency1[data2];] Out[116]= {10.92 Second, Null} In[117]:= Timing[m2 = toAdjacency2[data2];] Out[117]= {9.07 Second, Null} Curiously this was the first example I tried, and all three methods perform about the same in this case. The result, not suprisingly, is more sparse (40%) because we have the same number of actors and pairs as previously, but now with more events to spread out over the pairs. In[118]:= Count[Flatten[m0], 0] Out[118]= 403232 When we get sparser still, the third method begins to dominate and the first is relatively slower. In[119]:= Timing[m0 = toAdjacency0[data3];] Out[119]= {22.73 Second, Null} In[120]:= Timing[m1 = toAdjacency1[data3];] Out[120]= {10.88 Second, Null} In[121]:= Timing[m2 = toAdjacency2[data3];] Out[121]= {4.96 Second, Null} Now sparsity is over 60%. In[122]:= Count[Flatten[m0], 0] Out[122]= 624350 The relative speed of this last method, in this instance, is derived from the fact that individual event lists are on average half the size of the previous case. Hence the main loop is expected to improve on average by a factor of 2 (you get a factor of 4 for iterating over all pairs in each event, but lose a factor of 2 because there are twice as many event lists). My guess is that a preprocessor that assesses number of actors vs. number of events would be the best way to choose between the first and third methods (which, inexplicably, are labelled as methods 0 and 2). It is not clear to me whether the middle approach will ever dominate. I have not given much thought to concocting examples where it would because offhand I suspect they would be pathological as in dense and with large intersections. As a last remark I'll note that these might run significantly faster if coded with Compile. Whether that is viable depends on the form of the data. In the above example where everything is a machine integer that approach would certainly work. Daniel Lichtblau Wolfram Research
- References:
- RE: final results: creating adjacency matrices
- From: "Moliterno, Thomas" <TMoliter@gsm.uci.edu>
- RE: final results: creating adjacency matrices