MathGroup Archive 2002

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: RE: final results: creating adjacency matrices

"Moliterno, Thomas" wrote:
> First thanks to all, and in particular Bobby Treat, for your help with
> this question.
> The best solution was as follows:
> lst = ReadList["c:\\data.txt", {Number, Number}]
> adjacenceMatrix[
>    x:{{_, _}..}] := Module[{actors, events},
>       {actors, events} = Union /@ Transpose[x];
>     Array[If[MemberQ[x, {actors[[#1]], events[[#2]]}], 1, 0] & ,
>      {Length[actors], Length[events]}]]
> a = adjacenceMatrix[lst];
> b = a . Transpose[a];
> c = b (1 - IdentityMatrix[Length[b]])
> C is the desired symmetric matrix with off diagonal values of >=0,
> indicating the number of times two actors participate in the same event.
> The diagonal is set to 0.
> A few items in response to Bobby's message, below.  While c is, in fact,
> a huge matrix with lots of cells equal to zero, that is exactly how we
> need it structured for our analysis and research question (not relevant
> to the list, but I'd be happy to discuss off list).  Processing time is
> actually not too bad!!  I'm running a PIII 900 with 512 SDRAM, and the
> code ran a 177 x 3669 matrix in under 90 seconds.  MatrixForm [c]
> presented no problems in viewing in the front end, but then it's only
> 177 x 177.
> Thanks again to all,
> Tom
> **********************************************
> Thomas P. Moliterno
> Graduate School of Management
> University of California, Irvine
> tmoliter at
> **********************************************
> [...]

There are several ways to go about this and which is best will vary
based on relative number of events vs. number of actors. Below I show
three variations. The first is a minor recoding of the one above. The
second iterates over all pairs of actors. The third looks at all events
for common actors. I then show three examples. The first two methods
have the advantage that they do not require that events be positive
integers. With some extra work the third method could also get around
this restriction.

toAdjacency0[data:{{_, _}..}] := Module[
	{actors, events, mat1, mat2},
	{actors, events} = Union /@ Transpose[data];
    mat1 = Array[If[MemberQ[data, {actors[[#1]], events[[#2]]}], 1, 0] &
     {Length[actors], Length[events]}];
	mat2 = mat1 . Transpose[mat1];
	mat2 * (1-IdentityMatrix[Length[mat2]])

toAdjacency1[origdata_] := Module[
	{data=Union[origdata], mat},
	data = Map[Last, Split[data,#1[[1]]===#2[[1]]&], {2}];
	mat = Table [If [j>k, Length[Intersection[data[[j]],data[[k]]]], 0],
		{j,Length[data]}, {k,Length[data]}];

toAdjacency2[origdata_] := Module[
	{data=Sort[Map[Reverse,Union[origdata]]], mat, len, event},
	data = Map[Last, Split[data,#1[[1]]===#2[[1]]&], {2}];
	dim = Length[Union[Flatten[data]]];
	len = Length[data];
	mat = Table[0, {dim}, {dim}];
	Do [
		event = data[[j]];
		Do [
			Do [
				mat[[event[[m]],event[[k]]]] += 1;
				mat[[event[[k]],event[[m]]]] += 1,

data1 = Table[{Random[Integer,{1,1000}], Random[Integer,50]}, {10000}];
data2 = Table[{Random[Integer,{1,1000}], Random[Integer,100]}, {10000}];
data3 = Table[{Random[Integer,{1,1000}], Random[Integer,200]}, {10000}];

Timings are on a 1.5 GHz machine running the Mathematica 4.2 kernel
under Linux.

In[107]:= Timing[m0 = toAdjacency0[data1];]
Out[107]= {5.44 Second, Null}

In[108]:= Timing[m1 = toAdjacency1[data1];]
Out[108]= {10.5 Second, Null}

In[109]:= Timing[m2 = toAdjacency2[data1];]
Out[109]= {16.24 Second, Null}

In[110]:= m0 === m1 === m2
Out[110]= True

Note that for this example the result is not terrible sparse (less than

In[112]:= Count[Flatten[m0], 0]
Out[112]= 191374

In[115]:= Timing[m0 = toAdjacency0[data2];]
Out[115]= {11.51 Second, Null}

In[116]:= Timing[m1 = toAdjacency1[data2];]
Out[116]= {10.92 Second, Null}

In[117]:= Timing[m2 = toAdjacency2[data2];]
Out[117]= {9.07 Second, Null}

Curiously this was the first example I tried, and all three methods
perform about the same in this case. The result, not suprisingly, is
more sparse (40%) because we have the same number of actors and pairs as
previously, but now with more events to spread out over the pairs.

In[118]:= Count[Flatten[m0], 0]
Out[118]= 403232

When we get sparser still, the third method begins to dominate and the
first is relatively slower.

In[119]:= Timing[m0 = toAdjacency0[data3];]
Out[119]= {22.73 Second, Null}

In[120]:= Timing[m1 = toAdjacency1[data3];]
Out[120]= {10.88 Second, Null}

In[121]:= Timing[m2 = toAdjacency2[data3];]
Out[121]= {4.96 Second, Null}

Now sparsity is over 60%.

In[122]:= Count[Flatten[m0], 0]
Out[122]= 624350

The relative speed of this last method, in this instance, is derived
from the fact that individual event lists are on average half the size
of the previous case. Hence the main loop is expected to improve on
average by a factor of 2 (you get a factor of 4 for iterating over all
pairs in each event, but lose a factor of 2 because there are twice as
many event lists).

My guess is that a preprocessor that assesses number of actors vs.
number of events would be the best way to choose between the first and
third methods (which, inexplicably, are labelled as methods 0 and 2). It
is not clear to me whether the middle approach will ever dominate. I
have not given much thought to concocting examples where it would
because offhand I suspect they would be pathological as in dense and
with large intersections.

As a last remark I'll note that these might run significantly faster if
coded with Compile. Whether that is viable depends on the form of the
data. In the above example where everything is a machine integer that
approach would certainly work.

Daniel Lichtblau
Wolfram Research

  • Prev by Date: Re: Re: Accuracy and Precision
  • Next by Date: Re: Loss of precision when using Simplify
  • Previous by thread: RE: final results: creating adjacency matrices
  • Next by thread: Inequality solving