Services & Resources / Wolfram Forums / MathGroup Archive
-----

MathGroup Archive 2010

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Extracting some elements, members of another list

  • To: mathgroup at smc.vnet.net
  • Subject: [mg112502] Re: Extracting some elements, members of another list
  • From: Leonid Shifrin <lshifr at gmail.com>
  • Date: Fri, 17 Sep 2010 06:43:51 -0400 (EDT)

Ray,

I have no good explanation for this. I can confirm most of your
observations, except
that in my case the timing stabilizes around 8 seconds, which is about 2x of
the first
(fresh-kernel) result. I suspect that this has to do with some shared memory
(references).
In particular, clearing history like so:

Unprotect[Out];
DownValues[Out] = {};
Protect[Out];

Seems to help a little.

The other thing which comes to mind is that this could be due to some
internals of Dispatch.
When we use Dispatch, some hash-table is created internally. When we abandon
a local
variable which stored the Dispatched rules, presumably the hash table has to
also be destroyed.
 Question is, when does this happen,  and in case if it is not destroyed
timely, would it slow
down the operation of another Dispatch in some way. I did not observe such
behavior before,
but here Dispatch was used on a pretty large list, so may be this effect is
only visible for
large sets of data. OTOH, my code using Dispatch only:

In[28]:= list21/.Dispatch[Thread[#[[All,1]]->#]&[list11]]//Short//Timing
Out[28]=
{3.312,{{2707844,data121541},{4171257,data402011},<<499996>>,{9613764,data217507},{6452511,data971822}}}

Does not seem to suffer much from this problem, so it may be due to
the entanglement of
Dispatch and Module removing its local variable <tag> from the symbol table.

Yet another reason is that the way I generate (model) data, I create 1000000
symbols in the
Global` context. This seems to complicate the symbol lookup. If, for the
sake of example,
I modify the memberPositions function as:

Clear[memberPositions];
memberPositions[x_List, y_List] :=
 Pick[Range[Length[x]],
  Replace[x, Dispatch[Thread[Rule[Intersection[x, y], "tag"]]], {1}],
  "tag"]

and also generate the data as strings rather than symbols:

list11 = Transpose[{RandomInteger[{1000000, 9999999}, 1000000],
    Table["data" <> ToString[i], {i, 1, 1000000}]}];

list21 = RandomSample[list11[[All, 1]], 500000];

Then the problem is much less pronounced (you must re-start the kernel to
see it).
I suspect that with so many symbols in the Global`, some kind of re-hashing
may
happen as a result of running memberPositions (since it has a local variable
<tag>,
 which temporarily enters the symbol table), and possibly these re-hashs can
accumulate
and lead to the overall slow-down.

Anyways, I subscribe to your question and hope that someone will
enlighten us.

Regards,
Leonid


On Thu, Sep 16, 2010 at 2:01 PM, Ray Koopman <koopman at sfu.ca> wrote:

> On Sep 15, 1:38 am, Leonid Shifrin <lsh... at gmail.com> wrote:
> > Hi,
> >
> > The following function:
> >
> > Clear[memberPositions];
> > memberPositions[x_List, y_List] :=
> >  Module[{tag},
> >   Pick[Range[Length[x]],
> >    Replace[x, Dispatch[Thread[Rule[Intersection[x, y], tag]]], {1}],
> >    tag]]
> >
> > Is a fast way to find positions in list <x> which are also in <y>, for
> > arbitrary lists <x> and <y>. Using it, it worked in about 5 seconds on my
> > machine (M7.0 32bit WinXP AMD Phenom II - 2800 Ghz). Here is how you do
> it:
> >
> > Model your lists:
> >
> > In[16]:= list1 = Transpose[
> > {RandomInteger[{1000000,9999999},1000000],
> >  Table[ToExpression["data"<>ToString[i]],{i,1,1000000}]}];
> >
> > In[17]:= list2 = RandomSample[list1[[All,1]],500000];
> >
> > Get the result:
> >
> > In[18]:= list1[[memberPositions[list1[[All,1]],list2]]]//Short//Timing
> > Out[18]= {5.203,{{9506759,data1},{3854419,data2},{1093656,data3},
> >           <<526944>>,{6991722,data999997},{9278309,data1000000}}}
>
> This is very quick the first time thru, but each successive call
> slows by an increasing amount. Here are 5 sequential sample times
> for the same two input lists: {4.84, 5.34, 6.39, 7.21, 8.95}.
> What's causing the slowdown, and how can it be fixed?
>
>



  • Prev by Date: Re: Inconsistent behaviour of Integrate
  • Next by Date: Can somebody integrate this function ?
  • Previous by thread: Re: Extracting some elements, members of another list
  • Next by thread: PolarPlot3D?