Re: Find Position of many elements in a large list.
- To: mathgroup at smc.vnet.net
- Subject: [mg127686] Re: Find Position of many elements in a large list.
- From: Roland Franzius <roland.franzius at uos.de>
- Date: Wed, 15 Aug 2012 03:32:45 -0400 (EDT)
- References: <k0d4bu$m8f$1@smc.vnet.net>
On 14.08.2012 11:05, benp84 at gmail.com wrote:
> I have a sorted, 1-dimensional list X of 1,000,000 integers, and a sorted, 1-dimensional list Y of 10,000 integers. Most, but not all, of the elements of Y are also elements of X. I'd like to know the positions of the elements in X that are also in Y. What's the fastest way to compute this?
>
> I have an algorithm in mind but it requires lots of custom code and I'm wondering if there's a clever way to do it with built-in functions. Thanks.
>
You don't need any "code" at all.
Consider this example, executed on a quad-core i7 notebook:
data = RandomInteger[10000000, 500000];
pattern = RandomInteger[10000000, 10000];
patterninuse = Intersection[data, pattern];
Timing[pointers = Position[data, #] & /@ patterninuse;]
{17.471999999999998`, Null}
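The mapping above scans data once for every element of patterninuse. As a hedged alternative sketch, not part of the original timing run, all positions can be collected in a single pass with PositionIndex and then looked up. This assumes Mathematica 10 or later, which is newer than this post; the small lists below are made up for illustration only.

```mathematica
(* Illustrative small lists; in the example above these would be data and pattern *)
data = {5, 3, 8, 3, 9, 1};
pattern = {3, 9, 7};
patterninuse = Intersection[data, pattern];   (* {3, 9} *)

(* One pass over data: value -> list of positions (requires version 10+) *)
idx = PositionIndex[data];

(* Look up every pattern element; {} is the default for missing keys *)
pointers = Lookup[idx, patterninuse, {}]
(* {{2, 4}, {5}} *)
```

Note that the result holds flat position lists such as {2, 4}, whereas Position returns {{2}, {4}}.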
Now we parallelize by splitting the target list in two.
Four kernels get launched.
{rg1, rg2} = Partition[data, Length[data]/2];
t1 = AbsoluteTime[];
Parallelize[
pt1 = Position[rg1, #] & /@ patterninuse;
pt2 = Position[rg2, #] & /@ patterninuse;
(* shift second-half positions by Length[rg1], then merge per pattern element *)
pt = MapThread[Join, {pt1, pt2 + Length[rg1]}];
]
AbsoluteTime[] - t1
4.7736084`8.130391782549879
Timing[] does not work with Parallelize[] (it measures only the CPU time
of the main kernel), which is why AbsoluteTime[] is used here. Splitting a
list of 1000000 into four partitions (with a small typo, r3 instead of rg3)
sent the machine into nowhere: memory reached the maximum of 8 GB, and
after some minutes even the mouse froze and the screen went black. There
may be a problem with parallel evaluation involving the graphics processor.
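For comparison, the work can also be distributed over the pattern elements instead of over halves of the data, which avoids the offset bookkeeping. The following is only a minimal sketch on made-up small lists; whether it is faster on the large example depends on how long it takes to send data to the subkernels, which was not measured here.

```mathematica
(* Illustrative small lists; in the example above these would be data and pattern *)
data = {5, 3, 8, 3, 9, 1};
patterninuse = Intersection[data, {3, 9, 7}];   (* {3, 9} *)

(* Make data known on all subkernels *)
DistributeDefinitions[data];

(* Each subkernel handles a share of the pattern elements *)
pointers = ParallelMap[Position[data, #] &, patterninuse]
(* {{{2}, {4}}, {{5}}} *)
```

The result has the same shape as the sequential Position[data, #] & /@ patterninuse.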
--
Roland Franzius