MathGroup Archive: January 2005 [00109]

[Date Index] [Thread Index] [Author Index]
Re: Re: Re: Re: Re: Slowdown
To: mathgroup at smc.vnet.net
Subject: [mg53328] Re: [mg53309] Re: [mg53278] Re: [mg53254] Re: [mg53241] Re: Slowdown
From: János <janos.lobb at yale.edu>
Date: Thu, 6 Jan 2005 02:52:03 -0500 (EST)
References: <cr33tt$je6$1@smc.vnet.net> <200501020912.EAA27775@smc.vnet.net> <200501030929.EAA10649@smc.vnet.net> <C570DAF1-5D86-11D9-8B6F-000A95B4967A@mimuw.edu.pl> <200501040813.DAA27157@smc.vnet.net> <opsj294nmciz9bcq@monster.ma.dl.cox.net> <200501050621.BAA11685@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com
 From the "mouth of the horse" - NKS page 622:

<snip>
But to make this work, does one need a complex hashing procedure that  
is carefully tuned to the particular kind of data one is dealing with?  
It turns out that one does not. And in fact, all that is really  
necessary is that the hashing procedure generate enough randomness that  
even though there may be regularities in the original data, the hash  
codes that are produced still end up being distributed roughly  
uniformly across all possibilities.
</snip>

Looks like there is no "rough uniformity" here. It is more like a form  
of a Step function.

Here is a slightly modified version of the program from the post below  
for three slightly different runs:

In[47]:=
MapIndexed[{First[#2],
  Hash[#1]} & ,
  ToExpression /@
  (StringJoin["aaaa", "$",
  ToString[#1]] & ) /@
  Sort[Table[Random[
  Integer, {8, 9}],
  {i, 1, 20}]]]
Out[47]=
{{1, 1584071779},
  {2, 1584071779},
  {3, 1584071779},
  {4, 1584071779},
  {5, 1584071779},
  {6, 1584071779},
  {7, 1584071779},
  {8, 1584071779},
  {9, 1584071779},
  {10, 1584071779},
  {11, 1584071779},
  {12, 1584071779},
  {13, 1584071779},
  {14, 1584071779},
  {15, 1584071779},
  {16, 1584071780},
  {17, 1584071780},
  {18, 1584071780},
  {19, 1584071780},
  {20, 1584071780}}

In[52]:=
MapIndexed[{First[#2],
  Hash[#1]} & ,
  ToExpression /@
  (StringJoin["aaaa", "$",
  ToString[#1]] & ) /@
  Sort[Table[Random[
  Integer, {9, 10}],
  {i, 1, 20}]]]
Out[52]=
{{1, 1584071780},
  {2, 1584071780},
  {3, 1584071780},
  {4, 1584071780},
  {5, 1584071780},
  {6, 1584071780},
  {7, 1584071780},
  {8, 1584071780},
  {9, 1584071780},
  {10, 1584071780},
  {11, 1584071780},
  {12, 1584071780},
  {13, 1149946229},
  {14, 1149946229},
  {15, 1149946229},
  {16, 1149946229},
  {17, 1149946229},
  {18, 1149946229},
  {19, 1149946229},
  {20, 1149946229}}


In[57]:=
MapIndexed[{First[#2],
  Hash[#1]} & ,
  ToExpression /@
  (StringJoin["aaaa", "$",
  ToString[#1]] & ) /@
  Sort[Table[Random[
  Integer, {10, 11}],
  {i, 1, 20}]]]
Out[57]=
{{1, 1149946229},
  {2, 1149946229},
  {3, 1149946229},
  {4, 1149946229},
  {5, 1149946229},
  {6, 1149946229},
  {7, 1149946229},
  {8, 1149946230},
  {9, 1149946230},
  {10, 1149946230},
  {11, 1149946230},
  {12, 1149946230},
  {13, 1149946230},
  {14, 1149946230},
  {15, 1149946230},
  {16, 1149946230},
  {17, 1149946230},
  {18, 1149946230},
  {19, 1149946230},
  {20, 1149946230}}


Please note the "dive" for the second run.  The problem is probably in  
the hashing algorithm. The Book says: "String patterns are implemented  
using a symbolic extension of the PCRE regular expression library. "

Someone with a fast machine and time at hand might want to do an  
exhausting search for { {11,12},{12,13},....} to see if similar "dive"  
is there or not. There is none up to {8,9} for positive pairs.

Interestingly there is no Help on Hash.

János
On Jan 5, 2005, at 1:21 AM, Andrzej Kozlowski wrote:



> OK., I should have written "dive". But I would expect that the people   
> who wrote to point out this important difference should also explain   
> why it is so important. As for me I only wanted to note that the  
> rather  dramatic change in hash values that corresponds with a  
> dramatic  slowdown. This suggests that indeed the problem is related  
> to the  hashing of names of the form name$number.
> On the other hand if you look at any sequence of hash values of names   
> constructed by starting with some name and appending $number, where   
> number changes through successive values, you will find that you get   
> sequences of successive positive integers with periodic "dives" or   
> "jumps". However, in no other case I have noticed any slowdown. If the  
>  slowdown is caused by a "collision" of hash values, as seems likely,   
> there should be other cases too, but they would be extremely rare and  
> I  do not know of any way to find them except by relying on chance  
> (very  unlikely to produce any result).
>
> Andrzej
>
> On 5 Jan 2005, at 02:42, DrBob wrote:
>
>
>>>> You can see a big jump in the hash value at weirdness$10.
>>>>
>> Actually, the hash value gets much SMALLER at that point.
>>
>> Bobby
>>
>
>> On Tue, 4 Jan 2005 03:13:00 -0500 (EST), Andrzej Kozlowski   
>> <akoz at mimuw.edu.pl> wrote:
>>
>>
>>> I just noticed another thing, which seems to throw more light at what
>>> is happenng. Look carefully at the Hash values of the different names
>>> of the form weirdness$number:
>>>
>>>
>>> MapIndexed[{First[#2], Hash[#1]} & ,
>>>  ToExpression /@ (StringJoin["weirdness", "$",
>>>  ToString[#1]] & ) /@ Range[20]]
>>>
>>>
>>> {{1, 1783099877}, {2, 1783099878}, {3, 1783099879},
>>>  {4, 1783099880}, {5, 1783099881}, {6, 1783099882},
>>>  {7, 1783099883}, {8, 1783099884}, {9, 1783099885},
>>>  {10, 667842086}, {11, 667842087}, {12, 667842088},
>>>  {13, 667842089}, {14, 667842090}, {15, 667842091},
>>>  {16, 667842092}, {17, 667842093}, {18, 667842094},
>>>  {19, 667842095}, {20, 667843323}}
>>>
>>> You can see a big jump in the hash value at weirdness$10. That is
>>> exactly the point at which the slowdown occurs.
>>>
>>> Andrzej Kozlowski
>>>
>>>
>>> On 3 Jan 2005, at 21:55, Andrzej Kozlowski wrote:
>>>
>>>
>>>> I can't see any siginificant slowdowns demonstrated by your code.  
>>>> The
>>>> differences in timings between different combinations seem
>>>> insignificant. Cna you suggest any other name but "weirdness" that
>>>> shows significant slowdown?
>>>>
>>>> Besides, using Module only obsures what is really happening. Module
>>>> simply renames local variables by appending $somenumber to their  
>>>> name
>>>> but the "number" that is used is different each time. You can do  
>>>> that
>>>> whiteout Module and see exactly which names produce the slowdown
>>>> effect. It seems that the names weirdness$1 to weirdness$9 do not.
>>>>
>>>> In[1]:=
>>>> L=weirdness$1[];
>>>>  Do[L=weirdness$1[L,i],{i,10^4}]//Timing
>>>>
>>>> Out[2]=
>>>> {0.03 Second,Null}
>>>>
>>>> In[3]:=
>>>> L=weirdness$9[];
>>>>  Do[L=weirdness$9[L,i],{i,10^4}]//Timing
>>>>
>>>> Out[4]=
>>>> {0.08 Second,Null}
>>>>
>>>> The problem seems to begin with weirdness$10
>>>>
>>>> In[5]:=
>>>> L=weirdness$10[];
>>>>  Do[L=weirdness$10[L,i],{i,10^4}]//Timing
>>>>
>>>> Out[6]=
>>>> {13.2 Second,Null}
>>>>
>>>> and continues for higher values, as far as I have checked.
>>>>
>>>> When you run Module with a fresh Kernel Mathematica seems to always
>>>> begin by appending $17:
>>>>
>>>> In[1]:=
>>>> Module[{weirdness, L},
>>>>  L = weirdness[];
>>>>  Do[L = weirdness[L, i], {i, 1}]
>>>> ] // Trace
>>>>
>>>> Out[1]=
>>>> {Module[{weirdness,L},L=weirdness[];  
>>>> Do[L=weirdness[L,i],{i,1}]],{L$17=\
>>>> weirdness$17[];Do[L$17=weirdness$17[L$17,
>>>>
>>>> i],{i,1}],{L$17=weirdness$17[],weirdness$17[]},{Do[L$17=weirdness 
>>>> $17[ L$
>>>> \
>>>> 17,i],{i,1}],{{{L$17,weirdness$17[]},{i,1},weirdness$17[weirdness 
>>>> $17[ ],
>>>>
>>>> 1]},L$17=weirdness$17[weirdness$17[],1],weirdness$17[weirdness$17 
>>>> [],1 ]\
>>>> },Null},Null},Null}
>>>>
>>>> on subsequent runs this is increased:
>>>>
>>>> In[2]:=
>>>> Module[{weirdness, L},
>>>>  L = weirdness[];
>>>>  Do[L = weirdness[L, i], {i, 1}]
>>>> ] // Trace
>>>>
>>>> Out[2]=
>>>> {Module[{weirdness,L},L=weirdness[];  
>>>> Do[L=weirdness[L,i],{i,1}]],{L$19=\
>>>> weirdness$19[];Do[L$19=weirdness$19[L$19,
>>>>
>>>> i],{i,1}],{L$19=weirdness$19[],weirdness$19[]},{Do[L$19=weirdness 
>>>> $19[ L$
>>>> \
>>>> 19,i],{i,1}],{{{L$19,weirdness$19[]},{i,1},weirdness$19[weirdness 
>>>> $19[ ],
>>>>
>>>> 1]},L$19=weirdness$19[weirdness$19[],1],weirdness$19[weirdness$19 
>>>> [],1 ]\
>>>> },Null},Null},Null}
>>>>
>>>> in any case all these numbers are larger than 10 and produce   
>>>> slowdown,
>>>> but the numbers 10-16 which never occur inside Module also suffer   
>>>> form
>>>> this problem:
>>>>
>>>> In[3]:=
>>>> L=weirdness$11[];
>>>>  Do[L=weirdness$11[L,i],{i,10^4}]//Timing
>>>>
>>>> Out[4]=
>>>> {13.1 Second,Null}
>>>>
>>>> In[5]:=
>>>> L=weirdness$16[];
>>>>  Do[L=weirdness$16[L,i],{i,10^4}]//Timing
>>>>
>>>> Out[6]=
>>>> {14.62 Second,Null}
>>>>
>>>> However, I have not been able to find any other name but weirdness
>>>> that produces this effect. This seems weird.
>>>>
>>>> Andrzej Kozlowski
>>>>
>>>>
>>>>
>>>> On 3 Jan 2005, at 18:29, yehuda ben-shimol wrote:
>>>>
>>>>
>>>>> Hi Roland,
>>>>> I'm afraid this is not true. Just try to run the code I sent with  
>>>>> my
>>>>> post. "weirdness" was time consuming while weirdness1 was not. In
>>>>> addition it happened (not consistently) that function name of a   
>>>>> SINGLE
>>>>> character suffered from this behavior as well.
>>>>> the code is given below for your convenience.
>>>>>
>>>>> t = CharacterRange["a", "z"];
>>>>> Do[fname = StringJoin[t[[Table[Random[Integer, {1,26}], {i}]]]];
>>>>> Print[i, "\t", fname, "\t",
>>>>>  Timing[
>>>>>  Module[{fs = ToExpression[fname], L},
>>>>>  L = fs[];
>>>>>  Do[L = fs[L, j], {j, 104}]]]], {i, 1, 50}, {5}]
>>>>>
>>>>>
>>>>> yehuda
>>>>>
>>>>> Roland Franzius wrote:
>>>>>
>>>>>
>>>>>> Maxim wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Consider:
>>>>>>>
>>>>>>> In[1]:=
>>>>>>> Module[{f, L},
>>>>>>>  L = f[];
>>>>>>>  Do[L = f[L, i], {i, 10^4}]
>>>>>>> ] // Timing
>>>>>>>
>>>>>>> Module[{weirdness, L},
>>>>>>>  L = weirdness[];
>>>>>>>  Do[L = weirdness[L, i], {i, 10^4}]
>>>>>>> ] // Timing
>>>>>>>
>>>>>>> Out[1]=
>>>>>>> {0.015*Second, Null}
>>>>>>>
>>>>>>> Out[2]=
>>>>>>> {3.063*Second, Null}
>>>>>>>
>>>>>>> Here the timings differ by a factor of 200. Besides, the timing
>>>>>>> grows
>>>>>>> linearly in the first case and quadratically in the second
>>>>>>> (therefore, for
>>>>>>> n=10^5 there will be an approximately 2000 times slowdown). We  
>>>>>>> can
>>>>>>> only
>>>>>>> guess that something goes wrong with the symbol name hashing.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> The timing difference occurs when the symbol "wierdness" exceeds 8
>>>>>> characters. Test it for "wierdnes". That seems to be a  
>>>>>> consequence  of
>>>>>> the machine routine for string comparison. Up to 8 characters can  
>>>>>>  be
>>>>>> used without using a memory to memory compare. Of course it  
>>>>>> should  be
>>>>>> possible to write a compare routine that makes not such a bit  
>>>>>> step.
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>> -- DrBob at bigfoot.com
>> www.eclecticdreams.net
>>
>>
-------------------------------------------------
clear perl code is better than unclear awk code; but NOTHING comes  
close to unclear perl code
http://www.faqs.org/faqs/computer-lang/awk/faq/
References:
- Re: Slowdown
  - From: Roland Franzius <roland.franzius@uos.de>
- Re: Re: Slowdown
  - From: yehuda ben-shimol <benshimo@bgu.ac.il>
- Re: Re: Re: Slowdown
  - From: Andrzej Kozlowski <akoz@mimuw.edu.pl>
- Re: Re: Re: Re: Slowdown
  - From: Andrzej Kozlowski <akoz@mimuw.edu.pl>
Prev by Date: Re: Re: Re: Re: Re: Slowdown
Next by Date: FullSimplify with Assumptions
Previous by thread: Re: Re: Re: Re: Re: Slowdown
Next by thread: Re: Re: Re: Re: Slowdown