MathGroup Archive: August 2007 [00089]

Re: Unbearably slow plotting (v6)

To: mathgroup at smc.vnet.net
Subject: [mg79693] Re: Unbearably slow plotting (v6)
From: Joseph Gwinn <joegwinn at comcast.net>
Date: Thu, 2 Aug 2007 03:45:06 -0400 (EDT)
References: <f8pii2$1s1$1@smc.vnet.net>

In article <f8pii2$1s1$1 at smc.vnet.net>,
 Bill Rowe <readnewsciv at sbcglobal.net> wrote:

> On 7/30/07 at 6:39 AM, dave at Remove_Thisdbailey.co.uk (David Bailey)
> wrote:
> 
> >Bill Rowe wrote:
> 
> 
> >>With 1E5 and 1E6 points, this results in a plot that is
> >>indistinguishable from a filled rectangle. That seems to be of very
> >>little use. So, while I might be a bit impatient waiting for
> >>Mathematica on my machine to plot 1E6 points, I don't see why I
> >>would want to do that in the first place. What I want from ListPlot
> >>is something to give me an idea of trends in my data. Given real
> >>limits on display resolution and size, plotting 1E6 points
> >>typically will not provide a useful plot regardless of how fast it
> >>plots. So why do this?
> 
> >The main reason people do this, is that they have experimental data
> >which they want to visualise without having to filter it in some way
> >to remove redundant data points.
> 
> I certainly understand the need to visualize experimental data
> since that is one of the things I do most frequently with
> Mathematica. And I also understand the desire to do this as
> easily and efficiently as possible.

I have to chime in here.  Although I don't currently have this problem, 
I have in the past used dotplots with up to 250,000 points, which was at 
the time a huge dataset, even on a UNIX machine.  This was in the mid 
1990s.

Downsampling would have been fatal in my application.  What I was doing 
was diagnosing why Network Time Protocol was not properly synchronizing 
the clocks on an isolated pair of SunOS test machines.  A Y-harness 
driven by a DOS box delivered a 1-Hz serial stream of bytes 
simultaneously to both Sun boxes, each of which read the local clock 
upon receiving the byte.  The timestamps were recorded to a file, one 
file to a Sun box.  This was allowed to run overnight and over weekends.

The two files were then aligned and the timestamps differenced, each 
difference being plotted versus sample number.  
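
A minimal sketch of that differencing step, for anyone who wants to try 
the same kind of plot.  The two lists below are synthetic stand-ins for 
the two timestamp files (the names tA and tB, the jitter size, and the 
sample count are all assumptions for illustration, not the original 
data), and they are taken to be already aligned one-to-one:

```mathematica
(* Hypothetical stand-ins for the two timestamp files, one per machine. *)
(* Each list holds one clock reading per 1-Hz trigger byte.             *)
SeedRandom[1];
n = 1000;
tA = Range[n] + RandomReal[{-0.001, 0.001}, n];
tB = Range[n] + RandomReal[{-0.001, 0.001}, n];

(* Clock offset at each trigger: elementwise difference of the lists. *)
offsets = tA - tB;

(* Plot every difference against sample number, with no downsampling. *)
ListPlot[offsets, PlotRange -> All]
```

Given a plain list of values, ListPlot uses the position in the list as 
the x coordinate, which is the "versus sample number" plot described 
above.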

When all is well, the sample dots coalesce into a dense line that 
wobbles gently around zero offset, plus a very thin halo of outliers 
(caused by interrupts in the two machines).

All was not well, initially.  What we got was a series of slanted 
parallel dense lines in a one-sided chevron pattern, immersed in a halo 
of outlier points.  Huh?  Who ordered that?

There were many theories.  One was that the clocks in the Sun boxes were 
coarsely quantized, and that this caused the chevron pattern.  The 
outliers were the key.  If the clocks were quantized, the outliers 
should have fallen on (or parallel to) the chevron lines, but the 
outliers did no such thing.  And so on.  It turned out that the problem 
was that the serial-line driver was internally buffering those trigger 
bytes, which took many experiments to figure out.  The solution was to 
switch to UDP packets.

In any event, the point of the above war story is to show that 
downsampling is not a general solution.  Sometimes, one needs *all* the 
data.
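
To make the point concrete, here is a small sketch with made-up numbers 
(the noise band, the outlier size, and the outlier positions are all 
assumptions chosen for illustration).  Taking every 100th point keeps 
only positions 1, 101, 201, and so on, so the three planted outliers 
never reach the plot:

```mathematica
(* Synthetic data: a narrow noise band plus three planted outliers. *)
SeedRandom[2];
n = 100000;
data = RandomReal[{-0.001, 0.001}, n];
outlierPos = {12345, 54321, 99999};   (* hypothetical positions *)
data[[outlierPos]] = 0.05;

(* Stride-100 decimation keeps positions 1, 101, 201, ... only. *)
decimated = data[[;; ;; 100]];

(* The full data set still contains the outliers ...            *)
Max[Abs[data]]        (* 0.05 *)

(* ... but none of the planted positions lies on the stride, so *)
(* the decimated set stays inside the noise band.               *)
Max[Abs[decimated]]   (* no larger than 0.001 *)
```

In a case like the one above, where the outliers were the key to the 
diagnosis, a plot of the decimated list would have hidden exactly the 
points that mattered.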

> But Mathematica is not a substitute for thinking about your
> data. If you ask Mathematica to plot a million points it will do
> so in whatever time is required. But such plots are almost never
> a good way to visualize data given typical sized displays and
> their resolution.

I imagine that the original poster knows the meaning of his data.  The 
question asked was why Mathematica 6 is so much slower than Mathematica
5.2 doing the same thing.  Nor is it an unreasonable thing to ask for.
Things are supposed to get better with succeeding versions, and
Mathematica's handling of data input and output has in general done so.
But there has been backsliding.

> In fact, one of the nice things about version 6 is it makes
> simple filtering such as taking every nth point very easy. For
> example, if I had 1E6 data points, I likely would initially plot
> every 100th point or every 1000th point to get a reasonable
> plot. As I am sure you are aware doing:
> 
> ListPlot[data[[;; ;;100]]]
> 
> in version 6 will plot every 100th point.

This would be quite useful in some kinds of signal processing.

> Yes, there is always a risk in using such simple filtering
> schemes that important aspects of the data will be missed. But that
> same risk exists if all data points in such a large set are
> plotted. If there are only a few important points, plotting all
> of the points will almost certainly obscure the few important points.

Always true, but let us return to the question at hand:  Why is Mathematica 6.0 
so much slower than 5.2 at handling data points?  

I must say that my reaction to the original question was to think to 
myself that I use this kind of dotplotting all the time, so maybe I 
ought to wait for 6.1 to come along.  Dot zero of anything always has a 
bunch of rough edges.

Joe Gwinn
