Re: Unbearably slow plotting (v6)
- To: mathgroup at smc.vnet.net
- Subject: [mg79693] Re: Unbearably slow plotting (v6)
- From: Joseph Gwinn <joegwinn at comcast.net>
- Date: Thu, 2 Aug 2007 03:45:06 -0400 (EDT)
- References: <f8pii2$1s1$1@smc.vnet.net>
In article <f8pii2$1s1$1 at smc.vnet.net>,
Bill Rowe <readnewsciv at sbcglobal.net> wrote:

> On 7/30/07 at 6:39 AM, dave at Remove_Thisdbailey.co.uk (David Bailey)
> wrote:
>
> >Bill Rowe wrote:
> >
> >>With 1E5 and 1E6 points, this results in a plot that is
> >>indistinguishable from a filled rectangle. That seems to be of very
> >>little use. So, while I might be a bit impatient waiting for
> >>Mathematica on my machine to plot 1E6 points, I don't see why I
> >>would want to do that in the first place. What I want from ListPlot
> >>is something to give me an idea of trends in my data. Given real
> >>limits on display resolution and size, plotting 1E6 points
> >>typically will not provide a useful plot regardless of how fast it
> >>plots. So why do this
>
> >The main reason people do this, is that they have experimental data
> >which they want to visualise without having to filter it in some way
> >to remove redundant data points.
>
> I certainly understand the need to visualize experimental data
> since that is one of the things I do most frequently with
> Mathematica. And I also understand the desire to do this as
> easily and efficiently as possible.

I have to chime in here. Although I don't currently have this problem, I
have in the past used dotplots with up to 250,000 points, which was at
the time a huge dataset, even on a UNIX machine. This was in the mid
1990s. Downsampling would have been fatal in my application.

What I was doing was diagnosing why Network Time Protocol was not
properly synchronizing the clocks on an isolated pair of SunOS test
machines. A Y-harness driven by a DOS box delivered a 1-Hz serial stream
of bytes simultaneously to both Sun boxes, each of which read its local
clock upon receiving the byte. The timestamps were recorded to a file,
one file per Sun box. This was allowed to run overnight and over
weekends. The two files were then aligned and the timestamps
differenced, each difference being plotted versus sample number.

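The differencing step described above might be sketched in Mathematica roughly as follows. This is only an illustration: the two timestamp lists are synthetic stand-ins, the names (stampsA, stampsB, offsets) are hypothetical, and "alignment" is assumed to mean nothing more than pairing sample k in one file with sample k in the other.

```mathematica
(* Synthetic stand-ins for the two timestamp files (assumed; the real
   stamps would be read from the files, one per Sun box). *)
SeedRandom[1];
n = 20000;
ticks = Range[n];                                 (* 1-Hz trigger instants, s *)
stampsA = ticks + RandomReal[{-0.001, 0.001}, n]; (* clock A plus jitter *)
stampsB = ticks + RandomReal[{-0.001, 0.001}, n]; (* clock B plus jitter *)

(* Align by pairing sample k with sample k, truncating to the shorter file. *)
m = Min[Length[stampsA], Length[stampsB]];
offsets = Take[stampsA, m] - Take[stampsB, m];

(* One dot per sample: clock difference versus sample number. *)
ListPlot[offsets, PlotRange -> All, AxesLabel -> {"sample", "offset (s)"}]
```

Every sample contributes its own dot here; nothing is averaged or thinned before plotting.
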
When all is well, the sample dots coalesce into a dense line that
wobbles gently around zero offset, plus a very thin halo of outliers
(caused by interrupts in the two machines).

All was not well, initially. What we got was a series of slanted
parallel dense lines in a one-sided chevron pattern, immersed in a halo
of outlier points. Huh? Who ordered that?

There were many theories. One was that the clocks in the Sun boxes were
coarsely quantized, and that this caused the chevron pattern. The
outliers were the key. If the clocks were quantized, the outliers should
have fallen on (or parallel to) the chevron lines, but the outliers did
no such thing. And so on.

It turned out that the problem was that the serial-line driver was
internally buffering those trigger bytes, which took many experiments to
figure out. The solution was to switch to UDP packets.

In any event, the point of the above war story is to show that
downsampling is not a general solution. Sometimes, one needs *all* the
data.

> But Mathematica is not a substitute for thinking about your
> data. If you ask Mathematica to plot a million points it will do
> so in whatever time is required. But such plots are almost never
> a good way to visualize data given typical sized displays and
> their resolution.

I imagine that the original poster knows the meaning of his data. The
question asked was why Mathematica 6 is so much slower than Mathematica
5.2 doing the same thing.

Nor is it an unreasonable thing to ask for. Things are supposed to get
better with succeeding versions, and Mathematica's handling of data
input and output has in general done so. But there has been backsliding.

> In fact, one of the nice things about version 6 is it makes
> simple filtering such as taking every nth point very easy. For
> example, if I had 1E6 data points, I likely would initially plot
> every 100th point or every 1000th point to get a reasonable
> plot.
> As I am sure you are aware doing:
>
> ListPlot[data[[;; ;;100]]]
>
> in version 6 will plot every 100th point.

This would be quite useful in some kinds of signal processing.

> Yes, there is always a risk in using such simple filtering
> schemes important aspects of the data will be missed. But that
> same risk exists if all data points in such a large set are
> plotted. If there are only a few important points, plotting all
> of the points will almost certainly obscure the few important points.

Always true, but let us return to the question at hand: Why is
Mathematica 6.0 so much slower than 5.2 at handling large numbers of
data points?

I must say that my reaction to the original question was to think to
myself that I use this kind of dotplotting all the time, so maybe I
ought to wait for 6.1 to come along. Dot zero of anything always has a
bunch of rough edges.

Joe Gwinn
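
As a small illustration of the risk discussed above, the sketch below applies the quoted `[[;; ;;100]]` idiom to a synthetic list containing a single outlier. The data and the outlier's position are assumptions made up for the example, not anything from the original measurements.

```mathematica
(* Synthetic data, assumed for illustration: a flat trace with one outlier. *)
data = ConstantArray[0., 1000000];
data[[12345]] = 5.;

(* Every 100th point: keeps elements 1, 101, 201, ... *)
thinned = data[[;; ;; 100]];

Length[thinned]   (* 10000 *)
Max[data]         (* 5. *)
Max[thinned]      (* 0. : the outlier at position 12345 was skipped *)
```

Position 12345 is not of the form 100 k + 1, so the thinned list never sees it. An outlier halo like the one in the clock story would be thinned in the same way.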