MathGroup Archive 2009

Import [ #, Data ]&

  • To: mathgroup at smc.vnet.net
  • Subject: [mg97622] Import [ #, Data ]&
  • From: Fred Klingener <gigabitbucket at BrockEng.com>
  • Date: Tue, 17 Mar 2009 04:59:29 -0500 (EST)

Group,

Over the last few days, I've spent enough time trying to manipulate
and plot web data using brute force techniques to appreciate the
immense power (and a few really nasty gotchas) of the way Mathematica
handles Import[]. It seemed to be a good idea to share some of the
ideas and maybe to reap recommendations about how it really should be
done.

The immediate task was to follow the evolving Iditarod Trail Sled Dog
Race and plot the position vs. time chart for the leaders.

During the race, the organizer posts regular updates of the standings,
the core of which are the web pages, one for each musher, that contain
data tables of arrival and departure times at each checkpoint.

It's these tables that I wanted to take apart and reassemble in a way
that would let me plot the comparative progress.

As a starting point, the page listing the top five mushers contains
hyperlinks to the pages for each. This list is available from:

top5=Import["http://www.iditarod.com/race/race/topfive.html","Hyperlinks"]

There are a lot of links here, but the ones I want are distinguished by
a form that includes "musher_" followed by a decimal number, so they can
be picked out by

top5=Select[
top5
,{}!=StringPosition[#,"musher_"]&
]
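
The same filter can also be written with StringFreeQ[], which tests for the substring directly. A minimal sketch, assuming a few made-up links in place of the real Import[] result (the name links and the URLs are hypothetical):

```mathematica
(* hypothetical stand-in for the list returned by Import[..., "Hyperlinks"] *)
links = {
   "http://www.example.com/race/musher_12.html",
   "http://www.example.com/race/standings.html",
   "http://www.example.com/race/musher_7.html"};

(* keep only the links that contain "musher_" *)
Select[links, ! StringFreeQ[#, "musher_"] &]
```

With these three links, only the two musher pages survive the Select[].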

Here, as one of the nastiest little surprises, the musher files are
not returned in running order even though they appear in order in the
source. So I had to prospect each file to find where the current
position was located (I found it at [[4, 3]]), get the order, and use
that to sort the top5 list. Here's the result:

top5data=Import[#,"Data"]&/@top5;
sorted=Ordering[top5data[[All,4,3]]];
top5=top5[[sorted]];
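
The Ordering[] step can be sketched on toy values. The names pages and standing and their contents are hypothetical stand-ins for the page list and the positions read from [[4, 3]]:

```mathematica
(* hypothetical pages and the standing read from each one *)
pages = {"musher_c.html", "musher_a.html", "musher_b.html"};
standing = {3, 1, 2};

(* Ordering gives the positions that would put standing in sorted order *)
order = Ordering[standing]   (* {2, 3, 1} *)

(* applying those positions to pages puts the leader first *)
pages[[order]]   (* {"musher_a.html", "musher_b.html", "musher_c.html"} *)
```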

So this sorted list of the top five musher pages can be used to
retrieve all the latest checkpoint/time data, and here's the real
power of the process. Import[#, "Data"] dissects the target page,
evidently recognizes tables, and assembles the results into what it
thinks are useful Mathematica structures. In particular, I found the
checkpoint/arrival time/departure time block at [[5, 3]] of each result.

data=(Import[#,"Data"]&/@top5)[[All,5,3]];
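
One way to locate a block like [[5, 3]] without prospecting by hand is to ask Position[] where a known entry sits. A minimal sketch, assuming a made-up nested list in place of a real Import[#, "Data"] result (the name page and everything in it are hypothetical):

```mathematica
(* hypothetical stand-in for what Import[url, "Data"] might return *)
page = {"Header", {"nav", "links"}, {"Bib", 12}, {"Position", "info", 1},
   {"x", "y", {{"Anchorage", 0, "3/7 10:00"}, {"Willow", 50, "3/8 14:00"}}}};

(* index path of a string known to be in the table *)
Position[page, "Willow"]   (* {{5, 3, 2, 1}} *)

(* dropping the last two indices (row, column) leaves the table itself *)
page[[5, 3]]
```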

The checkpoint list might be different for each musher.

trackLength=Length/@data;

The idea from here was to construct lists of musher positions (in
checkpoint miles from the start) vs. time for plotting. DateListPlot[]
would have been handy here, but I couldn't get many of its advertised
options to work. So I hacked ListPlot[] to do it.
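
For comparison, the plain DateListPlot[] call takes {date, value} pairs directly and accepts several series at once. This is only a sketch on made-up data (musherA, musherB and their values are hypothetical), and it does not address the options that gave trouble:

```mathematica
(* hypothetical {date, miles} pairs for two mushers *)
musherA = {{{2009, 3, 7, 15, 0, 0}, 0}, {{2009, 3, 8, 2, 0, 0}, 72},
   {{2009, 3, 8, 20, 0, 0}, 153}};
musherB = {{{2009, 3, 7, 15, 4, 0}, 0}, {{2009, 3, 8, 3, 30, 0}, 72},
   {{2009, 3, 8, 23, 0, 0}, 153}};

(* one joined line per musher, with dates on the horizontal axis *)
DateListPlot[{musherA, musherB}, Joined -> True]
```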

Mathematica will complain, but it will convert the time/date stamps in
the Iditarod data files into AbsoluteTime[], so it remains to populate
the plotting array. The process was complicated by the inconsistency
in the way in and out times were recorded among the checkpoints, so I
eventually settled on the following Monument to Incorrectness:
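
As an aside on the conversion itself: AbsoluteTime[] also accepts a string together with a list naming its elements, which takes the guesswork out of the parse. A minimal sketch, assuming a made-up time stamp (the layout of the real Iditarod stamps may well differ, so the element list is an assumption too):

```mathematica
(* hypothetical time stamp; the real pages may use a different layout *)
stamp = "03/10/2009 14:32:00";

(* name the elements in the order they appear in the string *)
t = AbsoluteTime[{stamp, {"Month", "Day", "Year", "Hour", "Minute", "Second"}}];

(* round trip back to a date list as a check *)
DateList[t]
```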

(* seed each musher's track with the first checkpoint: {time, miles} *)
pos={{Quiet@AbsoluteTime[#[[1,3]]],ToExpression[#[[1,2]]]}}&/@data;
For[
musher=1
,musher<=Length[data]
,musher++
,Quiet@
For[
checkPoint=1
,checkPoint<=trackLength[[musher]]
,checkPoint++
(* column 3 is presumably the arrival time; column 2 is the mileage *)
,AppendTo[pos[[musher]],{AbsoluteTime[data[[musher,checkPoint,3]]],ToExpression[data[[musher,checkPoint,2]]]}];
(* rows with more than four columns also have a time in column 5 *)
If[Length[data[[musher,checkPoint]]]>4
,AppendTo[pos[[musher]],{AbsoluteTime[data[[musher,checkPoint,5]]],ToExpression[data[[musher,checkPoint,2]]]}]];
]
];
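
A functional version of the same idea might look like the sketch below. It is not a drop-in replacement: it runs on made-up rows (toyData, rowPoints and toyPos are hypothetical names), assumes the mileage is already numeric so ToExpression[] is skipped, and leaves out the seed point that the loop above then appends a second time.

```mathematica
(* hypothetical rows: {checkpoint, miles, time in, layover, time out};
   the first row has no time out, so it is shorter *)
toyData = {
   {{"Start", 0, {2009, 3, 7, 15, 0, 0}},
    {"Willow", 72, {2009, 3, 8, 2, 0, 0}, "4h", {2009, 3, 8, 6, 0, 0}}},
   {{"Start", 0, {2009, 3, 7, 15, 4, 0}},
    {"Willow", 72, {2009, 3, 8, 3, 30, 0}, "2h", {2009, 3, 8, 5, 30, 0}}}};

(* one {time, miles} point per recorded time in a row *)
rowPoints[row_] := {AbsoluteTime[row[[#]]], row[[2]]} & /@
   If[Length[row] > 4, {3, 5}, {3}];

(* all points for each musher, in checkpoint order *)
toyPos = Flatten[rowPoints /@ #, 1] & /@ toyData;

Dimensions[toyPos]   (* {2, 3, 2}: 2 mushers, 3 points each, {time, miles} *)
```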

As embarrassing as it is to include that bit of code, on reflection it
seems to be eminently readable and maintainable. Maybe I don't feel so
bad. After I post this, I'll get to work on the encrypted one-liner.
From here, a simple

ListPlot[pos,Joined->True]

gets me a recognizable form. Then, I spent all the time saved by
Import[#, "Data"] on prettification:

chkPtNames=StringSplit[#," ("][[1]]&/@data[[1,All,1]];
chkPtMiles=data[[1,All,2]];
yTicksTable=Table[{chkPtMiles[[j]],chkPtNames[[j]]<>" "<>ToString[chkPtMiles[[j]]]},{j,1,Length[data[[1]]]}];
xGridPositions=Table[AbsoluteTime[{2009,03,j}],{j,6,17}];
xLabelPositions=Table[AbsoluteTime[{2009,03,j,12}],{j,6,17}];
xLabels=DateString[#,{"DayNameShort","\n","Day"}]&/@xLabelPositions;
xTicksTable=Table[{xLabelPositions[[j]],xLabels[[j]]},{j,1,Length[xLabelPositions]}];
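
The tick tables are just lists of {position, label} pairs, so they can also be built by threading over the two lists rather than indexing with Table[]. A minimal sketch on made-up checkpoints (rawNames, miles and the mileages are hypothetical, not the race data):

```mathematica
(* hypothetical checkpoint labels and mileages *)
rawNames = {"Anchorage (Start)", "Willow (Restart)",
   "Yentna Station (Checkpoint)"};
miles = {0, 50, 95};

(* drop the parenthesized suffix from each name *)
names = StringSplit[#, " ("][[1]] & /@ rawNames;

(* pair each tick position with its label *)
ticks = MapThread[{#2, #1 <> " " <> ToString[#2]} &, {names, miles}]
(* {{0, "Anchorage 0"}, {50, "Willow 50"}, {95, "Yentna Station 95"}} *)
```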

ListPlot[
pos
,PlotRange->{AbsoluteTime[{2009,03,#}]&/@{7,17},{0,1100}}
,Joined->True
,Ticks->{xTicksTable,yTicksTable}
,GridLines->{xGridPositions,chkPtMiles}
,ImageSize->{640,480}
,AxesOrigin->{AbsoluteTime[{2009,03,7}],0}
,BaseStyle->"Label"]

There's still some prettification to be done, like a legend so I could
tell who is who, but Mackey is going to win it no matter what.
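
For readers on Mathematica 9 or later, where ListPlot[] accepts a PlotLegends option, a legend could be sketched as follows. The tracks and the musher names are made up for illustration:

```mathematica
(* hypothetical {time, miles} tracks for two mushers *)
tracks = {{{0, 0}, {10, 72}, {30, 153}}, {{0, 0}, {12, 72}, {33, 153}}};

(* one legend entry per track, in the same order as the data *)
ListPlot[tracks, Joined -> True, PlotLegends -> {"Musher A", "Musher B"}]
```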

Cheers,
Fred Klingener

