Re: XML data structure parsing in Mathematica 6 using patterns
- To: mathgroup at smc.vnet.net
- Subject: [mg81559] Re: [mg81502] XML data structure parsing in Mathematica 6 using patterns
- From: DrMajorBob <drmajorbob at bigfoot.com>
- Date: Wed, 26 Sep 2007 21:53:51 -0400 (EDT)
- References: <21153656.1190833488110.JavaMail.root@m35>
- Reply-to: drmajorbob at bigfoot.com
I can't WAIT to hear someone explain in what way the documentation on this is totally clear!

Until then, I can only observe that Help gives no examples in which (as in your attempts) PatternSequence is the left-hand side of a rule, or a pattern (in itself) to be matched. There are only examples where PatternSequence is inclosed in List or f. (Those are the only supported Heads, if I'm judging from Help alone, but I suppose f means "any head".)

Generalizing to your problem, however... try this:

    p = {a___,
        PatternSequence[XMLElement["start-valid-time", _, {startT_}],
         XMLElement["end-valid-time", _, {endT_}]], b___} :> {a, {startT, endT}, b};
    timeBlock //. p

    XMLElement["time-layout", {"time-coordinate" -> "local",
      "summarization" -> "none"}, {XMLElement[
       "layout-key", {}, {"k-p12h-n14-1"}],
      {"2007-09-21T20:00:00-04:00", "2007-09-22T08:00:00-04:00"},
      {"2007-09-22T08:00:00-04:00", "2007-09-22T20:00:00-04:00"},
      {"2007-09-22T20:00:00-04:00", "2007-09-23T08:00:00-04:00"},
      {"2007-09-23T08:00:00-04:00", "2007-09-23T20:00:00-04:00"},
      {"2007-09-23T20:00:00-04:00", "2007-09-24T08:00:00-04:00"},
      {"2007-09-24T08:00:00-04:00", "2007-09-24T20:00:00-04:00"}}]

You wanted ONLY the time-pairs, which may require something like

    p = {___, a : ({_String, _String} ...),
        PatternSequence[XMLElement["start-valid-time", _, {startT_}],
         XMLElement["end-valid-time", _, {endT_}]], b___} :> {a, {startT, endT}, b};
    timeBlock[[3]] //. p

    {{"2007-09-21T20:00:00-04:00", "2007-09-22T08:00:00-04:00"},
     {"2007-09-22T08:00:00-04:00", "2007-09-22T20:00:00-04:00"},
     {"2007-09-22T20:00:00-04:00", "2007-09-23T08:00:00-04:00"},
     {"2007-09-23T08:00:00-04:00", "2007-09-23T20:00:00-04:00"},
     {"2007-09-23T20:00:00-04:00", "2007-09-24T08:00:00-04:00"},
     {"2007-09-24T08:00:00-04:00", "2007-09-24T20:00:00-04:00"}}

All this seems to mean that PatternSequence[p1,p2] is not a pattern, since you can't match anything with it. But anyhead[PatternSequence[p1,p2]] IS a pattern.
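For what it's worth, here is a minimal sketch of another way to collect every start/end pair with one pattern, assuming the same idea of wrapping PatternSequence in a List. It uses ReplaceList, which returns the result of every possible match rather than only the first. The names children and pairRule and the toy strings "s1", "e1", and so on are made up for illustration; with the real data, timeBlock[[3]] would take the place of children.

```mathematica
(* Toy stand-in for timeBlock[[3]]; the element contents are hypothetical. *)
children = {
   XMLElement["layout-key", {}, {"k"}],
   XMLElement["start-valid-time", {}, {"s1"}],
   XMLElement["end-valid-time", {}, {"e1"}],
   XMLElement["start-valid-time", {}, {"s2"}],
   XMLElement["end-valid-time", {}, {"e2"}]};

(* The enclosing List supplies the head inside which the two-element
   sequence is matched; the ___ on each side absorb everything else. *)
pairRule = {___,
    PatternSequence[XMLElement["start-valid-time", _, {s_}],
     XMLElement["end-valid-time", _, {e_}]], ___} :> {s, e};

(* ReplaceList tries every way the rule can match the whole list. *)
ReplaceList[children, pairRule]
```

On this toy input the result should be {{"s1", "e1"}, {"s2", "e2"}}. Unlike the //. approach, nothing is rewritten in place, so no extra bookkeeping pattern for the already-converted pairs is needed.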
Bobby

On Wed, 26 Sep 2007 05:38:18 -0500, Daniel Flatin <dflatin at rcn.com> wrote:

> This is my third attempt at posting this message. Apologies if somehow
> the first two got through. I saw no sign of them, however.
>
> I have an XML data structure where I want to extract the start and end
> times, for example:
>
> timeBlock = XMLElement["time-layout", {"time-coordinate" -> "local",
>    "summarization" -> "none"}, {XMLElement[
>     "layout-key", {}, {"k-p12h-n14-1"}],
>    XMLElement["start-valid-time", {}, {"2007-09-21T20:00:00-04:00"}],
>    XMLElement["end-valid-time", {}, {"2007-09-22T08:00:00-04:00"}],
>    XMLElement["start-valid-time", {}, {"2007-09-22T08:00:00-04:00"}],
>    XMLElement["end-valid-time", {}, {"2007-09-22T20:00:00-04:00"}],
>    XMLElement["start-valid-time", {}, {"2007-09-22T20:00:00-04:00"}],
>    XMLElement["end-valid-time", {}, {"2007-09-23T08:00:00-04:00"}],
>    XMLElement["start-valid-time", {}, {"2007-09-23T08:00:00-04:00"}],
>    XMLElement["end-valid-time", {}, {"2007-09-23T20:00:00-04:00"}],
>    XMLElement["start-valid-time", {}, {"2007-09-23T20:00:00-04:00"}],
>    XMLElement["end-valid-time", {}, {"2007-09-24T08:00:00-04:00"}],
>    XMLElement["start-valid-time", {}, {"2007-09-24T08:00:00-04:00"}],
>    XMLElement["end-valid-time", {}, {"2007-09-24T20:00:00-04:00"}]}]
>
> I can do this by finding all start times and all end times and then
> combining them, as in
>
> getTimeSequence[timeBlock_] :=
>  Module[{startBlock, endBlock, startT, endT},
>   startBlock = Cases[timeBlock,
>     XMLElement["start-valid-time", _, {startT_}] :> startT,
>     Infinity];
>   endBlock = Cases[timeBlock,
>     XMLElement["end-valid-time", _, {endT_}] :> endT,
>     Infinity];
>   Transpose[{startBlock, endBlock}]
>   ]
>
> Philosophically, I think I should be able to capture the start and end
> times with a single pattern, but I can't make it work. For example I
> have tried:
>
> getTimeStartStopPairs[timeBlock_] :=
>  Module[{startBlock, endBlock, startT, endT},
>   Cases[
>    timeBlock,
>    PatternSequence[XMLElement["start-valid-time", _, {startT_}],
>      XMLElement["end-valid-time", _, {endT_}]] :> {startT, endT},
>    Infinity
>    ]
>   ]
>
> Does anyone have any suggestions? I would like to learn how to do this
> with just one pattern and I feel like I am misinterpreting how
> PatternSequence works.
>
> Thanks,
> Dan
>

--
DrMajorBob at bigfoot.com