Re: RuleDelayed for parsing XML with multiple children
- To: mathgroup at smc.vnet.net
- Subject: [mg109637] Re: RuleDelayed for parsing XML with multiple children
- From: Leonid Shifrin <lshifr at gmail.com>
- Date: Sun, 9 May 2010 07:50:33 -0400 (EDT)
Hi Zach,

As an alternative to Albert's solution, you can use nested rules in
conjunction with Reap-Sow. Here is your XML:

In[1]:= xmlTree = ImportString[
   "<LevelA><LevelB><Child>value1</Child></LevelB><LevelB><Child>\
value2</Child></LevelB></LevelA>", "XML"]

Out[1]= XMLObject["Document"][{},
  XMLElement["LevelA", {},
   {XMLElement["LevelB", {}, {XMLElement["Child", {}, {"value1"}]}],
    XMLElement["LevelB", {}, {XMLElement["Child", {}, {"value2"}]}]}], {}]

This is what I suggest:

In[2]:= Flatten@
  Reap[xmlTree /.
     XMLElement["LevelA", _, childrenA_] :> (
       childrenA /.
        XMLElement["LevelB", _, childrenB_] :> (
          childrenB /.
           XMLElement["Child", _, {child_}] :> Sow[child]))][[2]]

Out[2]= {"value1", "value2"}

This can be automated, for example, as follows:

In[3]:= Clear[getChildrenRule];
getChildrenRule[levelNames : {__String}] :=
  Fold[
    With[{ch = Unique[]},
      Hold[XMLElement[#2, _, Pattern[ch, Blank[]]], ch /. #1]] &,
    x_ :> Sow[x],
    Reverse[levelNames]] /. Hold -> RuleDelayed;

In[5]:= Clear[getChildren];
getChildren[levelNames : {__String}, parsedXML_] :=
  Flatten@Reap[parsedXML /. getChildrenRule[levelNames]][[2]];

In[7]:= getChildren[{"LevelA", "LevelB", "Child"}, xmlTree]

Out[7]= {"value1", "value2"}

This will generalize to any number of levels. The last level is that from
which you need children. The code may be a little obscure though, since
some scope engineering/surgery was needed to implement a nested rule.
Basically, I used Unique to programmatically create a pattern like $1_,
and Hold to be replaced by RuleDelayed only at the final stage, to prevent
RuleDelayed from trying to prematurely "protect" the <ch> variable from
With injecting Unique[] in its place.

In case you find the above code needlessly complex, here is another
solution based on a combination of a local recursive function and a
rule-based approach:

In[8]:= Clear[getChildrenAlt];
getChildrenAlt[levelNames : {__String}, parsedXML_] :=
  Module[{f},
    f[x_, n_] :=
      x /. XMLElement[levelNames[[n]], _, c_] :> f[c, n + 1];
    f[x_, Length[levelNames] + 1] := Sow[x];
    Flatten[Reap[f[parsedXML, 1]][[2]]]]

This one I find easier to understand. It produces the same result of
course:

In[10]:= getChildrenAlt[{"LevelA", "LevelB", "Child"}, xmlTree]

Out[10]= {"value1", "value2"}

Both solutions are pretty much equivalent to Albert's one with nested
Cases, so it is probably a matter of taste which one to use. More
generally, a combination of rules and recursion seems to be one of the
most powerful techniques available in Mathematica (of course, you can also
use ReplaceRepeated and reduce everything to local rules).

Hope this helps.

Regards,
Leonid

On Fri, May 7, 2010 at 3:29 AM, Zach Bjornson <bjornson at mit.edu> wrote:

> Hi,
>
> I'm trying to extract multiple values (same depth, different physical
> level) from an XML tree using RuleDelayed, but I realized that using
> RuleDelayed only extracts one of the children. That is:
>
> XmlTree follows the form:
> <LevelA>
>   <LevelB>
>     <Child>value1</Child>
>   </LevelB>
>   <LevelB>
>     <Child>value2</Child>
>   </LevelB>
> </LevelA>
>
> I want those two values.
>
> Cases[XmlTree,
>
> XMLElement["LevelA",_,{___,XMLElement["LevelB",_,{___,XMLElement["Child",_,{WantThisValue_}],___}],___}]:>{WantThisValue},Infinity]
>
> This only gives value1. I can explicitly add more XMLElement tags to get
> the other value
>
> Cases[XmlTree,
>
> XMLElement["LevelA",_,{___,XMLElement["LevelB",_,{___,XMLElement["Child",_,{WantThisValue_}],___}],___,XMLElement["LevelB",_,{___,XMLElement["Child",_,{WantThisValueToo_}],___}]:>{WantThisValue,WantThisValueToo},Infinity]
>
> but I don't want to use an explicit structure because the number of
> children/values varies.
>
> Any have suggestions for the best alternative method to the RuleDelayed
> syntax I'm using? Changing the innermost XMLElement tag to simply
> XMLElement["Child",_,_] gives the proper answer, albeit with the messy
> flanking tree structure. Dropping indices is not an ideal solution.
>
> Thanks!
> Zach
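
For comparison with the nested-Cases approach that the reply mentions, here is a minimal sketch of how such a solution might look. It is not Albert's actual code, which is not reproduced in this message. The name getChildrenCases is hypothetical, and the sketch assumes that each level should be matched at any depth below the previous one, as the rule-based versions do.

```mathematica
(* Same sample document as in the reply *)
xmlTree = ImportString[
   "<LevelA><LevelB><Child>value1</Child></LevelB><LevelB><Child>value2</Child></LevelB></LevelA>",
   "XML"];

(* Hypothetical helper: one Cases pass per level name.
   Each pass collects the children lists of every matching element,
   and Flatten[..., 1] merges those lists into one list of nodes
   that the next pass searches. *)
getChildrenCases[levelNames : {__String}, parsedXML_] :=
  Fold[
    Flatten[Cases[#1, XMLElement[#2, _, children_] :> children, Infinity], 1] &,
    {parsedXML},   (* wrapped in a list, since Cases at Infinity skips level 0 *)
    levelNames];

getChildrenCases[{"LevelA", "LevelB", "Child"}, xmlTree]
(* {"value1", "value2"} *)
```

No Reap-Sow is needed here because Cases already returns the matches as a list, so a varying number of LevelB elements is handled without spelling out the structure.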