MathGroup Archive 2010

Re: RuleDelayed for parsing XML with multiple children

  • To: mathgroup at smc.vnet.net
  • Subject: [mg109637] Re: RuleDelayed for parsing XML with multiple children
  • From: Leonid Shifrin <lshifr at gmail.com>
  • Date: Sun, 9 May 2010 07:50:33 -0400 (EDT)

Hi Zach,

As an alternative to Albert's solution, you can use nested rules in
conjunction with Reap-Sow. Here is your XML:

In[1]:= xmlTree =
 ImportString[
  "<LevelA><LevelB><Child>value1</Child></LevelB><LevelB><Child>\
value2</Child></LevelB></LevelA>", "XML"]

Out[1]= XMLObject["Document"][{},
 XMLElement[
  "LevelA", {}, {XMLElement[
    "LevelB", {}, {XMLElement["Child", {}, {"value1"}]}],
   XMLElement[
    "LevelB", {}, {XMLElement["Child", {}, {"value2"}]}]}], {}]

This is what I suggest:

In[2]:= Flatten@
 Reap[xmlTree /. XMLElement["LevelA", _, childrenA_] :>
     ( childrenA /. XMLElement["LevelB", _, childrenB_] :>
        (childrenB /.
          XMLElement["Child", _, {child_}] :> Sow[child]))][[2]]

Out[2]= {"value1", "value2"}
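
For readers less familiar with Reap-Sow, here is a minimal sketch of the
mechanism, using a toy expression that is not part of the XML problem. Reap
returns a list whose first element is the value of the evaluated expression
and whose second element holds the sown values, which is why [[2]] and
Flatten appear above:

Reap[Sow["a"]; Sow["b"]; "result"]
(* {"result", {{"a", "b"}}} *)

Flatten[Reap[Sow["a"]; Sow["b"]; "result"][[2]]]
(* {"a", "b"} *)

If nothing is sown, the second element is an empty list, so the Flatten
form simply returns {} when no element matches.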

This can be automated, for example, as follows:

In[3]:=
Clear[getChildrenRule];
getChildrenRule[levelNames : {__String}] :=
  Fold[
    With[{ch = Unique[]},
      Hold[XMLElement[#2, _, Pattern[ch, Blank[]]], ch /. #1]] &,
    x_ :> Sow[x], Reverse[levelNames]] /. Hold -> RuleDelayed;

In[5]:=
Clear[getChildren];
getChildren[levelNames : {__String}, parsedXML_] :=
  Flatten@
   Reap[parsedXML /. getChildrenRule[levelNames]][[2]];

In[7]:= getChildren[{"LevelA", "LevelB", "Child"}, xmlTree]

Out[7]= {"value1", "value2"}

This generalizes to any number of levels. The last name in the list is the
level whose children you want. The code may be a little obscure, though,
since some scope engineering/surgery was needed to build a nested rule
programmatically. Basically, I used Unique to create a pattern like $1_, and
used Hold as a stand-in that is replaced by RuleDelayed only at the final
stage. Written directly, RuleDelayed would treat ch as its own pattern
variable and "protect" it, which would prevent With from injecting the
Unique[] symbol in its place.
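
To see why the level names are reversed before folding, here is a toy
sketch in which an inert head g stands in for the Hold[...] wrapper. Both g
and seed are hypothetical placeholder symbols, assumed to have no
definitions:

Fold[g[#2, #1] &, seed, Reverse[{"LevelA", "LevelB", "Child"}]]
(* g["LevelA", g["LevelB", g["Child", seed]]] *)

Fold wraps the innermost level first, so the outermost level name has to
come last in the folded list. To inspect the actual nested rule, one can
evaluate

getChildrenRule[{"LevelA", "LevelB"}]

The symbol names generated by Unique differ between sessions, so no output
is shown here.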

In case you find the above code needlessly complex, here is another solution
based on a combination of a local recursive function and a rule-based
approach:

In[8]:=
Clear[getChildrenAlt];
getChildrenAlt[levelNames : {__String}, parsedXML_] :=
 Module[{f},
  f[x_, n_] := x /. XMLElement[levelNames[[n]], _, c_] :> f[c, n + 1];
  f[x_, Length[levelNames] + 1] := Sow[x];
  Flatten[Reap[f[parsedXML, 1]][[2]]]]

I find this one easier to understand. It produces the same result, of
course:

In[10]:= getChildrenAlt[{"LevelA", "LevelB", "Child"}, xmlTree]

Out[10]= {"value1", "value2"}
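
As a quick check that nothing here depends on each LevelB having exactly one
Child, here is a sketch with a made-up document (xmlTree2 and its contents
are hypothetical test data, not from the original question):

xmlTree2 =
  ImportString[
   "<LevelA><LevelB><Child>v1</Child><Child>v2</Child></LevelB>" <>
    "<LevelB><Child>v3</Child></LevelB></LevelA>", "XML"];

getChildrenAlt[{"LevelA", "LevelB", "Child"}, xmlTree2]
(* expected: {"v1", "v2", "v3"} *)

Each Child's content list is sown separately, and the final Flatten merges
them into a single list in document order.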

Both solutions are pretty much equivalent to Albert's solution with nested
Cases, so it is probably a matter of taste which one to use. More generally,
a combination of rules and recursion seems to be one of the most powerful
techniques available in Mathematica (of course, you can also use
ReplaceRepeated and reduce everything to local rules).
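
For comparison, a nested-Cases version might look roughly like the sketch
below. This is only a guess at the general shape, not Albert's actual code,
and it assumes xmlTree as defined in In[1]:

Flatten[
 Cases[xmlTree,
  XMLElement["LevelA", _, a_] :>
   Cases[a,
    XMLElement["LevelB", _, b_] :>
     Cases[b, XMLElement["Child", _, {v_}] :> v]],
  Infinity]]
(* expected: {"value1", "value2"} *)

The outer Cases searches the whole tree, while the inner ones use the
default level 1 and so look only at the direct children of each element.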

Hope this helps.

Regards,
Leonid



On Fri, May 7, 2010 at 3:29 AM, Zach Bjornson <bjornson at mit.edu> wrote:

> Hi,
>
> I'm trying to extract multiple values (same depth, different physical
> level) from an XML tree using RuleDelayed, but I realized that using
> RuleDelayed only extracts one of the children. That is:
>
> XmlTree follows the form:
> <LevelA>
> <LevelB>
> <Child>value1</Child>
> </LevelB>
> <LevelB>
> <Child>value2</Child>
> </LevelB>
> </LevelA>
>
> I want those two values.
>
> Cases[XmlTree,
>
> XMLElement["LevelA",_,{___,XMLElement["LevelB",_,{___,XMLElement["Child",_,{WantThisValue_}],___}],___}]:>{WantThisValue},Infinity]
>
> This only gives value1. I can explicitly add more XMLElement tags to get
> the other value
>
> Cases[XmlTree,
>
> XMLElement["LevelA",_,{___,XMLElement["LevelB",_,{___,XMLElement["Child",_,{WantThisValue_}],___}],___,XMLElement["LevelB",_,{___,XMLElement["Child",_,{WantThisValueToo_}],___}],___}]:>{WantThisValue,WantThisValueToo},Infinity]
>
> but I don't want to use an explicit structure because the number of
> children/values varies.
>
> Does anyone have suggestions for the best alternative to the RuleDelayed
> syntax I'm using? Changing the innermost XMLElement tag to simply
> XMLElement["Child",_,_] gives the proper answer, albeit with the messy
> flanking tree structure. Dropping indices is not an ideal solution.
>
> Thanks!
> Zach
>
>

