Hacking Wolfram|Alpha
- To: mathgroup at smc.vnet.net
- Subject: [mg101921] Hacking Wolfram|Alpha
- From: Fred Klingener <gigabitbucket at BrockEng.com>
- Date: Thu, 23 Jul 2009 03:54:37 -0400 (EDT)
While waiting for the Wolfram|Alpha API, I've been nosing around what we have available, looking for ways that Mathematica can make use of W|A. Here's an exercise that I thought was interesting enough to share (and to post to a place where I could find it again).

My goal was to draw a graph of the specific entropy of superheated steam as a function of pressure at a constant temperature. If I feed "entropy steam 400F 60psia" into W|A's browser query window, I get a newsy page that includes the answer "Result: 7175 J/(kg K)". The part of W|A that will some day (I hope) respond to requests for alternate unit systems isn't hooked up yet, but getting any answer is cool.

The next step is to find out whether Mathematica can retrieve the result. The URL of the response page in the particular example I ran was "http://www31.wolframalpha.com/input/?i=entropy+steam+400F+60psia". I'm guessing that the "31" in "www31" is unnecessary and might cause problems later, so I'll leave it out. Import[url, "Elements"] gives a normal-looking list of forms:

In[1]:= Import["http://www.wolframalpha.com/input/?i=entropy+steam+400F+60psi", "Elements"]

Out[1]= {"Data", "FullData", "Hyperlinks", "Images", "ImageURLs", "Plaintext", "Source", "Title", "XMLObject"}

The construction of the W|A response page is complex, and (at least in this case) the results are presented in graphic elements that lie outside the reach of the "Data" and "FullData" Imports. The Result pod is available as one of the Images, but not in a way that makes the numerical value accessible. The numbers appear in the Source, but ultimately the most promising attack has to be picking the XMLObject apart:

In[2]:= xml = Import["http://www.wolframalpha.com/input/?i=entropy++steam+400+degree+F+60+psia", "XMLObject"];

The XML object is a pretty daunting thing, but it does have a structure, and the structure does yield to systematic disassembly.
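For readers outside Mathematica, here is a minimal Python sketch of how the query URLs above are formed: the free-text query is form-encoded, so spaces become "+". It assumes the 2009-era scheme http://www.wolframalpha.com/input/?i=... used in this post, which may since have changed. The helper names wa_query_url and steam_entropy_url are hypothetical.

```python
from urllib.parse import quote_plus

# Assumed 2009-era query URL prefix, as used in the post; may have changed since.
WA_INPUT_URL = "http://www.wolframalpha.com/input/?i="


def wa_query_url(query: str) -> str:
    """Form-encode a free-text query (spaces -> '+') and append it to the prefix."""
    return WA_INPUT_URL + quote_plus(query)


def steam_entropy_url(pressure_psia: float, temp_f: float = 400) -> str:
    """Hypothetical helper: URL for 'entropy steam <T> degree F <p> psia'."""
    return wa_query_url(f"entropy steam {temp_f:g} degree F {pressure_psia:g} psia")


if __name__ == "__main__":
    # One URL per pressure, the same sweep the post later feeds to Import.
    for p in (40, 60, 80, 100):
        print(steam_entropy_url(p))
```

The query text is kept separate from the URL prefix, so a pressure sweep only has to vary one argument.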
There is a variety of XMLElements that serve different purposes: tables, scripts, and some that evidently make webMathematica calls. An inspection of the XML object suggested that the most vulnerable spots were the XMLElements with the tag "img". While their "src" attributes point to images computed elsewhere, an "alt" attribute and a "title" attribute were defined for the Result pod, presumably for display if the call to the image generator failed. All of the XMLElements with "img" tags can be listed by:

In[22]:= imgs = Cases[xml, XMLElement["img", _, _], Infinity];

and evidently the one I need is the fourth:

In[23]:= imgs[[4]];

which is a Mathematica expression with the following characteristics:

In[29]:= {Head[#], Depth[#], Length[#]} &@%

Out[29]= {XMLElement, 4, 3}

I can pick out the attributes section of the fourth XMLElement, which is a List of Rules:

In[30]:= imgs[[4]] /. XMLElement[_, attr_, _] -> attr;
         Head[#] & /@ %

Out[31]= {Rule, Rule, Rule, Rule}

I can extract a String representation of the value assigned to the "title" attribute with the replacement:

In[8]:= "title" //. %%

Out[8]= "7175 J/(kg K) (joules per kilogram kelvin)..."

The appearance of the superfluous spelled-out units is a nuisance, but easy enough to fix:

In[9]:= StringTake[#, -1 + StringPosition[#, "(j"][[1, 1]]] &@%

Out[9]= "7175 J/(kg K) "

and a variable can be assigned to the entropy that I'm after:

In[10]:= s = ToExpression[%]

Out[10]= (7175 J)/(K kg)

Some shuffling is required to match the units in which I expressed the input query:

In[11]:= << Units`

In[12]:= Quiet@Convert[s /. J -> Joule /. kg -> Kilogram /. K -> Kelvin, BTU/(Pound Rankine)]

Out[12]= (1.71371 BTU)/(Pound Rankine)

This matches up pretty well (somewhere between "astonishing" and "what did I expect?") with the value of 1.7135 BTU/(lb R) given by my trusty, crusty 1936 Keenan and Keyes "Thermodynamic Properties of Steam." Pretty nifty.
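The same img/title trick can be sketched in Python with the standard library's html.parser, for comparison. The sketch assumes two things: that the page still carries the result in the "title" attribute of the fourth img tag, which was true of the 2009 page described above but is not guaranteed now, and that the title reads like "7175 J/(kg K) (joules per kilogram kelvin)...". The conversion uses the International Table Btu, for which 1 BTU/lb = 2326 J/kg exactly. Since 1 R = 5/9 K, that gives 1 BTU/(lb R) = 2326 × 9/5 = 4186.8 J/(kg K). All function names are hypothetical.

```python
import re
from html.parser import HTMLParser

# International Table Btu: 1 Btu/lb = 2326 J/kg exactly, and 1 R = 5/9 K,
# so 1 BTU/(lb R) = 2326 * 9/5 = 4186.8 J/(kg K).
J_PER_KG_K_PER_BTU_PER_LB_R = 2326.0 * 9.0 / 5.0

# Assumed shape of the trimmed title, e.g. "7175 J/(kg K)".
_RESULT_RE = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*J/\(kg K\)\s*$")


class ImgCollector(HTMLParser):
    """Collect the attribute dict of every <img> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.imgs: list[dict] = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img .../> via the default handle_startendtag.
        if tag == "img":
            self.imgs.append(dict(attrs))


def img_attrs(html: str) -> list[dict]:
    """Rough analogue of Cases[xml, XMLElement["img", _, _], Infinity], attributes only."""
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    return parser.imgs


def strip_spelled_units(title: str) -> str:
    """Keep the text before the first "(j", as the StringTake step does, then trim."""
    return title.split("(j", 1)[0].strip()


def entropy_si(title: str) -> float:
    """Parse the numeric entropy in J/(kg K) from a Result-pod title string."""
    m = _RESULT_RE.match(strip_spelled_units(title))
    if m is None:
        raise ValueError(f"unexpected result format: {title!r}")
    return float(m.group(1))


def to_btu_per_lb_r(s_si: float) -> float:
    """Convert a specific entropy from J/(kg K) to BTU/(lb R)."""
    return s_si / J_PER_KG_K_PER_BTU_PER_LB_R


if __name__ == "__main__":
    # Hypothetical page fragment shaped like the one described above.
    page = ('<img src="a.gif"><img src="b.gif"><img src="c.gif">'
            '<img src="r.gif" alt="7175 J/(kg K) (joules per kilogram kelvin)"'
            ' title="7175 J/(kg K) (joules per kilogram kelvin)">')
    title = img_attrs(page)[3]["title"]  # the fourth <img>, index 3
    print(to_btu_per_lb_r(entropy_si(title)))
```

With 7175 J/(kg K), this gives about 1.7137 BTU/(lb R), in line with Out[12]. The regular expression fails loudly if the title format changes, rather than silently returning a wrong number.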
To draw the plot, I can nest the whole mess into one inscrutable knot and poll W|A once for each pressure in a range:

In[17]:= << Units`
ListLinePlot[
 table = {#,
     First@Quiet@Convert[
        (ToExpression[
           StringTake[#, -1 + StringPosition[#, "(j"][[1, 1]]] &@
            ("title" //.
              (Cases[
                  Import["http://www.wolframalpha.com/input/?i=entropy++steam+400+degree+F+" <> ToString[#] <> "+psia", "XMLObject"],
                  XMLElement["img", _, _], Infinity][[4]] //.
                XMLElement["img", attr_, _] -> attr))]) /.
          J -> Joule /. K -> Kelvin /. kg -> Kilogram,
        BTU/(Pound Rankine)]} & /@ {40, 60, 80, 100}]

...

For all that work, the plot should at least be smooth:

In[19]:= line = Fit[table, {1, x, x^2}, x]

Out[19]= 1.88242 - 0.00353908 x + 0.0000123901 x^2

In[20]:= Plot[line, {x, 40, 100},
 AxesOrigin -> {40, 1.64},
 AxesLabel -> {"pressure (psia)", "entropy (BTU/(lb R))"},
 PlotLabel -> "Superheated Steam\nEntropy vs Pressure at 400\[Degree] F",
 BaseStyle -> "Label"]

...

A procedure like this, making a few programmed calls to W|A, evidently doesn't attract the attention of whatever anti-jamming safeguards W|A has built in. It's slow, and maybe that's the inherent protection.

Overall, what have I learned about how I'd like an API to work?

1.) An API user shouldn't have to become expert in JavaScript, webMathematica, and XML all at once to use it.
2.) W|A should place computed results within reach of Import[url, "Data"].
3.) If it's a Mathematica API, numerical results should be presented in a way that's compatible with Units`.
4.) Depending on how the load evolves, W|A calls should be Listable, in the hope of avoiding the numerous separate calls I made in the example.

Fred Klingener
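Fit[table, {1, x, x^2}, x] is an ordinary linear least-squares fit in the basis 1, x, x^2. Below is a minimal Python sketch of that computation. To keep the 3x3 normal equations well conditioned for x in [40, 100], it first rescales x to t = (x - c)/h on [-1, 1] and then expands the result back into powers of x. That rescaling is a choice made for this sketch, not a claim about how Fit works internally. The names quadratic_fit, solve_linear and eval_quadratic are hypothetical.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def quadratic_fit(points):
    """Least-squares coefficients (a0, a1, a2) of y ~ a0 + a1*x + a2*x**2."""
    xs = [float(x) for x, _ in points]
    ys = [float(y) for _, y in points]
    if len(set(xs)) < 3:
        raise ValueError("need at least three distinct x values")
    c = (max(xs) + min(xs)) / 2.0  # centre of the x range
    h = (max(xs) - min(xs)) / 2.0  # half-width, so t lies in [-1, 1]
    rows = [[1.0, (x - c) / h, ((x - c) / h) ** 2] for x in xs]
    # Normal equations (R^T R) b = R^T y in the scaled basis 1, t, t^2.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    b0, b1, b2 = solve_linear(ata, aty)
    # Expand b0 + b1*(x - c)/h + b2*((x - c)/h)**2 into powers of x.
    a2 = b2 / h ** 2
    a1 = b1 / h - 2.0 * b2 * c / h ** 2
    a0 = b0 - b1 * c / h + b2 * c ** 2 / h ** 2
    return a0, a1, a2


def eval_quadratic(coeffs, x):
    """Evaluate a0 + a1*x + a2*x**2."""
    a0, a1, a2 = coeffs
    return a0 + a1 * x + a2 * x * x


if __name__ == "__main__":
    # Synthetic points on y = 2 + 3x + 0.5x^2 (not W|A data).
    print(quadratic_fit([(40, 922), (60, 1982), (80, 3442), (100, 5302)]))
```

Fed the {pressure, entropy} pairs collected in table, quadratic_fit would return the three coefficients in the same order as the terms of Out[19].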