Re: data structure equivalent ops
- To: mathgroup at yoda.ncsa.uiuc.edu
- Subject: Re: data structure equivalent ops
- From: CAMERON at midd.cc.middlebury.edu
- Date: Fri, 22 Feb 91 17:13 EST
David Doll says: > Hello, quick question: I was wondering about how you would do data > structure operations like insert-sorted, delete, find, etc., > double(single) link list, a priority queue, binary tree ops - > insert, delete, traversal, etc., stack ops - pushd, pop, etc. I'm > about to give a quick talk to a few people in my research lab about > the thought of using Mathematica and one of them mentioned it > would helpfull to explain the difference between doing some of the > above mentioned in "C" and in Mathematica. Well, it's not a "quick question" once I've gotten my hands on it! :-) This is a long posting, but it answers the question fairly thoroughly. I was one of the presenters at the 2nd annual Mma Users' Conference in San Francisco last January. One of my presentations was a 90-minute talk called "Introduction to Mma for Users of FORTRAN and C", but I didn't actually rely on any knowledge of those languages, and the title "Introduction to Mma for Users Traditional Procedural Languages" would have been equally apt. The use of pointers in Mma to build up classical data structures like the ones you mention is one of the topics I covered. The example I used in the notes for that talk is insertion sort with a binary tree. Here it is, in revised form. ----- The one-line answer: store pointers in functions. First of all, you must shed (if you've acquired it) the notion that Mma "functions" are "programs". In classical procedural languages there is a sharp distinction between executable entities in a program (functions and procedures and subroutines) and data objects (variables, arrays, pointers). There is no such division in Mma -- programs are data. If you insist on thinking of a statement like f[x_] := x^2 + x only as a "program" that accepts "input" x and computes the "output" x^2 + x then you may miss the insight that you can store data in functions. This statement actually stores a rewriting rule (which is a data object) in the global rule base, namely the rule "f[x_] :> x^2 + 2". You can think of this as specifying the "value" of the "pointer named f" at any "node named x". Similarly, the statements left[a] = b; right[a] = c; can be thought of as specifying that the value of the "left" pointer at "node a" is "b", and the value of the "right" pointer is "c"; they store the rules "left[a] -> b" and "right[a] -> c" in the global rule base. If you have many other rewriting rules on "left" and "right", such as "left[x] -> y" and "right[u] -> v", this amounts to a data structure built out of the nodes "{a,b,c,...,u,v,...,x,y,...}" and the pointers "left" and "right". This could be a binary tree (for example). There is only one difference between this and the usual way of constructing a data structure from pointers in languages like C and Pascal, and it's this: in C you'd declare a "struct" type (in Pascal it'd be a "record") that contained two "fields" named "left" and "right", which were pointers to this same type of "struct". There would be an instance of this "struct" for each node. Thus the information about "left" and "right" subtrees of a node would be physically stored in the data structure that represents the node. In Mma, the information about ALL "left" subtrees is physically stored in memory under the "pointer name" "left", instead of being distributed among the nodes. Similarly the information about all "right" subtrees is stored with the name "right". But who cares whether the value of "right[a]" is stored with "right" or with "a"? For most purposes, it doesn't matter. You can tell whether a pointer is defined or not by whether it rewrites. For example, suppose your "nodes" are Mma lists that store several items of information. Suppose you have a binary tree of these nodes, and the pointers are "left" and "right". Suppose the variable "x" in a procedure represents one of the nodes in the tree, and you want to know whether "x" has a left subtree or not. In C or Pascal, you would ask whether "x.left" is a null pointer. In Mma, you would simply ask whether "left[x]" has head "left"! After all, if "x" has a left subtree, then "left[x]" will evaluate to the node that is the root of that subtree, and (since we are storing nodes as lists) the expression "Head[left[x]]" will be "List". But if "x" has no left subtree, then "left[x]" won't have a rewriting rule associated with it, so it won't rewrite, so "Head[left[x]]" will be "left". This also shows how to clear a pointer: just say "left[x] = ." (or, equivalently, "Unset[left[x]]", which is what that is an abbreviation for). With all that in mind, here's an insertion sort in Mma. In this case we're building a binary tree whose nodes are strings. ClearAll[ root, left, right ] (* these are our "pointers" *) insert[s_String] := insert[root,s] insert[ x_String, y_ ] := Which[ x === y, x, OrderedQ[{x,y}], insert[ right[x], y ], True, insert[ left[x], y ] ] insert[ x_, y_ ] := x = y That's it! The rules, in order, say this: First, initialize the tree (defined by the pointers "root", "left", and "right") to be empty. Next, if you're being asked to insert a string into the tree, that means to start by trying to insert it at the root. Next, if you're being asked to insert a string "y" at or below the node "x", and "x" is a string, then there are three subcases: a) Maybe they are the same string ("x === y"), in which case "y" is already in the tree. In this case just return the node that stores "y" (which happens to be the string itself). (What we choose to return here is actually completely arbitrary, because the insertion is done at this point.) b) If the strings aren't equal, then maybe string "x" is less than string "y" ("OrderedQ[{x,y}]"). We have to use "OrderedQ" to compare the strings because Mma's less-than operator "<" only wants to compare numeric entities. If "x" is less than "y", then "y" should be inserted in the "right" subtree of "x". We recurse. c) If the strings aren't equal, and "x" isn't less than "y", then "x" must be greater than "y", so "y" belongs in the "left" subtree of "x". Now, there are two ways to break out of the recursion in the second rule for "insert". If a string being inserted ("y") is already in the tree, then eventually we will break out in subcase (a) above. But if the string is not in the tree, then eventually we will reach a leaf node in the recursions through cases (b) and (c). What is a leaf node? It's a node that lacks a subtree -- in other words, its "left" and/or "right" pointers are undefined. Say for concreteness that "x" is the string "foo" and "y" is the string "bar", so we are trying to evaluate insert["foo","bar"] and say that left["foo"] is undefined. The second rule on "insert" will rewrite insert["foo","bar"] into insert[left["foo"],"bar"], but since left["foo"] is undefined it won't rewrite. This keeps the second rule on "insert" from firing again, because it is restricted to fire only when its first argument has head "String", and the expression left["foo"] has head "left". So we drop through to the third rule, which says that if you have "insert[x,y]" and "x" is not a string, rewrite it to "x = y". In this case, that becomes left["foo"] = "bar" which does exactly the right thing -- it installs "bar" as the root node of the left subtree of "foo". As a test, try this: insert /@ {"this", "is", "a", "test", "of", "this", "insertion", "sort"} Mma returns the same list (because we return the value of the inserted string in subcase (a) of the second rule and in the third rule, which are the two places where we break out of the recursion on "insert"). But if we look at the "left" and "right" and "root" pointers we see this: In[6]:= ?left left left/: left["this"] = "is" left/: left["is"] = "a" left/: left["test"] = "of" In[6]:= ?right right right/: right["is"] = "test" right/: right["a"] = "insertion" right/: right["of"] = "sort" In[6]:= ?root root root = this (NOTE: a bug in Mma V1.2 causes the string quotes around "this" in the output from "?root" to be omitted. This is fixed in V2.0.) To print out the list in sorted order, we want to traverse this tree in left-center-right order, printing nodes as we come to them. Here's a function that does this: LCR[s_String] := { LCR[left[s]], s, LCR[right[s]] } LCR[_] := {} sortedlist := Flatten[LCR[root]] If a node has a value (i.e., if it is a string) then return that string, preceded by its left subtree and followed by its right subtree. If it has no value (i.e., if it is an expression like left["foo"] that doesn't rewrite, so its head is not "String"), return nothing. To print the list in sorted order, we do this: In[9]:= sortedlist Out[9]= {a, insertion, is, of, sort, test, this} It works! Note that "this" was stored only once. Exercise: modify the above so that the elements of the list returned by "sortedlist" are not just the strings, but lists of the form {String,Integer} where the integer counts the number of occurrences of the string. The output of your modified program on the same input should be In[39]:= sortedlist Out[39]= {{a, 1}, {insertion, 1}, {is, 1}, {of, 1}, {sort, 1}, > {test, 1}, {this, 2}} (Hint: the first thing I thought of was to make the "nodes" be lists. This is the hard way. There's a much simpler, more elegant solution.) A historical comment: I doubt that many people today know much about the programming language SNOBOL4, which is about twenty-five years old now and yet is still in many ways one of the most advanced languages ever developed. But if you do have some knowledge of SNOBOL4, then you have a big advantage in working with Mma, because much of the conceptual framework that you need is the same. Both languages rely heavily on pattern-matching to drive the flow of control, for example. And both languages build up data structures via pointers in the same way: by defining "pointer functions". BTW, this was only an incidental point in the "Intro to Mma..." talk I referred to above. The main idea of the presentation was to assume that my listeners knew what programming was about and had done some procedural programming, and to explain the two new programming paradigms (functional programming, and programming by pattern-matching/term-rewriting) that Mma offers, and how they can lead to programs that are easier to write and maintain and that run more efficiently than procedural programs. The notes from this talk are available in the collection "1991 Mma Conference -- Elementary Tutorial Notes", which you can order from WRI. I feel free to say this because I don't get a penny if you do order the notes; I mention it only as a public service for those who might be interested. I'd be happy to put the full notes from my talk on an FTP server somewhere if I didn't think that would upset WRI (since they are trying to sell them). If you order the collection, you do get all the notes from all the elementary tutorials, and there were many of them and they were very good. Hope this helps. --Cameron Smith Mathematica maven and free-lance consultant cameron at midd.cc.middlebury.edu