MathGroup Archive: February 1991 [00024]

[Date Index] [Thread Index] [Author Index]
Re: data structure equivalent ops
To: mathgroup at yoda.ncsa.uiuc.edu
Subject: Re: data structure equivalent ops
From: CAMERON at midd.cc.middlebury.edu
Date: Fri, 22 Feb 91 17:13 EST
David Doll says:

> Hello, quick question: I was wondering about how you would do data
> structure operations like insert-sorted, delete, find, etc.,
> double(single) link list, a priority queue, binary tree ops -
> insert, delete, traversal, etc., stack ops - pushd, pop, etc. I'm
> about to give a quick talk to a few people in my research lab about
> the thought of using Mathematica  and one of them mentioned it
> would helpfull to explain the difference between doing some of the
> above mentioned in "C" and in Mathematica.

Well, it's not a "quick question" once I've gotten my hands on it! :-)
This is a long posting, but it answers the question fairly thoroughly.

I was one of the presenters at the 2nd annual Mma Users' Conference
in San Francisco last January.  One of my presentations was a 90-minute
talk called "Introduction to Mma for Users of FORTRAN and C", but I
didn't actually rely on any knowledge of those languages, and the title
"Introduction to Mma for Users Traditional Procedural Languages" would
have been equally apt.  The use of pointers in Mma to build up classical
data structures like the ones you mention is one of the topics I covered.
The example I used in the notes for that talk is insertion sort with
a binary tree.  Here it is, in revised form.

-----

The one-line answer: store pointers in functions.

First of all, you must shed (if you've acquired it) the notion that
Mma "functions" are "programs".  In classical procedural languages
there is a sharp distinction between executable entities in a
program (functions and procedures and subroutines) and data objects
(variables, arrays, pointers).  There is no such division in Mma --
programs are data.  If you insist on thinking of a statement like

   f[x_] := x^2 + x

only as a "program" that accepts "input" x and computes the "output"
x^2 + x  then you may miss the insight that you can store data in functions.
This statement actually stores a rewriting rule (which is a data object) in
the global rule base, namely the rule "f[x_] :> x^2 + 2".  You can
think of this as specifying the "value" of the "pointer named f"
at any "node named x".  Similarly, the statements

   left[a] = b;
   right[a] = c;

can be thought of as specifying that the value of the "left" pointer
at "node a" is "b", and the value of the "right" pointer is "c"; they
store the rules "left[a] -> b" and "right[a] -> c" in the global rule
base.  If you have many other rewriting rules on "left" and "right",
such as "left[x] -> y" and "right[u] -> v", this amounts to a data
structure built out of the nodes "{a,b,c,...,u,v,...,x,y,...}" and
the pointers "left" and "right".  This could be a binary tree
(for example).

There is only one difference between this and the usual way of
constructing a data structure from pointers in languages like C
and Pascal, and it's this:  in C you'd declare a "struct" type
(in Pascal it'd be a "record") that contained two "fields" named
"left" and "right", which were pointers to this same type of "struct".
There would be an instance of this "struct" for each node.  Thus
the information about "left" and "right" subtrees of a node would
be physically stored in the data structure that represents the node.
In Mma, the information about ALL "left" subtrees is physically stored
in memory under the "pointer name" "left", instead of being distributed
among the nodes.  Similarly the information about all "right" subtrees
is stored with the name "right".  But who cares whether the value
of "right[a]" is stored with "right" or with "a"?  For most purposes,
it doesn't matter.

You can tell whether a pointer is defined or not by whether it
rewrites.  For example, suppose your "nodes" are Mma lists that
store several items of information.  Suppose you have a binary tree
of these nodes, and the pointers are "left" and "right".  Suppose
the variable "x" in a procedure represents one of the nodes in the
tree, and you want to know whether "x" has a left subtree or not.
In C or Pascal, you would ask whether "x.left" is a null pointer.
In Mma, you would simply ask whether "left[x]" has head "left"!
After all, if "x" has a left subtree, then "left[x]" will evaluate
to the node that is the root of that subtree, and (since we are
storing nodes as lists) the expression "Head[left[x]]" will be
"List".  But if "x" has no left subtree, then "left[x]" won't have
a rewriting rule associated with it, so it won't rewrite, so
"Head[left[x]]" will be "left".  This also shows how to clear a
pointer:  just say "left[x] = ." (or, equivalently, "Unset[left[x]]",
which is what that is an abbreviation for).


With all that in mind, here's an insertion sort in Mma.  In this
case we're building a binary tree whose nodes are strings.

   ClearAll[ root, left, right ]  (* these are our "pointers" *)

   insert[s_String] := insert[root,s]

   insert[ x_String, y_ ] := Which[
      x === y,         x,
      OrderedQ[{x,y}], insert[ right[x], y ],
      True,            insert[  left[x], y ]
   ]

   insert[ x_, y_ ] :=  x = y

That's it!  The rules, in order, say this:  First, initialize the tree
(defined by the pointers "root", "left", and "right") to be empty.
Next, if you're being asked to insert a string into the tree,
that means to start by trying to insert it at the root.
Next, if you're being asked to insert a string "y" at or below
the node "x", and "x" is a string, then there are three subcases:

  a) Maybe they are the same string ("x === y"),
     in which case "y" is already in the tree.
     In this case just return the node that stores "y"
     (which happens to be the string itself).
     (What we choose to return here is actually completely
     arbitrary, because the insertion is done at this point.)

  b) If the strings aren't equal, then maybe string "x" is
     less than string "y" ("OrderedQ[{x,y}]").  We have to
     use "OrderedQ" to compare the strings because Mma's
     less-than operator "<" only wants to compare numeric
     entities.  If "x" is less than "y", then "y" should
     be inserted in the "right" subtree of "x".  We recurse.

  c) If the strings aren't equal, and "x" isn't less than "y",
     then "x" must be greater than "y", so "y" belongs in the
     "left" subtree of "x".

Now, there are two ways to break out of the recursion in the second
rule for "insert".  If a string being inserted ("y") is already in
the tree, then eventually we will break out in subcase (a) above.
But if the string is not in the tree, then eventually we will reach
a leaf node in the recursions through cases (b) and (c).  What is
a leaf node?  It's a node that lacks a subtree -- in other words,
its "left" and/or "right" pointers are undefined.  Say for concreteness
that "x" is the string "foo" and "y" is the string "bar", so we
are trying to evaluate

   insert["foo","bar"]

and say that  left["foo"]  is undefined.  The second rule on "insert"
will rewrite  insert["foo","bar"]  into  insert[left["foo"],"bar"],
but since  left["foo"]  is undefined it won't rewrite.  This keeps
the second rule on "insert" from firing again, because it is
restricted to fire only when its first argument has head "String",
and the expression  left["foo"]  has head "left".  So we drop through
to the third rule, which says that if you have "insert[x,y]" and
"x" is not a string, rewrite it to "x = y".  In this case, that
becomes

   left["foo"] = "bar"

which does exactly the right thing -- it installs "bar" as the root
node of the left subtree of "foo".

As a test, try this:

   insert /@  {"this", "is", "a", "test", "of", "this", "insertion", "sort"}

Mma returns the same list (because we return the value of the inserted
string in subcase (a) of the second rule and in the third rule, which
are the two places where we break out of the recursion on "insert").
But if we look at the "left" and "right" and "root" pointers we see
this:

    In[6]:= ?left
    left
    left/: left["this"] = "is"

    left/: left["is"] = "a"

    left/: left["test"] = "of"


    In[6]:= ?right
    right
    right/: right["is"] = "test"

    right/: right["a"] = "insertion"

    right/: right["of"] = "sort"


    In[6]:= ?root
    root
    root = this

(NOTE: a bug in Mma V1.2 causes the string quotes around "this"
in the output from "?root" to be omitted.  This is fixed in V2.0.)

To print out the list in sorted order, we want to traverse this
tree in left-center-right order, printing nodes as we come to
them.  Here's a function that does this:

    LCR[s_String] := { LCR[left[s]], s, LCR[right[s]] }

    LCR[_] := {}

    sortedlist := Flatten[LCR[root]]

If a node has a value (i.e., if it is a string) then return
that string, preceded by its left subtree and followed by its
right subtree.  If it has no value (i.e., if it is an expression
like  left["foo"]  that doesn't rewrite, so its head is not "String"),
return nothing.  To print the list in sorted order, we do this:

    In[9]:= sortedlist

    Out[9]= {a, insertion, is, of, sort, test, this}

It works!  Note that "this" was stored only once.  Exercise:
modify the above so that the elements of the list returned
by "sortedlist" are not just the strings, but lists
of the form {String,Integer} where the integer counts the number
of occurrences of the string.  The output of your modified
program on the same input should be

    In[39]:= sortedlist

    Out[39]= {{a, 1}, {insertion, 1}, {is, 1}, {of, 1}, {sort, 1},

    >    {test, 1}, {this, 2}}

(Hint: the first thing I thought of was to make the "nodes" be lists.
This is the hard way.  There's a much simpler, more elegant solution.)

A historical comment: I doubt that many people today know much about
the programming language SNOBOL4, which is about twenty-five years old
now and yet is still in many ways one of the most advanced languages
ever developed.  But if you do have some knowledge of SNOBOL4, then
you have a big advantage in working with Mma, because much of the
conceptual framework that you need is the same.  Both languages
rely heavily on pattern-matching to drive the flow of control, for
example.  And both languages build up data structures via pointers
in the same way: by defining "pointer functions".

BTW, this was only an incidental point in the "Intro to Mma..."
talk I referred to above.  The main idea of the presentation was
to assume that my listeners knew what programming was about and
had done some procedural programming, and to explain the two new
programming paradigms (functional programming, and programming by
pattern-matching/term-rewriting) that Mma offers, and how they can
lead to programs that are easier to write and maintain and that
run more efficiently than procedural programs.  The notes from this
talk are available in the collection "1991 Mma Conference -- Elementary
Tutorial Notes", which you can order from WRI.  I feel free to say
this because I don't get a penny if you do order the notes; I mention
it only as a public service for those who might be interested.
I'd be happy to put the full notes from my talk on an FTP server
somewhere if I didn't think that would upset WRI (since they are
trying to sell them).  If you order the collection, you do get all
the notes from all the elementary tutorials, and there were many
of them and they were very good.

Hope this helps.

--Cameron Smith
  Mathematica maven and free-lance consultant
  cameron at midd.cc.middlebury.edu
Prev by Date: Re: data structure equivalent ops
Next by Date: psfix for Mathematica 2.0
Previous by thread: Re: data structure equivalent ops
Next by thread: New Message and Files