MathGroup Archive 2006

Re: Counting Symbols

  • To: mathgroup at smc.vnet.net
  • Subject: [mg70897] Re: Counting Symbols
  • From: bghiggins at ucdavis.edu
  • Date: Wed, 1 Nov 2006 03:55:07 -0500 (EST)
  • References: <ei4lrs$djv$1@smc.vnet.net>

RM,

Have you thought about using J/Link from Mathematica? I am assuming
you have a way to capture the keystrokes in a document and that these
keystrokes have a Unicode equivalent. If that is the case, then try this:

Needs["JLink`"]

InstallJava[]

Suppose now that the file(s) are located in a given directory:

filepath = "/ToMydirectory/testdata.txt";

First, we need to create a byte array object for our data. Normally we
make this byte array much larger than the size we actually require. The
byte array will serve as a buffer:

 buf = JavaNew["[B", 1000]

Then we create a FileInputStream object, which opens a connection to the
data file:


fis=JavaNew["java.io.FileInputStream",filepath]

Now let us read the data into the buffer. The value returned is the
number of bytes actually read, not the size of the buffer:


fis[read[buf]]

527

To examine the contents of our buffer array we need to convert it back
to a Mathematica expression using the JavaObjectToExpression function,
keeping only the 527 bytes that were read:

bufdata = Take[JavaObjectToExpression[buf], 527]

{83, 111, 32, 115, 104, 101, 32, 119, 101, 110, 116, 32, 105, 110, 116, 111,
32, 116, 104, 101, 32, 103, 97, 114, 100, 101, 110, 32, 116, 111, 32, 99,
117, 116, 32, 97, 32, 99, 97, 98, 98, 97, 103, 101, 45, 108, 101, 97, 102,
44, 32, 116, 111, 10, 109, 97, 107, 101, 32, 97, 110, 32, 97, 112, 112, 108,
101, 45, 112, 105, 101, 59, 32, 97, 110, 100, 32, 97, 116, 32, 116, 104, 101,
32, 115, 97, 109, 101, 32, 116, 105, 109, 101, 32, 97, 32, 103, 114, 101, 97,
116, 10, 115, 104, 101, 45, 98, 101, 97, 114, 44, 32, 99, 111, 109, 105, 110,
103, 32, 117, 112, 32, 116, 104, 101, 32, 115, 116, 114, 101, 101, 116, 44,
32, 112, 111, 112, 115, 32, 105, 116, 115, 32, 104, 101, 97, 100, 32, 105,
110, 116, 111, 32, 116, 104, 101, 10, 115, 104, 111, 112, 46, 32, 39, 87,
104, 97, 116, 33, 32, 110, 111, 32, 115, 111, 97, 112, 63, 39, 32, 83, 111,
32, 104, 101, 32, 100, 105, 101, 100, 44, 32, 97, 110, 100, 32, 115, 104,
101, 32, 118, 101, 114, 121, 10, 105, 109, 112, 114, 117, 100, 101, 110, 116,
108, 121, 32, 109, 97, 114, 114, 105, 101, 100, 32, 116, 104, 101, 32, 98,
97, 114, 98, 101, 114, 59, 32, 97, 110, 100, 32, 116, 104, 101, 114, 101, 32,
119, 101, 114, 101, 10, 112, 114, 101, 115, 101, 110, 116, 32, 116, 104, 101,
32, 80, 105, 99, 110, 105, 110, 110, 105, 101, 115, 44, 32, 97, 110, 100, 32,
116, 104, 101, 32, 74, 111, 98, 108, 105, 108, 108, 105, 101, 115, 44, 32,
97, 110, 100, 32, 116, 104, 101, 10, 71, 97, 114, 121, 97, 108, 105, 101,
115, 44, 32, 97, 110, 100, 32, 116, 104, 101, 32, 103, 114, 97, 110, 100, 32,
80, 97, 110, 106, 97, 110, 100, 114, 117, 109, 32, 104, 105, 109, 115, 101,
108, 102, 44, 32, 119, 105, 116, 104, 32, 116, 104, 101, 10, 108, 105, 116,
116, 108, 101, 32, 114, 111, 117, 110, 100, 32, 98, 117, 116, 116, 111, 110,
32, 97, 116, 32, 116, 111, 112, 44, 32, 97, 110, 100, 32, 116, 104, 101, 121,
32, 97, 108, 108, 32, 102, 101, 108, 108, 32, 116, 111, 32, 112, 108, 97,
121, 105, 110, 103, 10, 116, 104, 101, 32, 103, 97, 109, 101, 32, 111, 102,
32, 99, 97, 116, 99, 104, 32, 97, 115, 32, 99, 97, 116, 99, 104, 32, 99, 97,
110, 44, 32, 116, 105, 108, 108, 32, 116, 104, 101, 32, 103, 117, 110, 32,
112, 111, 119, 100, 101, 114, 32, 114, 97, 110, 10, 111, 117, 116, 32, 97,
116, 32, 116, 104, 101, 32, 104, 101, 101, 108, 115, 32, 111, 102, 32, 116,
104, 101, 105, 114, 32, 98, 111, 111, 116, 115, 46, 10, 10, 83, 97, 109, 117,
101, 108, 32, 70, 111, 111, 116, 101, 32, 49, 55, 50, 48, 45, 49, 55, 55, 55}

If we convert these bytes back to characters, we get the data that was
read from the file:


FromCharacterCode[bufdata]


So she went into the garden to cut a cabbage-leaf, to
make an apple-pie; and at the same time a great
she-bear, coming up the street, pops its head into the
shop. 'What! no soap?' So he died, and she very
imprudently married the barber; and there were
present the Picninnies, and the Joblillies, and the
Garyalies, and the grand Panjandrum himself, with the
little round button at top, and they all fell to playing
the game of catch as catch can, till the gun powder ran
out at the heels of their boots.

Samuel Foote 1720-1777

Knowing which character codes you want to look for in the list bufdata,
you can then use a variety of Mathematica functions to do the task. I
think this approach might be a lot cleaner than pulling the document
into Mathematica as a notebook object.
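
As one possible illustration of the counting step, the sketch below
tabulates how often each character code occurs and lists the most
frequent ones first. It assumes bufdata holds the character codes as
above; freq is a name made up for this example. Split and Sort are used
so that it works in older versions; in Mathematica 6 and later,
Tally[bufdata] should give the same counts directly.

(* group equal codes together, then record {code, count} for each group *)
freq = {First[#], Length[#]} & /@ Split[Sort[bufdata]];
(* order by count, most frequent first *)
freq = Sort[freq, #1[[2]] > #2[[2]] &];
(* show the ten most frequent characters with their counts *)
{FromCharacterCode[#[[1]]], #[[2]]} & /@ Take[freq, Min[10, Length[freq]]]

To combine several documents, the bufdata lists from each file could be
joined with Join before counting.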

Brian



R M wrote:
> I have a whole bunch of documents from which I would like to figure out what keystrokes/symbols are used the most.  What commands should I use to figure this out?  I have (unsuccessfully) tried to use Flatten and Short.  My approach is to pull the document into Mathematica as a notebook object, then use Mathematica's list manipulation commands to figure out what symbols are used most frequently.
>
> This whole problem involves my recent purchase of a Logic Controls programmable keyboard.  So far I have programmed keys to perform [esc]sumt[esc], [downarrow]alt+5, cntrl+space, cntrl+9, etc.  I am trying to figure out which symbols are used the most so that I can program the keyboard most efficiently.

