MathGroup Archive 1998

Re: Re: String patterns



A handful of people, including Daniel Lichtblau, have been very helpful
in answering my question.  This input is appreciated.

Strangely, no one has suggested an approach other than string
functions.  At least one programming language is much better suited
to string processing than Mathematica (Icon:
http://www.cs.arizona.edu/icon/index.htm).  Icon has some unique
features (generators, success/failure propagation) that surpass even
Mathematica for ease of expression and programming flexibility.  For
Mathematica itself, I had a different notion in mind that I did not
make explicit in my post.

I was expecting someone to suggest turning the characters into numbers
using their ASCII codes, or perhaps two-byte "wide character" codes.
This conversion leaves you with a list of integers.  I would expect
Mathematica to be much more competent with this data structure than
with strings.
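
A minimal sketch of that idea, assuming the built-in functions
ToCharacterCode, FromCharacterCode, and Partition (with a window of 3
and an offset of 1) are available.  The names text, codes, and triples
are made up for illustration and do not come from the thread.

```mathematica
(* Hypothetical sample text; any string would do. *)
text = "I love Mathematica";

(* Convert the lowercased string to a list of integer character codes. *)
codes = ToCharacterCode[ToLowerCase[text]];

(* Overlapping trigraphs as integer triples: window length 3, offset 1. *)
triples = Partition[codes, 3, 1];

(* Convert the first triple back to a string to check the windowing. *)
FromCharacterCode[First[triples]]

(* Count one trigraph by comparing integer triples instead of strings. *)
Count[triples, ToCharacterCode["mat"]]
```

For this sample the first triple decodes to "i l" and the count for
"mat" is 2.  Whether integer lists actually outperform the string
functions is something to measure rather than assume.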

Mark


Daniel Lichtblau wrote:
> 
> MJE wrote:
> >
> > Programming challenge:
> >
> > Is there an elegant means of doing cryptanalysis in Mathematica, as
> > opposed to any other language?  I am mainly thinking of
> > pattern-matching functions.  In this case, the pattern would be
> > dynamic, not predefined.  I am not certain how to create and test
> > patterns on the fly.
> >
> > The primary task is to count letter, digraph, trigraph, and higher-order
> > frequencies.
> >
> > Output for the trigraph case might look like this:
> >
> > THE    0.01350000
> > AND    0.00709421
> > ION    0.00559429
> > ING    0.00510783
> > TIO    0.00466191
> > ENT    0.00458083
> > RES    0.00417545
> >    <...etc....>
> > BEP    0.00004054
> >
> > The real number represents the fractional occurrence of the trigraph
> > among all trigraphs in the sample.  These were computed by a DOS
> > utility on a particular sample text.  The trigraph "THE" occurred 333
> > times out of 24668 total trigraph sequences, giving an estimated
> > probability for this trigraph of 333/24668=0.01350000.
> >
> > Trigraphs overlap.  If I parse the following phrase,
> >
> >      "I love Mathematica"
> >
> > then the first trigraph is "I l" (spaces count), the second is " lo",
> > and the third is "lov".
> >
> > One must define an "alphabet" with a sorting order.  A good way to do
> > this is with a string variable like this:
> >
> >      "abcdefghijklmno..."
> >
> > How good is Mathematica at this kind of string manipulation and
> > searching?
> >
> > Mark Evans
> > evans@gte.net
> 
> To find frequencies of a small set of given trigraphs you might use
> StringPosition.
> 
> In[23]:= str = "I love Mathematica because it has Mathieu functions,
> matrix operations, and pattern matching.";
> 
> In[24]:= strL = ToLowerCase[str];
> General::spell1:
>    Possible spelling error: new symbol name "strL"
>      is similar to existing symbol "str".
> 
> In[26]:= Length[StringPosition[strL, "mat"]]
> Out[26]= 5
> 
> To check frequencies of all triads that occur in your string you first
> might form the triads explicitly, as below.
> 
> triads = Union[Table[StringTake[strL, {j,j+2}],
>    {j,StringLength[strL]-2}]];
> 
> Then you could do
> 
> In[51]:= Map[Length[StringPosition[strL,#]]&, triads]
> Out[51]= {1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 4, 5, 1,
>    1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2,
>    1, 1, 1, 1, 1, 1, 1}
> 
> If you are working with large strings and many triads, a more efficient
> method might be to initialize a set of function values, one entry per
> triad, to zeroes. For example,
> 
> In[54]:= Do[freq[triads[[j]]] = 0, {j,Length[triads]}]
> 
> In[56]:= ?freq
> Global`freq
> freq[", a"] = 0
> freq["a b"] = 0
> ...
> 
> Then iterate over the string, and for each triad you find increment the
> appropriate function value. Takes a bit of coding (not too much) but
> should be reasonably fast.
> 
> Daniel Lichtblau
> Wolfram Research
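
The counting pass described at the end of the quoted message could be
sketched roughly as follows.  This is an illustration under stated
assumptions, not Daniel's actual code: the helper names n, t, and table
are hypothetical, and the line break in the sample string is assumed to
stand for a single space.

```mathematica
(* Sample string from the quoted session, lowercased. *)
str = "I love Mathematica because it has Mathieu functions, " <>
      "matrix operations, and pattern matching.";
strL = ToLowerCase[str];

(* Number of overlapping three-character windows in the string. *)
n = StringLength[strL] - 2;

(* Distinct triads, then one zeroed counter per triad. *)
triads = Union[Table[StringTake[strL, {j, j + 2}], {j, n}]];
Clear[freq];
Do[freq[triads[[j]]] = 0, {j, Length[triads]}];

(* Single pass over the string: bump the counter of each triad seen. *)
Do[t = StringTake[strL, {j, j + 2}]; freq[t] = freq[t] + 1, {j, n}];

(* Pair each triad with its fraction of all windows, largest first. *)
table = Sort[Map[{#, N[freq[#]/n]} &, triads], #1[[2]] > #2[[2]] &];

(* Gives 5, in agreement with Out[26] in the quoted session. *)
freq["mat"]
```

The string is scanned once here, whereas mapping StringPosition over
the triads searches the whole string again for every distinct triad.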



