# Re: String patterns

• To: mathgroup@smc.vnet.net
• Subject: [mg10943] Re: String patterns
• From: Daniel Lichtblau <danl@wolfram.com>
• Date: Sat, 14 Feb 1998 00:53:17 -0500
• Organization: Wolfram Research, Inc.
• References: <6brf6t\$e78@smc.vnet.net>

```
MJE wrote:
>
> Programming challenge:
>
> Is there an elegant means of doing cryptanalysis in Mathematica as
> opposed to any other language.  I am mainly thinking of
> pattern-matching functions.  In this case, the pattern would be
> dynamic, not predefined.  I am not certain how to create and test
> patterns on the fly.
>
> The primary task is to count letter, digraph, trigraph, and higher-order
> frequencies.
>
> Output for the trigraph case might look like this:
>
> THE    0.01350000
> AND    0.00709421
> ION    0.00559429
> ING    0.00510783
> TIO    0.00466191
> ENT    0.00458083
> RES    0.00417545
>    <...etc....>
> BEP    0.00004054
>
> The real number represents the fractional occurrence of the trigraph
> among all trigraphs in the sample.  These were computed by a DOS
> utility on a particular sample text.  The word "the" occurred 333 times
> out of 24668 total trigraph sequences, giving an estimated probability
> for this trigraph of 333/24668=0.01350000.
>
> Trigraphs overlap.  If I parse the following phrase,
>
>      "I love Mathematica"
>
> then the first trigraph is "I l" (spaces count), the second is " lo",
> and the third is "lov".
>
> One must define an "alphabet" with a sorting order.  A good way to do
> this is with a string variable like this:
>
>      "abcdefghijklmno..."
>
> How good is Mathematica at this kind of string manipulation and
> searching?
>
> Mark Evans
> evans@gte.net

To find the frequencies of a small set of given trigraphs, you might use
StringPosition.

In[23]:= str = "I love Mathematica because it has Mathieu functions, matrix operations, and pattern matching.";

In[24]:= strL = ToLowerCase[str];
General::spell1:
Possible spelling error: new symbol name "strL"
is similar to existing symbol "str".

In[26]:= Length[StringPosition[strL, "mat"]]

Out[26]= 5

To check the frequencies of all triads that occur in your string, you might
first form the triads explicitly, as below.

triads = Union[Table[StringTake[strL, {j, j+2}], {j, StringLength[strL]-2}]];

Then you could count how often each of these triads occurs in strL:

Out[51]= {1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1,
>    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 4, 5, 1,
>    1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2,
>    1, 1, 1, 1, 1, 1, 1}

If you are working with large strings and many triads, a more efficient
method might be to initialize a set of function values, one entry per
triad, to zeroes. For example,

In[56]:= ?freq
Global`freq
freq[", a"] = 0
freq["a b"] = 0
...

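Then iterate over the string once, and for each triad you encounter,
increment the corresponding function value. This takes a bit of coding
(not too much) but should be reasonably fast, because the string is
scanned only once instead of once for every triad being counted.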

Daniel Lichtblau
Wolfram Research

```
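
The task posed in the question (count overlapping n-grams, spaces included, then divide each count by the total number of n-grams) does not depend on Mathematica. As an illustration outside the Mathematica session above, here is a minimal Python sketch; `ngrams` and `ngram_frequencies` are hypothetical names, and n-grams with equal counts are simply left in first-seen order.

```python
from __future__ import annotations

from collections import Counter


def ngrams(text: str, n: int) -> list[str]:
    """All overlapping windows of n consecutive characters (spaces count)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_frequencies(text: str, n: int) -> list[tuple[str, float]]:
    """(n-gram, fraction of all n-grams in text), most frequent first."""
    grams = ngrams(text, n)
    total = len(grams)
    # most_common() sorts by count, descending; equal counts keep first-seen order
    return [(gram, count / total) for gram, count in Counter(grams).most_common()]


if __name__ == "__main__":
    print(ngrams("I love Mathematica", 3)[:3])  # ['I l', ' lo', 'lov']
    for gram, fraction in ngram_frequencies("the theme of the thesis", 3)[:3]:
        print(f"{gram!r:8} {fraction:.8f}")
```

With `n = 1` or `n = 2` the same functions give letter and digraph frequencies. For the figures quoted in the question, 333 occurrences of THE among 24668 trigraphs give 333/24668, which is about 0.0135.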
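
The question also suggests defining an "alphabet" string that fixes the sorting order. One way to apply such a string, sketched in Python under assumptions of our own (the alphabet below puts the space after "z", and characters missing from the alphabet sort after everything in it), is to turn the string into a rank table and sort by ranks. `make_alphabet_key` is a hypothetical helper name.

```python
from __future__ import annotations

from typing import Callable


def make_alphabet_key(alphabet: str) -> Callable[[str], tuple[int, ...]]:
    """Build a sort key that orders strings by character position in `alphabet`."""
    rank = {ch: i for i, ch in enumerate(alphabet)}
    unknown = len(alphabet)  # characters not in the alphabet sort last

    def key(gram: str) -> tuple[int, ...]:
        return tuple(rank.get(ch, unknown) for ch in gram)

    return key


# Assumed alphabet: lower-case letters followed by the space character.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
alphabet_key = make_alphabet_key(ALPHABET)

trigraphs = ["lov", "I l", " lo", "ove", "ve "]
# Lower-case each trigraph first so that "I" ranks as "i"
print(sorted(trigraphs, key=lambda g: alphabet_key(g.lower())))
# ['I l', 'lov', 'ove', 've ', ' lo']
```

Because the keys are tuples, a trigraph is compared character by character, and a shorter string that is a prefix of a longer one sorts first.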
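
For comparison, the single-trigraph count done in the reply with StringPosition can be sketched in Python. The hypothetical helper `count_overlapping` restarts each search one character after the previous match, so overlapping matches are counted; Python's built-in `str.count` counts only non-overlapping matches, which matters for trigraphs such as "aaa" inside "aaaa".

```python
def count_overlapping(text: str, pattern: str) -> int:
    """Number of (possibly overlapping) occurrences of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # resume one character later so that overlapping matches are found
        start = text.find(pattern, start + 1)
    return count


s = ("I love Mathematica because it has Mathieu functions, "
     "matrix operations, and pattern matching.")
print(count_overlapping(s.lower(), "mat"))  # 5, agreeing with Out[26]
print(count_overlapping("aaaa", "aaa"), "aaaa".count("aaa"))  # 2 1
```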
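
The two-step approach in the reply (collect the distinct triads, as Union does, then count each one in the string) can be sketched in Python as follows. `distinct_triads` and `count_each` are hypothetical names. For the sample string it finds the same 77 distinct triads as the Out[51] list, but Python sorts strings by code point, so the counts appear in a different order.

```python
from __future__ import annotations


def distinct_triads(text: str) -> list[str]:
    """Sorted list of the distinct overlapping 3-character windows of text."""
    return sorted({text[j:j + 3] for j in range(len(text) - 2)})


def count_each(text: str, grams: list[str]) -> list[int]:
    """For each gram, scan the whole text again and count overlapping matches.
    The work grows like len(grams) * len(text)."""
    return [
        sum(1 for j in range(len(text) - len(g) + 1) if text.startswith(g, j))
        for g in grams
    ]


s = ("I love Mathematica because it has Mathieu functions, "
     "matrix operations, and pattern matching.").lower()
triads = distinct_triads(s)
counts = count_each(s, triads)
print(len(triads), sum(counts), max(counts))  # 77 91 5
```

The rescan inside `count_each` is the cost that the zero-initialized table described at the end of the reply avoids.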
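
The faster scheme at the end of the reply (one zero-initialized entry per triad, like `freq["a b"] = 0`, then a single pass that increments entries) maps naturally onto a Python dictionary. A minimal sketch follows; `triad_table` is a hypothetical name, and in everyday Python `collections.Counter` performs both steps in one call.

```python
from __future__ import annotations

from collections import Counter


def triad_table(text: str, n: int = 3) -> dict[str, int]:
    """Count overlapping n-grams with an explicit zero-initialised table."""
    positions = range(len(text) - n + 1)
    # Step 1: one entry per distinct n-gram, all set to 0
    freq = {text[j:j + n]: 0 for j in positions}
    # Step 2: one pass over the text, incrementing the entry for each window
    for j in positions:
        freq[text[j:j + n]] += 1
    return freq


s = ("I love Mathematica because it has Mathieu functions, "
     "matrix operations, and pattern matching.").lower()
freq = triad_table(s)
print(freq["mat"], freq[" ma"], len(freq))  # 5 4 77
# collections.Counter does both steps at once; the results agree
print(freq == Counter(s[j:j + 3] for j in range(len(s) - 2)))  # True
```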
