Re: String patterns
- To: mathgroup@smc.vnet.net
- Subject: [mg10943] Re: String patterns
- From: Daniel Lichtblau <danl@wolfram.com>
- Date: Sat, 14 Feb 1998 00:53:17 -0500
- Organization: Wolfram Research, Inc.
- References: <6brf6t$e78@smc.vnet.net>
MJE wrote:
>
> Programming challenge:
>
> Is there an elegant means of doing cryptanalysis in Mathematica as
> opposed to any other language. I am mainly thinking of
> pattern-matching functions. In this case, the pattern would be
> dynamic, not predefined. I am not certain how to create and test
> patterns on the fly.
>
> The primary task is to count letter, digraph, trigraph, and higher-order
> frequencies.
>
> Output for the trigraph case might look like this:
>
> THE 0.01350000
> AND 0.00709421
> ION 0.00559429
> ING 0.00510783
> TIO 0.00466191
> ENT 0.00458083
> RES 0.00417545
> <...etc....>
> BEP 0.00004054
>
> The real number represents the fractional occurrence of the trigraph
> among all trigraphs in the sample. These were computed by a DOS
> utility on a particular sample text. The word "the" occurred 333 times
> out of 24668 total trigraph sequences, giving an estimated probability
> for this trigraph of 333/24668=0.01350000.
>
> Trigraphs overlap. If I parse the following phrase,
>
> "I love Mathematica"
>
> then the first trigraph is "I l" (spaces count), the second is " lo",
> and the third is "lov".
>
> One must define an "alphabet" with a sorting order. A good way to do
> this is with a string variable like this:
>
> "abcdefghijklmno..."
>
> How good is Mathematica at this kind of string manipulation and
> searching?
>
> Mark Evans
> evans@gte.net
To find frequencies of a small set of given trigraphs you might use
StringPosition.
In[23]:= str = "I love Mathematica because it has Mathieu functions, matrix operations, and pattern matching.";
In[24]:= strL = ToLowerCase[str];
General::spell1:
Possible spelling error: new symbol name "strL"
is similar to existing symbol "str".
In[26]:= Length[StringPosition[strL, "mat"]]
Out[26]= 5
To check frequencies of all triads that occur in your string you first
might form the triads explicitly, as below.
triads = Union[Table[StringTake[strL, {j,j+2}], {j,StringLength[strL]-2}]];
Then you could do
In[51]:= Map[Length[StringPosition[strL,#]]&, triads]
Out[51]= {1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 4, 5, 1,
> 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2,
> 1, 1, 1, 1, 1, 1, 1}
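The raw counts can be turned into the fractional table the original question asks for. What follows is only a sketch, not something from the session above: the names counts, total, and sorted are made up for illustration, and the string is retyped on one line, which is an assumption about the original input.

```mathematica
(* Sketch only: counts, total, and sorted are illustrative names. *)
str = "I love Mathematica because it has Mathieu functions, matrix operations, and pattern matching.";
strL = ToLowerCase[str];

(* Distinct triads, formed as in the post. *)
triads = Union[Table[StringTake[strL, {j, j + 2}],
    {j, StringLength[strL] - 2}]];

(* One count per distinct triad, as in In[51]. *)
counts = Map[Length[StringPosition[strL, #]] &, triads];

(* Every starting position 1 .. StringLength - 2 holds exactly one
   (overlapping) triad, so this is the total number of triads. *)
total = StringLength[strL] - 2;

(* Pair each triad with its fraction and sort by descending fraction. *)
sorted = Sort[Transpose[{triads, N[counts/total]}],
   #1[[2]] > #2[[2]] &];

TableForm[Take[sorted, 5]]
```

Dividing by total rather than by the number of distinct triads matches the definition in the question, where each fraction is taken among all triad occurrences in the sample.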
If you are working with large strings and many triads, a more efficient
method might be to initialize a set of function values, one entry per
triad, to zeroes. For example,
In[54]:= Do[freq[triads[[j]]] = 0, {j,Length[triads]}]
In[56]:= ?freq
Global`freq
freq[", a"] = 0
freq["a b"] = 0
...
Then iterate over the string and, for each triad you find, increment the
appropriate function value. This takes a bit of coding (not too much) but
should be reasonably fast, since it needs only one pass over the string.
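One way the single-pass counting described above might look is sketched below. This is an illustration under assumptions, not code from the original session: the name n is hypothetical, and the string is retyped on one line.

```mathematica
(* Sketch only: a single pass that increments one counter per triad. *)
str = "I love Mathematica because it has Mathieu functions, matrix operations, and pattern matching.";
strL = ToLowerCase[str];
n = StringLength[strL] - 2;   (* number of overlapping triads *)

(* Start from a clean slate, then set every counter to zero. *)
Clear[freq];
triads = Union[Table[StringTake[strL, {j, j + 2}], {j, n}]];
Do[freq[triads[[j]]] = 0, {j, Length[triads]}];

(* Walk the string once; With extracts the triad a single time
   before the counter is read and updated. *)
Do[
  With[{t = StringTake[strL, {j, j + 2}]},
   freq[t] = freq[t] + 1],
  {j, n}];

freq["mat"]   (* agrees with the StringPosition count of 5 above *)
```

The counters live as definitions attached to freq, so looking up any triad afterwards is a plain function call, and freq[t]/n gives its fractional occurrence.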
Daniel Lichtblau
Wolfram Research