# Re: String patterns
*To*: mathgroup@smc.vnet.net
*Subject*: [mg10943] Re: String patterns
*From*: Daniel Lichtblau <danl@wolfram.com>
*Date*: Sat, 14 Feb 1998 00:53:17 -0500
*Organization*: Wolfram Research, Inc.
*References*: <6brf6t$e78@smc.vnet.net>
MJE wrote:
>
> Programming challenge:
>
> Is there an elegant means of doing cryptanalysis in Mathematica as
> opposed to any other language? I am mainly thinking of
> pattern-matching functions. In this case, the pattern would be
> dynamic, not predefined. I am not certain how to create and test
> patterns on the fly.
>
> The primary task is to count letter, digraph, trigraph, and higher-order
> frequencies.
>
> Output for the trigraph case might look like this:
>
> THE 0.01350000
> AND 0.00709421
> ION 0.00559429
> ING 0.00510783
> TIO 0.00466191
> ENT 0.00458083
> RES 0.00417545
> <...etc....>
> BEP 0.00004054
>
> The real number represents the fractional occurrence of the trigraph
> among all trigraphs in the sample. These were computed by a DOS
> utility on a particular sample text. The word "the" occurred 333 times
> out of 24668 total trigraph sequences, giving an estimated probability
> for this trigraph of 333/24668=0.01350000.
>
> Trigraphs overlap. If I parse the following phrase,
>
> "I love Mathematica"
>
> then the first trigraph is "I l" (spaces count), the second is " lo",
> and the third is "lov".
>
> One must define an "alphabet" with a sorting order. A good way to do
> this is with a string variable like this:
>
> "abcdefghijklmno..."
>
> How good is Mathematica at this kind of string manipulation and
> searching?
>
> Mark Evans
> evans@gte.net
To find frequencies of a small set of given trigraphs you might use
StringPosition.
In[23]:= str = "I love Mathematica because it has Mathieu functions, " <>
  "matrix operations, and pattern matching.";
In[24]:= strL = ToLowerCase[str];
General::spell1:
Possible spelling error: new symbol name "strL"
is similar to existing symbol "str".
In[26]:= Length[StringPosition[strL, "mat"]]
Out[26]= 5
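To turn such counts into the fractional occurrences asked about in the question, one can divide by the number of overlapping trigraphs, which is the string length minus 2. The following is only a sketch: the list `wanted` and the variable names are hypothetical, chosen for illustration.

```mathematica
(* Sample text from the session above, lowercased so "Mat" and "mat" match. *)
text = ToLowerCase["I love Mathematica because it has Mathieu functions, " <>
    "matrix operations, and pattern matching."];

(* Hypothetical list of trigraphs of interest. *)
wanted = {"mat", "ion", "the"};

(* Overlapping trigraphs: a string of length n contains n - 2 of them. *)
total = StringLength[text] - 2;

(* StringPosition returns one {start, end} pair per occurrence. *)
counts = Map[Length[StringPosition[text, #]] &, wanted];

(* Rows of {trigraph, count, fractional occurrence}. *)
Transpose[{wanted, counts, N[counts/total]}]
```

For this sample the counts come out as 5, 2, and 1 among 91 trigraphs.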
To check frequencies of all triads that occur in your string you first
might form the triads explicitly, as below.
triads = Union[Table[StringTake[strL, {j,j+2}],
  {j,StringLength[strL]-2}]];
Then you could do
In[51]:= Map[Length[StringPosition[strL,#]]&, triads]
Out[51]= {1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 4, 5, 1,
> 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2,
> 1, 1, 1, 1, 1, 1, 1}
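The bare list of counts is hard to read without the triads next to it. One possible way to get a table like the one in the question, sorted by descending frequency, is sketched below. The names `n`, `counts`, and `table` are introduced here for illustration and are not part of the session above.

```mathematica
strL = ToLowerCase["I love Mathematica because it has Mathieu functions, " <>
    "matrix operations, and pattern matching."];

(* Number of overlapping triads in the string. *)
n = StringLength[strL] - 2;

(* Distinct triads, then the number of occurrences of each. *)
triads = Union[Table[StringTake[strL, {j, j + 2}], {j, n}]];
counts = Map[Length[StringPosition[strL, #]] &, triads];

(* Rows of {fraction, triad}. Sort orders rows by their first element,
   and Reverse puts the most frequent triads first. *)
table = Reverse[Sort[Transpose[{N[counts/n], triads}]]];

(* Show the five most frequent triads. *)
TableForm[Take[table, 5]]
```

Because every position in the string starts exactly one triad, the counts should add up to `n`, which makes a quick sanity check.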
If you are working with large strings and many triads, a more efficient
method might be to initialize a set of function values, one entry per
triad, to zeroes. For example,
In[54]:= Do[freq[triads[[j]]] = 0, {j,Length[triads]}]
In[56]:= ?freq
Global`freq
freq[", a"] = 0
freq["a b"] = 0
...
Then iterate over the string, and for each triad you find increment the
appropriate function value. Takes a bit of coding (not too much) but
should be reasonably fast.
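A minimal sketch of that single pass might look as follows. It assumes the same sample string as above; the loop itself is my guess at the coding described, not code from the original session.

```mathematica
strL = ToLowerCase["I love Mathematica because it has Mathieu functions, " <>
    "matrix operations, and pattern matching."];
n = StringLength[strL] - 2;
triads = Union[Table[StringTake[strL, {j, j + 2}], {j, n}]];

(* Start from a clean slate, then set one zero entry per distinct triad. *)
Clear[freq];
Do[freq[triads[[j]]] = 0, {j, Length[triads]}];

(* Single pass over the string: extract the triad starting at
   position j and increment its stored count. *)
Do[
  With[{t = StringTake[strL, {j, j + 2}]},
    freq[t] = freq[t] + 1],
  {j, n}];

(* Read the results back as {triad, count} pairs. *)
Map[{#, freq[#]} &, triads]
```

This walks the string once instead of searching it once per triad, so the work grows with the length of the text and not with the product of text length and number of distinct triads.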
Daniel Lichtblau
Wolfram Research