String patterns
- To: mathgroup@smc.vnet.net
- Subject: [mg10858] String patterns
- From: evans.nospam@gte.net (MJE)
- Date: Wed, 11 Feb 1998 18:32:28 -0500
- Organization: None
Programming challenge:
Is there an elegant way to do cryptanalysis in Mathematica, as opposed to
other languages? I am mainly thinking of its pattern-matching functions.
In this case the pattern would be dynamic, not predefined, and I am not
certain how to create and test patterns on the fly.
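This is a language-neutral illustration only. It uses Python rather than Mathematica, and it is a sketch, not the author's method. A pattern chosen at run time can be compiled into a regular expression. `re.escape` neutralizes any metacharacters in the fragment. Wrapping the fragment in a zero-width lookahead `(?=...)` lets overlapping matches be counted, which plain `str.count` does not do. The helper name `count_overlapping` is hypothetical.

```python
import re


def count_overlapping(text: str, fragment: str) -> int:
    """Count occurrences of `fragment` in `text`, including overlapping ones.

    The pattern is built at run time: re.escape neutralizes regex
    metacharacters in the fragment, and the lookahead (?=...) matches
    without consuming characters, so every overlapping hit is found.
    """
    pattern = re.compile("(?=" + re.escape(fragment) + ")")
    return sum(1 for _ in pattern.finditer(text))


# The fragment is an ordinary string that could be chosen while running.
print(count_overlapping("aaaa", "aa"))  # 3 (overlapping matches)
print("aaaa".count("aa"))               # 2 (str.count is non-overlapping)
```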
The primary task is to count letter, digraph, trigraph, and higher-order
frequencies.
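Here is a minimal Python sketch of the counting step, assuming the text is already a plain string. Every order of n-gram comes from the same sliding window: letters are n = 1, digraphs n = 2, trigraphs n = 3, and so on. One hypothetical helper, `ngram_counts`, therefore covers all of them, with `collections.Counter` doing the tallying.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count every overlapping n-character window of `text`.

    A string of length L has L - n + 1 windows (none if L < n).
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


sample = "the theme"
letters = ngram_counts(sample, 1)    # single characters
digraphs = ngram_counts(sample, 2)   # two-character sequences
trigraphs = ngram_counts(sample, 3)  # three-character sequences
print(letters["e"], digraphs["th"], trigraphs["the"])  # 3 2 2
```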
Output for the trigraph case might look like this:
THE 0.01350000
AND 0.00709421
ION 0.00559429
ING 0.00510783
TIO 0.00466191
ENT 0.00458083
RES 0.00417545
<...etc....>
BEP 0.00004054
The real number is the fraction of all trigraphs in the sample that match
the given trigraph. These values were computed by a DOS utility on a
particular sample text. The trigraph "the" occurred 333 times out of 24668
total trigraph sequences, giving an estimated probability for this trigraph
of 333/24668 ≈ 0.0135.
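The fraction is simply count / total, where the total number of overlapping n-grams is len(text) - n + 1. Below is a Python sketch, using a hypothetical helper `ngram_frequencies` and toy input rather than the author's sample. It produces a listing in the same "GRAM fraction" shape, most frequent first, breaking ties alphabetically as an arbitrary choice.

```python
from collections import Counter


def ngram_frequencies(text: str, n: int) -> list[tuple[str, float]]:
    """Return (n-gram, fraction) pairs, most frequent first.

    Each fraction is count / total, where total is the number of
    overlapping n-grams in the text, len(text) - n + 1.
    Ties are broken alphabetically (an arbitrary choice).
    """
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(((g, c / total) for g, c in counts.items()),
                  key=lambda pair: (-pair[1], pair[0]))


# Toy input: "THE" occurs 3 times among 13 overlapping trigraphs.
for gram, frac in ngram_frequencies("THE OTHER THEME", 3)[:3]:
    print(f"{gram} {frac:.8f}")

# The arithmetic quoted above: prints 0.01349927, i.e. about 0.0135.
print(f"{333 / 24668:.8f}")
```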
Trigraphs overlap. If I parse the following phrase,
"I love Mathematica"
then the first trigraph is "I l" (spaces count), the second is " lo",
and the third is "lov".
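The overlap rule can be checked directly: a string of length L yields L - n + 1 overlapping n-grams. Here is a short Python sketch on the phrase above, using a hypothetical helper `trigraphs`; it illustrates the rule and is not a Mathematica solution.

```python
def trigraphs(text: str) -> list[str]:
    """Return the overlapping three-character windows; spaces are kept."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


grams = trigraphs("I love Mathematica")
print(grams[:3])   # ['I l', ' lo', 'lov']
print(len(grams))  # 16, i.e. 18 characters - 3 + 1
```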
One must define an "alphabet" with a sorting order. A good way to do
this is with a string variable like this:
"abcdefghijklmno..."
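One way to realize such an alphabet is sketched below in Python, under stated assumptions. The string `ALPHABET` is a hypothetical choice: the 26 lowercase letters followed by a space. Text is lower-cased, characters outside the alphabet are dropped, and each character's position in the string serves as its sort rank.

```python
# Hypothetical alphabet: the order of characters in this string is the
# sort order, and characters not in it are discarded before counting.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def normalize(text: str) -> str:
    """Lower-case the text and keep only characters found in ALPHABET."""
    return "".join(ch for ch in text.lower() if ch in RANK)


def alphabet_key(gram: str) -> list[int]:
    """Sort key comparing n-grams character by character in ALPHABET order."""
    return [RANK[ch] for ch in gram]


text = normalize("I love Mathematica!")
digraphs = {text[i:i + 2] for i in range(len(text) - 1)}
print(text)                                    # i love mathematica
print(sorted(digraphs, key=alphabet_key)[:4])  # ['at', 'ca', 'em', 'e ']
```

With Python's default string ordering, "e " would sort before "em" because the space character precedes the letters in ASCII. The rank-based key puts it last, as the alphabet string specifies.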
How good is Mathematica at this kind of string manipulation and
searching?
Mark Evans
evans@gte.net