String patterns
- To: mathgroup@smc.vnet.net
- Subject: [mg10858] String patterns
- From: evans.nospam@gte.net (MJE)
- Date: Wed, 11 Feb 1998 18:32:28 -0500
- Organization: None
Programming challenge: is there an elegant way to do cryptanalysis in Mathematica, as opposed to any other language? I am mainly thinking of the pattern-matching functions. In this case the pattern would be dynamic, not predefined, and I am not certain how to create and test patterns on the fly.

The primary task is to count letter, digraph, trigraph, and higher-order frequencies. Output for the trigraph case might look like this:

THE 0.01350000
AND 0.00709421
ION 0.00559429
ING 0.00510783
TIO 0.00466191
ENT 0.00458083
RES 0.00417545
<...etc....>
BEP 0.00004054

The real number is the fraction of all trigraphs in the sample that are that particular trigraph. These figures were computed by a DOS utility on a particular sample text. The trigraph "THE" occurred 333 times out of 24668 total trigraph sequences, giving an estimated probability for this trigraph of 333/24668, or about 0.0135.

Trigraphs overlap. If I parse the phrase "I love Mathematica", the first trigraph is "I l" (spaces count), the second is " lo", and the third is "lov".

One must define an "alphabet" with a sorting order. A good way to do this is with a string variable like this: "abcdefghijklmno..."

How good is Mathematica at this kind of string manipulation and searching?

Mark Evans
evans@gte.net
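For comparison, here is a minimal sketch of the counting procedure described above, written in Python rather than Mathematica so the algorithm itself is easy to follow. It slides a window of length n one character at a time (so trigraphs overlap and spaces count), tallies each window, and divides by the number of windows. The function names and the `alphabet` parameter are hypothetical illustrations, not part of any existing utility; the alphabet string is assumed to serve both as a filter (characters outside it are dropped) and as the tie-breaking sort order.

```python
from collections import Counter


def overlapping_ngrams(text, n):
    """All overlapping length-n windows of text, in order (spaces included)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_frequencies(text, n, alphabet=None):
    """Return (ngram, fraction) pairs, most frequent first.

    `alphabet` is a hypothetical option: when given, characters outside it
    are dropped before counting, and ties are ordered by the alphabet string.
    """
    if alphabet is not None:
        text = "".join(ch for ch in text if ch in alphabet)
    grams = overlapping_ngrams(text, n)
    total = len(grams)  # L - n + 1 windows for a text of length L >= n
    if total == 0:
        return []
    counts = Counter(grams)
    rank = {ch: i for i, ch in enumerate(alphabet)} if alphabet is not None else None

    def sort_key(gram):
        # Higher counts first; equal counts fall back to alphabet order
        # (or plain string order when no alphabet is supplied).
        tie = [rank[ch] for ch in gram] if rank is not None else gram
        return (-counts[gram], tie)

    return [(g, counts[g] / total) for g in sorted(counts, key=sort_key)]


if __name__ == "__main__":
    print(overlapping_ngrams("I love Mathematica", 3)[:3])  # ['I l', ' lo', 'lov']
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # hypothetical sort order, space last
    for gram, frac in ngram_frequencies("The then".upper(), 3, alphabet):
        print(repr(gram), f"{frac:.8f}")
```

Note that the denominator is the number of windows, L − n + 1 for a text of L characters, so a sample yielding 24668 trigraphs has 24670 characters after filtering. Changing n to 1 or 2 gives the letter and digraph tables with no other changes.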