Re: String patterns
- To: mathgroup@smc.vnet.net
- Subject: [mg10943] Re: String patterns
- From: Daniel Lichtblau <danl@wolfram.com>
- Date: Sat, 14 Feb 1998 00:53:17 -0500
- Organization: Wolfram Research, Inc.
- References: <6brf6t$e78@smc.vnet.net>
MJE wrote:
>
> Programming challenge:
>
> Is there an elegant means of doing cryptanalysis in Mathematica as
> opposed to any other language.  I am mainly thinking of
> pattern-matching functions.  In this case, the pattern would be
> dynamic, not predefined.  I am not certain how to create and test
> patterns on the fly.
>
> The primary task is to count letter, digraph, trigraph, and higher-order
> frequencies.
>
> Output for the trigraph case might look like this:
>
> THE 0.01350000
> AND 0.00709421
> ION 0.00559429
> ING 0.00510783
> TIO 0.00466191
> ENT 0.00458083
> RES 0.00417545
> <...etc....>
> BEP 0.00004054
>
> The real number represents the fractional occurrence of the trigraph
> among all trigraphs in the sample.  These were computed by a DOS
> utility on a particular sample text.  The word "the" occurred 333 times
> out of 24668 total trigraph sequences, giving an estimated probability
> for this trigraph of 333/24668=0.01350000.
>
> Trigraphs overlap.  If I parse the following phrase,
>
> "I love Mathematica"
>
> then the first trigraph is "I l" (spaces count), the second is " lo",
> and the third is "lov".
>
> One must define an "alphabet" with a sorting order.  A good way to do
> this is with a string variable like this:
>
> "abcdefghijklmno..."
>
> How good is Mathematica at this kind of string manipulation and
> searching?
>
> Mark Evans
> evans@gte.net

To find frequencies of a small set of given trigraphs you might use
StringPosition.

In[23]:= str = "I love Mathematica because it has Mathieu functions,
matrix operations, and pattern matching.";

In[24]:= strL = ToLowerCase[str];

General::spell1:
   Possible spelling error: new symbol name "strL"
     is similar to existing symbol "str".

In[26]:= Length[StringPosition[strL, "mat"]]
Out[26]= 5

To check frequencies of all triads that occur in your string you first
might form the triads explicitly, as below.

triads = Union[Table[StringTake[strL, {j,j+2}],
  {j,StringLength[strL]-2}]];

Then you could do

In[51]:= Map[Length[StringPosition[strL,#]]&, triads]
Out[51]= {1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 4, 5, 1,
>    1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2,
>    1, 1, 1, 1, 1, 1, 1}

If you are working with large strings and many triads, a more efficient
method might be to initialize a set of function values, one entry per
triad, to zeroes. For example,

In[54]:= Do[freq[triads[[j]]] = 0, {j,Length[triads]}]

In[56]:= ?freq
Global`freq
freq[", a"] = 0
freq["a b"] = 0
...

Then iterate over the string, and for each triad you find, increment the
appropriate function value. This takes a bit of coding (not too much)
but should be reasonably fast.

Daniel Lichtblau
Wolfram Research
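
The counting loop described in the last paragraph of the reply might
look roughly like the sketch below. It is not code from the original
post: it reuses the names str, strL, triads, and freq from the session
above, while n, t, and the final sort are illustrative additions. It
assumes a single pass with one StringTake per position is acceptable.

```mathematica
(* Sketch only; n, t, and the sorted listing are illustrative names,
   not part of the original post. *)
str = "I love Mathematica because it has Mathieu functions, matrix operations, and pattern matching.";
strL = ToLowerCase[str];
n = StringLength[strL] - 2;   (* number of overlapping triads *)

(* Distinct triads, formed as in the post. *)
triads = Union[Table[StringTake[strL, {j, j + 2}], {j, n}]];

(* Initialize one function value per triad to zero, as in the post. *)
Clear[freq];
Do[freq[triads[[j]]] = 0, {j, Length[triads]}];

(* Single pass over the string: increment the count of the triad
   that starts at each position j. *)
Do[
  With[{t = StringTake[strL, {j, j + 2}]},
    freq[t] = freq[t] + 1],
  {j, n}];

freq["mat"]        (* 5, agreeing with Out[26] above *)
N[freq["mat"]/n]   (* fraction of all triads that are "mat" *)

(* {count, triad} pairs, most frequent first. *)
Reverse[Sort[Map[{freq[#], #} &, triads]]]
```

Dividing each count by n gives the fractional occurrence asked for in
the question. The single pass does one lookup per position, whereas
mapping StringPosition over triads rescans the whole string once per
distinct triad.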
- Follow-Ups:
- Re: Re: String patterns
- From: Mark Evans <evans.nospam@gte.net>
- Re: Re: String patterns