MathGroup Archive 2005

[Date Index] [Thread Index] [Author Index]

Search the Archive

Set of strings reducing problem (Again!)

  • To: mathgroup at smc.vnet.net
  • Subject: [mg60339] Set of strings reducing problem (Again!)
  • From: "Edson Ferreira" <edsferr at uol.com.br>
  • Date: Tue, 13 Sep 2005 06:07:43 -0400 (EDT)
  • Sender: owner-wri-mathgroup at wolfram.com

Dear Math Gurus,

Sometime ago I have posted a problem here and got some very clever 
solutions which I would like to thank.

Currently I'm working with one solution I received, but It didn't make 
the "complete" job. The strange behaviour is that for some cases it 
works the way I want.

Let's post the problem (in italic) again (hope you remember):

Let's say a have a list of "n" equal length strings:

L={"11111111",
   "11112111",
   "1111X111",
       ...
       ...
       ...
   "21122211"}

The characters used in strings are only "1", "X", "2", "U", "M", "D" and 

"T".

What I want is a reduced set of strings (with all the resulting strings
with the same length as all the original ones).

The rule to "join" two strings is the following:

If one string is different from the other by just one character then
take the characters that are different and apply the rule bellow:

"1" + "X" = "D"
"1" + "2" = "M"
"1" + "U" = "T"
"X" + "2" = "U"
"X" + "M" = "T"
"2" + "D" = "T"

For example, suppose I have these two elements in the list : "11112111"
and "1111X111"

The rule will transform these two strings into one : "1111U111"

After all the possible transformations (always using two strings with
only one different character and resulting another string) I will obtain 
a reduced set of strings.

How can I do that with mathematica??

I guess the first step is a function to identify is two strings are
different by just one ccharacter.
A loop then search in the set for any ocurrences of that and apply all
possible transformations until we can't get any redution.

For instance : {"T11TTTTT"} would be generated from a set with 729 
unique strings !!!!!

Some Observations:

- The strings are all unique.
- The different character, can ocurr at any position.
- The length of all strings is the same.
-All I want is a reduced set of string, thus when you compare two 
strings and you find they differ by only one character, you ELIMINATE 
the original ones and ADD the NEW one to the original set.
-The rules I gave are the only ones.
-It's impossible to encounter something like "1111M111" + "11112111"
-The two strings that will be "joined" have only one different character 
at same position!
-You can't join "21111111" vs. "11111112" (characters at different 
positions)

The clever solution I received for a set L is the following:

L = {"111", "11X", "112",
    "1X1", "1XX", "1X2", "121",
    "12X", "122"};

cl = Characters /@ L;

r = Dispatch[{"1" + "X" -> "D",
     "1" + "2" -> "M",
     "1" + "U" -> "T",
     "X" + "2" -> "U",
     "X" + "M" -> "T",
     "2" + "D" -> "T"}];

ncl = StringJoin @@@(
   cl //. {x___List,
      {a___, p_String, c___},
      {a___, q_String, c___},
      y___List} :>
     {x, {a, p + q /. r, c},
       y} /; StringQ[p + q /.
        r])

Which evaluates {"1TT"}

Now finally I can explain the problem (see the code bellow):

In[1]:=
Unprotect[D];
In[2]:=
U={"2","X"};
In[3]:=
M={"1","2"};
In[4]:=
D={"1","X"};
In[5]:=
T={"1","2","X"};
In[6]:=
L=Flatten[Outer[StringJoin,T,T,T,D]];
In[7]:=
L = Select[L, Count[Characters[#], "1"] > 1 &];
In[8]:=
cl = Characters /@ L;
In[9]:=
r = Dispatch[{"1" + "X" -> "D",
"1" + "2" -> "M",
"1" + "U" -> "T",
"X" + "2" -> "U",
"X" + "M" -> "T",
"2" + "D" -> "T"}];
In[10]:=
ncl = StringJoin @@@ (
cl //. {x___List,
{a___, p_String, c___},
{a___, q_String, c___},
y___List} :>
{x, {a, p + q /. r, c},
y} /; StringQ[p + q /.
r])
Out[10]=
{11TD, 121D, 12U1, 1X1D, 1XU1, 211D, 21U1, 2U11, X11D, X1U1, XU11}


If look at some pairs of positions in the result you see that some 
combinations were not done: 2+4, 3+5, 6+9, 7+10 and 8+11

The correct result is: {11TD,1U1D,1UU1,U11D,U1U1,UU11}

Now, finally, the question:

Why this code does not perform all possible combinations???

It looks, as Stephen Layland once said, that this code will only find 
sets of strings that differ by one character that are
right next to each other...

Any solution????

Sorry for the looooooooooong history...

Thanks!!!!!!!!!

Edson



  • Prev by Date: Re: Why this function does not return a single value
  • Next by Date: Re: High precision NIntegrate problem. Please help!
  • Previous by thread: Re: Why am I getting this error?
  • Next by thread: Re: Set of strings reducing problem (Again!)