tricky StringReplace problem with constraints

• To: mathgroup at smc.vnet.net
• Subject: [mg81842] tricky StringReplace problem with constraints
• Date: Thu, 4 Oct 2007 04:35:06 -0400 (EDT)

```Hi everybody,

I am trying to find an effective (and possibly elegant) way to solve
the following problem:

I have a large list of chemical gas-phase reactions, with their
corresponding
langevin reaction rates. A typical reaction of this list looks like:

H   +   CH   -->   C  +   H2

The whole list only incorporates the main isotopes, i.e. C=12C,
O=16O,...

I want to take each single reaction and create all possible
isotopomere combinations. For the moment I only need the substitutions
H->D
C->13C
O->18O
but, for the future different ones might be necessary.

Hence for the above reaction, the solution would be:

H     +       CH            -->            C      +      H2
D     +       CH            -->            C      +      HD
H     +       CD            -->            C      +      HD
H     +       13CH          -->            13C    +      H2
D     +       13CH          -->            13C    +      HD
H     +       13CD          -->            13C    +      HD

The problem, of course, is that I need to conserve particle
numbers, i.e. if I substitute a Deuterium on the left side,
I need to introduce on on the right hand side as well. Other
constraints would be e.g.:

DD = D2
HD = DH

The full set of all(also nonsense) combinations would be:
Distribute[{{"H", "D"}, {"CH", "CD", "13CH", "13CD"}, {"C",
"D"}, {"H2", "HD", "DH", "D2"}}, List]

After some time I came up with a solution for single
substitutions, i.e. I only substitute one isotope of each type.
Ideally I would also like to be able to create, e.g. multiply
deuterated
species like
NH3, NH2D, NHD2, ND3.

Example reactions from the data file I used are:

H        CH                C        H2                 2.70e-11
0=2E38       0.0C  300 2000BHG93
H2+      C2H4              C2H3+    H2       H         1.81e-09
0=2E00       0.0M   1041000A7503
H3+      CH3OCH3           C2H6OH+  H2                 2.00e-09
0=2E00       0.0L   1041000CMH91
H+       H                 H2+      PHOTON             5.13e-19
1=2E85       0.0L  200 4000A
CO       PHOTON            O        C                  2.00e-10
0=2E00       2.5L   1041000DVD88
H        CRP               H+       e-                 5.98e-18
0=2E00       0.0L   1041000C
C        CRPHOT            C+       e-                 1.30e-17
0=2E00     510.0L   1041000CGL87

The columns are: reactant1, reactant2, product1,2,3,4, rate1, rate2,
rate3, comment.

in CSV format it looks like:

# FORMAT:
#
ix,type,reactant1,reactant2,reactant3,product1,product2,product3,product4,a=
lpha,beta,gamma,clem,tmin,tmax,accuracy,source
#
1,NN,H,CH,,C,H2,,,1.31E-10,0.00,80.0,C,300,2000,B,NIST
2,NN,H,CH2,,CH,H2,,,6.64E-11,0.00,0.0,L,300,2500,A,NIST
3,NN,H,NH,,N,H2,,,1.73E-11,0.50,2400.0,L,80,300,C,
4,NN,H,CH3,,CH2,H2,,,1.00E-10,0.00,7600.0,L,300,2500,A,NIST
5,NN,H,NH2,,NH,H2,,,5.25E-12,0.79,2200.0,L,73,300,C,

The problem is, my solution is very time consuming and
is not feasible for multiple isotope cases. You can find my
first solution at:
http://hera.ph1.uni-koeln.de/~roellig/tmp/umist/UMIST99.zip
(This used a different data file format (not the csv list)).
Because it is very slow and somewhat bulky programmed I started
again and came up with something like this:

--------------------------------------------------------------------

elements = {"H", "He", "C", "O", "S", "Si", "N", "P", "Na", "Mg",
"Fe", "Cl", "e"};
photons = {"CRPHOT", "CRP", "PHOTON"};
carbons = {"C", "C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9"};
hydrogens = {"H", "H2", "H3", "H4", "H5", "H6", "H7", "D", "D2",
"D3"};
oxygens = {"O", "O2"};
nitrogens = {"N", "N2"};
sulphurs = {"S", "S2"};

chemicalContent = {RotateLeft[hydrogens], "He", RotateLeft[carbons],
RotateLeft[oxygens], RotateLeft[sulphurs], "Si",
RotateLeft[nitrogens], "P", "Na", "Mg", "Fe", "Cl", "e"};
ruleH = {"H2" -> "H1D1", "H3" -> "H2D1", "H4" -> "H3D1",
"H5" -> "H4D1", "H6" -> "H5D1", "H7" -> "H6D1", "H1" -> "D1"};
ruleC = {"C2" -> "C1X1", "C3" -> "C2X1", "C4" -> "C3X1",
"C5" -> "C4X1", "C6" -> "C5X1", "C7" -> "C6X1", "C8" -> "C7X" 1,
"C9" -> "C8X1", "C1" -> "X1"}; (*X=^13C*)
ruleO = {"O2" -> "O1Y1", "O1" -> "Y1"}; (*Y=^18O*)
enumerateRule = {x_String /; !
StringMatchQ[x, __ ~~ NumberString ~~ EndOfString] :> x ~~ "1",
x_String /; StringMatchQ[x, __ ~~ NumberString ~~ EndOfString] :>
x}
{x_String /; ! StringMatchQ[x, __ ~~ NumberString ~~ EndOfString] :>
x ~~ "1",
x_String /; StringMatchQ[x, __ ~~ NumberString ~~ EndOfString] :> x}
denumerateRule = {x_String /;
StringMatchQ[x, __ ~~ "1" ~~ EndOfString] :> StringDrop[x, -1]}
{x_String /; StringMatchQ[x, __ ~~ "1" ~~ EndOfString] :>
StringDrop[x, -1]}
insertOne[x_String] :=
StringJoin @@ (Evaluate[
StringCases[x,
RegularExpression[
"(He|H|C|O|S|Si|N|P|Na|Mg|Fe|Cl|e)(\d)?"]]] /. enumerateRule)
deuterate[x_String] :=
StringReplace[StringReplaceList[insertOne[x], ruleH], "1" -> ""]
carbonize[x_String] :=
StringReplace[
StringReplace[StringReplaceList[insertOne[x], ruleC], "1" -> ""],
"X" -> "13C"]
oxygenize[x_String] :=
StringReplace[
StringReplace[StringReplaceList[insertOne[x], ruleO], "1" -> ""],
"Y" -> "18O"]

isIon[x_String] :=
StringMatchQ[x, __ ~~ RegularExpression["[+-]"] ~~ EndOfString]
charge[x_String] := If[isIon[x], StringTake[x, -1], ""]

This does he following:

insertOne["CH3CHO+"]

"C1H3C1H1O1"

In[24]:= deuterate["CH3CHO+"]
carbonize["CH3CHO+"]
oxygenize["CH3CHO+"]

Out[24]= {"CH2DCHO", "CH3CDO"}

Out[25]= {"13CH3CHO", "CH313CHO"}

Out[26]= {"CH3CH18O"}

In linear molecule, unfortunately, the position of the isotope
matters:

In[27]:= carbonize["CCCCCCO"]

Out[27]= {"13CCCCCCO", "C13CCCCCO", "CC13CCCCO", "CCC13CCCO", \
"CCCC13CCO", "CCCCC13CO"}

But, I am still missing the charge:

In[29]:=
deuteratePlus[y_] :=
StringReplace[deuterate[y], x_ ~~ EndOfString :> x <> charge[y]]
oxygenizePlus[y_] :=
StringReplace[oxygenize[y], x_ ~~ EndOfString :> x <> charge[y]]
carbonizePlus[y_] :=
StringReplace[carbonize[y], x_ ~~ EndOfString :> x <> charge[y]]

In[32]:= deuteratePlus["CH3CHO+"]
carbonizePlus["CH3CHO+"]
oxygenizePlus["CH3CHO+"]

Out[32]= {"CH2DCHO+", "CH3CDO+"}

Out[33]= {"13CH3CHO+", "CH313CHO+"}

Out[34]= {"CH3CH18O+"}

Now to the main part:

rule[fkt_][
0] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_, k__} :> {n,
t, a, b, c, d, e, f, g, h, i, j, k}
rule[fkt_][
1] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_,
k__} :> {n, t, fkt[a], b, c, fkt[d], e, f, g, h, i, j, k} /;
fkt[a] != {} &&
fkt[d] != {} && ! MemberQ[photons, a] && ! MemberQ[photons, d];
rule[fkt_][
2] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_,
k__} :> {n, t, fkt[a], b, c, d, fkt[e], f, g, h, i, j, k} /;
fkt[a] != {} &&
fkt[e] != {} && ! MemberQ[photons, a] && ! MemberQ[photons, e];
rule[fkt_][
3] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_,
k__} :> {n, t, fkt[a], b, c, d, e, fkt[f], g, h, i, j, k} /;
fkt[a] != {} &&
fkt[f] != {} && ! MemberQ[photons, a] && ! MemberQ[photons, f];
rule[fkt_][
4] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_,
k__} :> {n, t, fkt[a], b, c, d, e, f, fkt[g], h, i, j, k} /;
fkt[a] != {} &&
fkt[g] != {} && ! MemberQ[photons, a] && ! MemberQ[photons, g];
rule[fkt_][
5] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_,
k__} :> {n, t, a, fkt[b], c, fkt[d], e, f, g, h, i, j, k} /;
fkt[b] != {} &&
fkt[d] != {} && ! MemberQ[photons, b] && ! MemberQ[photons, d];
rule[fkt_][
6] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_,
k__} :> {n, t, a, fkt[b], c, d, fkt[e], f, g, h, i, j, k} /;
fkt[b] != {} &&
fkt[e] != {} && ! MemberQ[photons, b] && ! MemberQ[photons, e];
rule[fkt_][
7] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_,
k__} :> {n, t, a, fkt[b], c, d, e, fkt[f], g, h, i, j, k} /;
fkt[b] != {} &&
fkt[f] != {} && ! MemberQ[photons, b] && ! MemberQ[photons, f];
rule[fkt_][
8] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_,
k__} :> {n, t, a, fkt[b], c, d, e, f, fkt[g], h, i, j, k} /;
fkt[b] != {} &&
fkt[g] != {} && ! MemberQ[photons, b] && ! MemberQ[photons, g];
rule[fkt_][
9] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_,
k__} :> {n, t, a, b, fkt[c], fkt[d], e, f, g, h, i, j, k} /;
fkt[c] != {} &&
fkt[d] != {} && ! MemberQ[photons, c] && ! MemberQ[photons, d];
rule[fkt_][
10] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_,
k__} :> {n, t, a, b, fkt[c], d, fkt[e], f, g, h, i, j, k} /;
fkt[c] != {} &&
fkt[e] != {} && ! MemberQ[photons, c] && ! MemberQ[photons, e];
rule[fkt_][
11] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_,
k__} :> {n, t, a, b, fkt[c], d, e, fkt[f], g, h, i, j, k} /;
fkt[c] != {} &&
fkt[f] != {} && ! MemberQ[photons, c] && ! MemberQ[photons, f];
rule[fkt_][
12] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_,
k__} :> {n, t, a, b, fkt[c], d, e, f, fkt[g], h, i, j, k} /;
fkt[c] != {} &&
fkt[g] != {} && ! MemberQ[photons, c] && ! MemberQ[photons, g];

In[195]:=
combi = Flatten[
Union[Table[{# /. rule[carbonizePlus][i], # /.
rule[oxygenizePlus][i], # /. rule[deuteratePlus][i]} &[
data[[1]]], {i, 0, 12}]], 1];
Reverse[Union[Flatten[Distribute[#, List] & /@ combi, 1]]][[All,
3 ;; 9]] // TableForm

{
{"H", "CH", "", "C", "H2"},
{"H", "CD", "", "C", "HD"},
{"H", "13CH", "", "13C", "H2"}
{"D", "CH", "", "C", "HD"}
}

In[132]:=
combi = Flatten[
Union[Table[{# /. rule[carbonizePlus][i], # /.
rule[oxygenizePlus][i], # /. rule[deuteratePlus][i]} &[
data[[1]]], {i, 0, 12}]], 1];
Reverse[Union[Flatten[Distribute[#, List] & /@ combi, 1]]]

I am inserting a deuteron, one 13C, and one 18O.

Out[133]= {{1, "NN", "H", "CH", "", "C", "H2", "", "", 1.31*10^-10,
0., 80., "C", 300, 2000, "B", "NIST"}, {1, "NN", "H", "CD", "", "C",
"HD", "", "", 1.31*10^-10, 0., 80., "C", 300, 2000, "B",
"NIST"}, {1, "NN", "H", "13CH", "", "13C", "H2", "", "",
1.31*10^-10, 0., 80., "C", 300, 2000, "B", "NIST"}, {1, "NN", "D",
"CH", "", "C", "HD", "", "", 1.31*10^-10, 0., 80., "C", 300, 2000,
"B", "NIST"}}
{
{"O", "C2H4", "", "H2CCO", "H2"},
{"O", "C2H3D", "", "HDCCO", "H2"},
{"O", "C2H3D", "", "H2CCO", "HD"}
{"O", "C13CH4", "", "H2C13CO", "H2"},
{"O", "C13CH4", "", "H213CCO", "H2"},
{"18O", "C2H4", "", "H2CC18O", "H2"}
}

----------------------------------------

So far I am able to insert one isotope into the reaction formula.
What changes would be necessary to implement the ability to apply
the same substitution twice, or thrice, e.g. to account for multiply
deuterated species like ND3 ?

So far my rules do not recognize the isotopes and I can't produce
13C18O. How could this be implemented?

Is my overall approach ok in terms of speed and flexibility? What
other
approaches can you think of?