tricky StringReplace problem with constraints
- To: mathgroup at smc.vnet.net
- Subject: [mg81842] tricky StringReplace problem with constraints
- From: markus.roellig at googlemail.com
- Date: Thu, 4 Oct 2007 04:35:06 -0400 (EDT)
Hi everybody, I am trying to find an effective (and possibly elegant) way to solve the following problem: I have a large list of chemical gas-phase reactions, with their corresponding langevin reaction rates. A typical reaction of this list looks like: H + CH --> C + H2 The whole list only incorporates the main isotopes, i.e. C=12C, O=16O,... I want to take each single reaction and create all possible isotopomere combinations. For the moment I only need the substitutions H->D C->13C O->18O but, for the future different ones might be necessary. Hence for the above reaction, the solution would be: H + CH --> C + H2 D + CH --> C + HD H + CD --> C + HD H + 13CH --> 13C + H2 D + 13CH --> 13C + HD H + 13CD --> 13C + HD The problem, of course, is that I need to conserve particle numbers, i.e. if I substitute a Deuterium on the left side, I need to introduce on on the right hand side as well. Other constraints would be e.g.: DD = D2 HD = DH The full set of all(also nonsense) combinations would be: Distribute[{{"H", "D"}, {"CH", "CD", "13CH", "13CD"}, {"C", "D"}, {"H2", "HD", "DH", "D2"}}, List] After some time I came up with a solution for single substitutions, i.e. I only substitute one isotope of each type. Ideally I would also like to be able to create, e.g. multiply deuterated species like NH3, NH2D, NHD2, ND3. Example reactions from the data file I used are: H CH C H2 2.70e-11 0=2E38 0.0C 300 2000BHG93 H2+ C2H4 C2H3+ H2 H 1.81e-09 0=2E00 0.0M 1041000A7503 H3+ CH3OCH3 C2H6OH+ H2 2.00e-09 0=2E00 0.0L 1041000CMH91 H+ H H2+ PHOTON 5.13e-19 1=2E85 0.0L 200 4000A CO PHOTON O C 2.00e-10 0=2E00 2.5L 1041000DVD88 H CRP H+ e- 5.98e-18 0=2E00 0.0L 1041000C C CRPHOT C+ e- 1.30e-17 0=2E00 510.0L 1041000CGL87 The columns are: reactant1, reactant2, product1,2,3,4, rate1, rate2, rate3, comment. in CSV format it looks like: # FORMAT: # ix,type,reactant1,reactant2,reactant3,product1,product2,product3,product4,a= lpha,beta,gamma,clem,tmin,tmax,accuracy,source # 1,NN,H,CH,,C,H2,,,1.31E-10,0.00,80.0,C,300,2000,B,NIST 2,NN,H,CH2,,CH,H2,,,6.64E-11,0.00,0.0,L,300,2500,A,NIST 3,NN,H,NH,,N,H2,,,1.73E-11,0.50,2400.0,L,80,300,C, 4,NN,H,CH3,,CH2,H2,,,1.00E-10,0.00,7600.0,L,300,2500,A,NIST 5,NN,H,NH2,,NH,H2,,,5.25E-12,0.79,2200.0,L,73,300,C, The problem is, my solution is very time consuming and is not feasible for multiple isotope cases. You can find my first solution at: http://hera.ph1.uni-koeln.de/~roellig/tmp/umist/UMIST99.zip (This used a different data file format (not the csv list)). Because it is very slow and somewhat bulky programmed I started again and came up with something like this: -------------------------------------------------------------------- elements = {"H", "He", "C", "O", "S", "Si", "N", "P", "Na", "Mg", "Fe", "Cl", "e"}; photons = {"CRPHOT", "CRP", "PHOTON"}; carbons = {"C", "C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9"}; hydrogens = {"H", "H2", "H3", "H4", "H5", "H6", "H7", "D", "D2", "D3"}; oxygens = {"O", "O2"}; nitrogens = {"N", "N2"}; sulphurs = {"S", "S2"}; chemicalContent = {RotateLeft[hydrogens], "He", RotateLeft[carbons], RotateLeft[oxygens], RotateLeft[sulphurs], "Si", RotateLeft[nitrogens], "P", "Na", "Mg", "Fe", "Cl", "e"}; ruleH = {"H2" -> "H1D1", "H3" -> "H2D1", "H4" -> "H3D1", "H5" -> "H4D1", "H6" -> "H5D1", "H7" -> "H6D1", "H1" -> "D1"}; ruleC = {"C2" -> "C1X1", "C3" -> "C2X1", "C4" -> "C3X1", "C5" -> "C4X1", "C6" -> "C5X1", "C7" -> "C6X1", "C8" -> "C7X" 1, "C9" -> "C8X1", "C1" -> "X1"}; (*X=^13C*) ruleO = {"O2" -> "O1Y1", "O1" -> "Y1"}; (*Y=^18O*) enumerateRule = {x_String /; ! StringMatchQ[x, __ ~~ NumberString ~~ EndOfString] :> x ~~ "1", x_String /; StringMatchQ[x, __ ~~ NumberString ~~ EndOfString] :> x} {x_String /; ! StringMatchQ[x, __ ~~ NumberString ~~ EndOfString] :> x ~~ "1", x_String /; StringMatchQ[x, __ ~~ NumberString ~~ EndOfString] :> x} denumerateRule = {x_String /; StringMatchQ[x, __ ~~ "1" ~~ EndOfString] :> StringDrop[x, -1]} {x_String /; StringMatchQ[x, __ ~~ "1" ~~ EndOfString] :> StringDrop[x, -1]} insertOne[x_String] := StringJoin @@ (Evaluate[ StringCases[x, RegularExpression[ "(He|H|C|O|S|Si|N|P|Na|Mg|Fe|Cl|e)(\d)?"]]] /. enumerateRule) deuterate[x_String] := StringReplace[StringReplaceList[insertOne[x], ruleH], "1" -> ""] carbonize[x_String] := StringReplace[ StringReplace[StringReplaceList[insertOne[x], ruleC], "1" -> ""], "X" -> "13C"] oxygenize[x_String] := StringReplace[ StringReplace[StringReplaceList[insertOne[x], ruleO], "1" -> ""], "Y" -> "18O"] isIon[x_String] := StringMatchQ[x, __ ~~ RegularExpression["[+-]"] ~~ EndOfString] charge[x_String] := If[isIon[x], StringTake[x, -1], ""] chargeList[x_List] := Thread[ion[x]] This does he following: insertOne["CH3CHO+"] "C1H3C1H1O1" In[24]:= deuterate["CH3CHO+"] carbonize["CH3CHO+"] oxygenize["CH3CHO+"] Out[24]= {"CH2DCHO", "CH3CDO"} Out[25]= {"13CH3CHO", "CH313CHO"} Out[26]= {"CH3CH18O"} In linear molecule, unfortunately, the position of the isotope matters: In[27]:= carbonize["CCCCCCO"] Out[27]= {"13CCCCCCO", "C13CCCCCO", "CC13CCCCO", "CCC13CCCO", \ "CCCC13CCO", "CCCCC13CO"} But, I am still missing the charge: In[29]:= deuteratePlus[y_] := StringReplace[deuterate[y], x_ ~~ EndOfString :> x <> charge[y]] oxygenizePlus[y_] := StringReplace[oxygenize[y], x_ ~~ EndOfString :> x <> charge[y]] carbonizePlus[y_] := StringReplace[carbonize[y], x_ ~~ EndOfString :> x <> charge[y]] In[32]:= deuteratePlus["CH3CHO+"] carbonizePlus["CH3CHO+"] oxygenizePlus["CH3CHO+"] Out[32]= {"CH2DCHO+", "CH3CDO+"} Out[33]= {"13CH3CHO+", "CH313CHO+"} Out[34]= {"CH3CH18O+"} Now to the main part: rule[fkt_][ 0] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_, k__} :> {n, t, a, b, c, d, e, f, g, h, i, j, k} rule[fkt_][ 1] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_, k__} :> {n, t, fkt[a], b, c, fkt[d], e, f, g, h, i, j, k} /; fkt[a] != {} && fkt[d] != {} && ! MemberQ[photons, a] && ! MemberQ[photons, d]; rule[fkt_][ 2] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_, k__} :> {n, t, fkt[a], b, c, d, fkt[e], f, g, h, i, j, k} /; fkt[a] != {} && fkt[e] != {} && ! MemberQ[photons, a] && ! MemberQ[photons, e]; rule[fkt_][ 3] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_, k__} :> {n, t, fkt[a], b, c, d, e, fkt[f], g, h, i, j, k} /; fkt[a] != {} && fkt[f] != {} && ! MemberQ[photons, a] && ! MemberQ[photons, f]; rule[fkt_][ 4] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_, k__} :> {n, t, fkt[a], b, c, d, e, f, fkt[g], h, i, j, k} /; fkt[a] != {} && fkt[g] != {} && ! MemberQ[photons, a] && ! MemberQ[photons, g]; rule[fkt_][ 5] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_, k__} :> {n, t, a, fkt[b], c, fkt[d], e, f, g, h, i, j, k} /; fkt[b] != {} && fkt[d] != {} && ! MemberQ[photons, b] && ! MemberQ[photons, d]; rule[fkt_][ 6] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_, k__} :> {n, t, a, fkt[b], c, d, fkt[e], f, g, h, i, j, k} /; fkt[b] != {} && fkt[e] != {} && ! MemberQ[photons, b] && ! MemberQ[photons, e]; rule[fkt_][ 7] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_, k__} :> {n, t, a, fkt[b], c, d, e, fkt[f], g, h, i, j, k} /; fkt[b] != {} && fkt[f] != {} && ! MemberQ[photons, b] && ! MemberQ[photons, f]; rule[fkt_][ 8] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_, k__} :> {n, t, a, fkt[b], c, d, e, f, fkt[g], h, i, j, k} /; fkt[b] != {} && fkt[g] != {} && ! MemberQ[photons, b] && ! MemberQ[photons, g]; rule[fkt_][ 9] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_, k__} :> {n, t, a, b, fkt[c], fkt[d], e, f, g, h, i, j, k} /; fkt[c] != {} && fkt[d] != {} && ! MemberQ[photons, c] && ! MemberQ[photons, d]; rule[fkt_][ 10] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_, k__} :> {n, t, a, b, fkt[c], d, fkt[e], f, g, h, i, j, k} /; fkt[c] != {} && fkt[e] != {} && ! MemberQ[photons, c] && ! MemberQ[photons, e]; rule[fkt_][ 11] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_, k__} :> {n, t, a, b, fkt[c], d, e, fkt[f], g, h, i, j, k} /; fkt[c] != {} && fkt[f] != {} && ! MemberQ[photons, c] && ! MemberQ[photons, f]; rule[fkt_][ 12] := {n_, t_, a_, b_, c_, d_, e_, f_, g_, h_, i_, j_, k__} :> {n, t, a, b, fkt[c], d, e, f, fkt[g], h, i, j, k} /; fkt[c] != {} && fkt[g] != {} && ! MemberQ[photons, c] && ! MemberQ[photons, g]; In[195]:= combi = Flatten[ Union[Table[{# /. rule[carbonizePlus][i], # /. rule[oxygenizePlus][i], # /. rule[deuteratePlus][i]} &[ data[[1]]], {i, 0, 12}]], 1]; Reverse[Union[Flatten[Distribute[#, List] & /@ combi, 1]]][[All, 3 ;; 9]] // TableForm { {"H", "CH", "", "C", "H2"}, {"H", "CD", "", "C", "HD"}, {"H", "13CH", "", "13C", "H2"} {"D", "CH", "", "C", "HD"} } In[132]:= combi = Flatten[ Union[Table[{# /. rule[carbonizePlus][i], # /. rule[oxygenizePlus][i], # /. rule[deuteratePlus][i]} &[ data[[1]]], {i, 0, 12}]], 1]; Reverse[Union[Flatten[Distribute[#, List] & /@ combi, 1]]] I am inserting a deuteron, one 13C, and one 18O. Out[133]= {{1, "NN", "H", "CH", "", "C", "H2", "", "", 1.31*10^-10, 0., 80., "C", 300, 2000, "B", "NIST"}, {1, "NN", "H", "CD", "", "C", "HD", "", "", 1.31*10^-10, 0., 80., "C", 300, 2000, "B", "NIST"}, {1, "NN", "H", "13CH", "", "13C", "H2", "", "", 1.31*10^-10, 0., 80., "C", 300, 2000, "B", "NIST"}, {1, "NN", "D", "CH", "", "C", "HD", "", "", 1.31*10^-10, 0., 80., "C", 300, 2000, "B", "NIST"}} for better readability: { {"O", "C2H4", "", "H2CCO", "H2"}, {"O", "C2H3D", "", "HDCCO", "H2"}, {"O", "C2H3D", "", "H2CCO", "HD"} {"O", "C13CH4", "", "H2C13CO", "H2"}, {"O", "C13CH4", "", "H213CCO", "H2"}, {"18O", "C2H4", "", "H2CC18O", "H2"} } ---------------------------------------- So far I am able to insert one isotope into the reaction formula. What changes would be necessary to implement the ability to apply the same substitution twice, or thrice, e.g. to account for multiply deuterated species like ND3 ? So far my rules do not recognize the isotopes and I can't produce 13C18O. How could this be implemented? Is my overall approach ok in terms of speed and flexibility? What other approaches can you think of? I really appreciaty any comments and input. Thanks in advance. Markus R=F6llig Argelander Inst. f=FCr Astronomie, Univ. Bonn, Germany