MathGroup Archive 2004

Re: Mathematica to Word

  • To: mathgroup at smc.vnet.net
  • Subject: [mg46774] Re: Mathematica to Word
  • From: "Hans Michel" <hansjm at bellsouth.net>
  • Date: Sun, 7 Mar 2004 01:33:57 -0500 (EST)
  • References: <c241pc$2h8$1@smc.vnet.net> <c26g4i$e12$1@smc.vnet.net>
  • Reply-to: "Hans Michel" <hansjm at bellsouth.net>
  • Sender: owner-wri-mathgroup at wolfram.com

Hi,

First, I will assume that you have Microsoft Word >2003 .doc files created
on an OS other than Linux.
If so, and you want a solution that does not require getting Microsoft
development kits and writing your own C++ program to read the binary format
that Word saves in, then do the following.

If you don't have a version of OpenOffice, get the free download of
OpenOffice 1.1.0 from

http://download.openoffice.org/1.1.0/index.html

I don't know if there is a Czech version.

In any case it is free (about 74 MB).

After you download and install OpenOffice on Linux, run OpenOffice Writer.

1. Open your Word Documents and save as the OpenOffice .sxw file format.

2. From the command line, run: unzip filenamehere.sxw
(I used unzip   ....)

3. Among the extracted (decompressed) files there should be one called
content.xml. (This is the file you need.)

4. Run Mathematica and execute the following code
(but first change
Import["C:\\Documents and Settings\\hans\\Desktop\\doc\\content.xml", ...
to the appropriate file path and format for your OS. I did this in Windows
because my complimentary copy of Mathematica for Linux has expired.)

(* An XML element: a cell group with the tag name in bold, followed by
   its attributes and its children; children get 30 more left margin. *)
XMLNote[XMLElement[tag_, attributes_,
      data_], m_Integer] := Cell[CellGroupData[{
        Cell[TextData[
            StyleBox[tag, FontFamily -> "Swiss",
              FontWeight -> "Bold", FontSize -> 15]]],
        Sequence @@ (XMLNote[#1, m] &) /@ attributes,
        Sequence @@ (XMLNote[#1, m + 30] &) /@ data
        }, Open],
    CellMargins -> {{m, Inherited}, {Inherited, Inherited}}]

(* An attribute whose name is a {namespace, name} pair. *)
XMLNote[{an_String, a_String} -> v_String, m_Integer] :=
  Cell[TextData[
      StyleBox[an, FontColor -> Hue[0.6]], " ",
      StyleBox[a, FontWeight -> "Bold"], " = ",
      StyleBox[v, Background -> GrayLevel[0.8]]],
    CellMargins -> {{m + 5, Inherited}, {Inherited, Inherited}}]

(* An attribute with a plain string name. *)
XMLNote[a_String -> v_String, m_Integer] :=
  Cell[TextData[
      StyleBox[a, FontWeight -> "Bold"], " = ",
      StyleBox[v, Background -> GrayLevel[0.8]]],
    CellMargins -> {{m + 5, Inherited}, {Inherited, Inherited}}]

(* Character data inside an element. *)
XMLNote[s_String, m_Integer] :=
  Cell[s, Background -> GrayLevel[0.9],
    CellMargins -> {{m + 25, Inherited}, {Inherited, Inherited}}]

doc = Import["C:\\Documents and Settings\\hans\\Desktop\\doc\\content.xml",
  "XML", ConversionOptions -> {"ReadDTD" -> False,
    "AllowRemoteDTDAccess" -> False, "ValidateAgainstDTD" -> False,
    "IncludeNamespaces" -> False}]

(* End the Import expression with a semicolon if you don't wish to see the output. *)

NotebookPut@
  Notebook[{XMLNote[doc[[2]], 0]}, CellGrouping -> "Manual"]

5. Play around with the code. The XMLNote code is straight from the help
files.
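Steps 2 and 3 above can also be scripted rather than run by hand. The
following is only a minimal sketch in Python (an assumption on my part; the
steps above use the unzip command), relying on the fact that an .sxw file is
an ordinary zip archive. The function name extract_content_xml and the
directory name in the usage line are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_content_xml(sxw_path, out_dir="."):
    """Extract content.xml from an OpenOffice .sxw file (a zip archive).

    Returns the path of the extracted file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(sxw_path) as archive:
        # Only content.xml (the document body) is pulled out; the other
        # members of the archive are left untouched.
        archive.extract("content.xml", path=out_dir)
    return out_dir / "content.xml"
```

For example, extract_content_xml("filenamehere.sxw", "doc") would write
doc/content.xml, which is the file to hand to Import in step 4.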

I have not tested this with a Word file containing entities or graphics. I
suspect that you would have more entity problems if your files are in a
European language (accents, etc.). In that case I would include the
ConversionOption
"AllowUnrecognizedEntities" -> True

If this fails on the first try with a converted Word doc, then create a new
file in OpenOffice Writer, type something in, and save it. Then try these
steps again with your simple *.sxw file (content.xml).
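To see what text a content.xml actually holds, accented characters
included, independently of Mathematica, something like the following might
help. It is a rough sketch in Python (again an assumption, not part of the
steps above). It assumes paragraphs are stored as elements whose local name
is "p" (text:p in Writer files), and the function name paragraph_texts is
hypothetical.

```python
import xml.etree.ElementTree as ET


def paragraph_texts(xml_path):
    """Return the text of each paragraph element found in content.xml."""
    texts = []
    for elem in ET.parse(xml_path).iter():
        # Parsed tags look like "{namespace-uri}p"; keep only the local name.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "p":
            # itertext() also collects text nested in child elements (spans).
            texts.append("".join(elem.itertext()))
    return texts
```

If the simple test file comes through here but the converted Word document
does not, the problem is more likely in the converted file than in the
Mathematica code.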

Good Luck


"Pavel Pokorny" <Pavel.Pokorny at vscht.REMOVEME.cz> wrote in message
news:c26g4i$e12$1 at smc.vnet.net...
> stevebg at adelphia.net wrote:
> >         This must have been asked 1000 times, but I haven't seen the
> > answer. I have a large but ordinary-looking file of Mathematica text
> > output on the screen with no special characters, unusual formatting,
> > lines, or colors, Just plain text. I want to get this data, just as it
> > looks, into MS Word for further processing. When I convert the
> > Mathematica output to text or do anything else I can think of, I get a
> > very complex format in Word, which would take a complicated macro to
> > undo. There must be a simple way to get real, plain text with no
> > mysterious codes, etc., in the Mathematica output file.
> >         Thanks for any help.
>
> > Steve Gray
>
> And I would like to append a similar question:
>
> How can I read a .doc file (generated in MS Word)
> using Mathematica 5.0 for Linux?
>
> Thanks for any help
>
> -- 
> Pavel Pokorny
> Math Dept, Prague Institute of Chemical Technology
> http://www.vscht.cz/mat/Pavel.Pokorny
>

