Conversion from Word – (mostly) Solved!

Simplest method I’ve been able to find:

  1. Open Word file, select & copy material.
  2. Go to the demo page for CKEditor.
  3. Click the “Paste From Word” button and paste the content.
  4. Click the “Source” button in CKEditor.
  5. Copy the HTML source code.
  6. Paste into an HTML document. I’m using Flux and RapidWeaver for that.

Amazingly, my images seem to have come through as well. There’s no horrible excess of <span> or <div> tags, no font specifications, none of that cruft, so I can add styling back in later where it is actually meaningful. The headings and basic markup are there, nothing else.

Here it is in the “saved-from-word” HTML:

<p class=MsoNormal>&nbsp;</p>
<p class=MsoListParagraph style='text-indent:-.25in'>1.<span style='font:7.0pt "Times New Roman"'>&nbsp;&nbsp;&nbsp;&nbsp;
</span>What bond breaks in the first step? </p>
<p class=MsoListParagraph style='text-indent:-.25in'>2.<span style='font:7.0pt "Times New Roman"'>&nbsp;&nbsp;&nbsp;&nbsp;
</span>In the second step, one bond breaks and two bonds form. Which are they? </p>
<p class=MsoListParagraph style='text-indent:-.25in'>3.<span style='font:7.0pt "Times New Roman"'>&nbsp;&nbsp;&nbsp;&nbsp;
</span>Compare the relative energies of the reactants (t-BuCl + B<sup>-</sup>);
the intermediates (t-Bu<sup>+</sup>, Cl<sup>-</sup>, and B<sup>-</sup>); and
the products (2-methylpropene, Cl<sup>-</sup>, and HB). </p>

And the same material done from the above method:

 <li>What bond breaks in the first step?</li>
 <li>In the second step, one bond breaks and two bonds form. Which are they?</li>
 <li>Compare the relative energies of the reactants (t-BuCl + B-); the intermediates (t-Bu+, Cl-, and B-); and the products (2-methylpropene, Cl-, and HB).</li>

So I lose the superscript tags for the ions, but the list is marked up properly with <li> tags and the whole code is much more readable. It’s a very worthy trade-off.
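
The lost superscripts could probably be put back with a small post-processing pass. Here is a minimal sketch in Python, assuming a charge only ever shows up as a single + or - typed directly after a known species name. The species list and the function name `restore_charges` are my own hypothetical choices, not anything CKEditor provides.

```python
import re

# Hypothetical list of species whose trailing charge should be superscripted.
SPECIES = ["t-Bu", "Cl", "B"]

def restore_charges(html, species=SPECIES):
    """Wrap a + or - that directly follows a known species name in <sup> tags."""
    # Try longer names first so "t-Bu" is matched before "B".
    names = "|".join(re.escape(s) for s in sorted(species, key=len, reverse=True))
    # The name must not be the tail of a longer word (so "HB" and "t-BuCl" are
    # left alone), and the sign must not be followed by a letter or digit
    # (so ordinary hyphenated words are left alone).
    pattern = re.compile(r"(?<![A-Za-z])(" + names + r")([+-])(?![A-Za-z0-9])")
    return pattern.sub(r"\1<sup>\2</sup>", html)

if __name__ == "__main__":
    print(restore_charges("<li>the intermediates (t-Bu+, Cl-, and B-)</li>"))
```

This is a heuristic, so the result still needs a read-through; anything with a multiple charge or a species missing from the list would slip past it.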

The images are still saved with very generic names, so I will have to come up with some kind of system for naming them properly and keeping them sorted.
From Word:

 <p class=MsoNormal><img width=266 height=62 id="_x0000_i1027"

and from CKEditor:

<p><img src="file://localhost/Users/dave/Library/Caches/TemporaryItems/msoclip/0clip_image006.png" style="height:62px; width:266px; " ></p>

This was so easy that I downloaded the CKEditor library from their website and made my own local page following their tutorial. Now the only annoying bit is moving the images from their location in a temp folder to a meaningful location in the web site directory.
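
The image clean-up looks scriptable too. Below is a minimal sketch in Python, assuming every pasted image arrives as an <img> tag whose src is a file:// URL like the one shown earlier. The function name `relocate_images`, the `prefix` argument, and the `images` folder are all hypothetical choices of mine for illustration.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

# Matches the src attribute of an <img> tag when it holds a file:// URL.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")(file://[^"]+)(")', re.IGNORECASE)

def relocate_images(html, prefix, image_dir="images"):
    """Rewrite temp-folder image URLs to relative paths with prefixed names.

    Returns the rewritten HTML plus a list of (old_path, new_relative_path)
    pairs, so the files themselves can be copied in a separate step.
    """
    moves = []

    def rewrite(match):
        # Turn the file:// URL back into a plain filesystem path.
        old_path = unquote(urlparse(match.group(2)).path)
        old_name = PurePosixPath(old_path).name
        new_rel = f"{image_dir}/{prefix}_{old_name}"
        moves.append((old_path, new_rel))
        return match.group(1) + new_rel + match.group(3)

    return IMG_SRC.sub(rewrite, html), moves
```

Calling `relocate_images(html, "worksheet1")` would point each image at something like `images/worksheet1_0clip_image006.png`. The returned list of pairs says which files to copy out of the temp folder (for example with `shutil.copy`) before they disappear. Prefixing with the worksheet name is only one possible naming system.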

Converting complex docs to decent HTML ain’t easy.

It’s well known (to anybody who has tried it) that Microsoft Word’s “save as HTML” or “save as web page” command generates a crap-ton of stuff that is intended to make the file act as a stand-alone, editable document that is identical when re-opened. And converting these to decent HTML is an odious chore, not easily done.
Especially if there are images. I have a huge number of Word files, with Chemdraw images in them, that I’m trying to convert to web pages.
One link I found, at this blog, suggested that viewing the document in Gmail’s previewer might help. Now, since that blog post is over three years old, I am not surprised that things have changed.
In fact, the HTML of a Google Doc is just as horrible.
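
For simple numbered lists, at least, Word's markup is regular enough to attack with a script. Here is a rough sketch in Python, assuming every list item is a <p class=MsoListParagraph> that starts with a hand-typed number and a spacer <span>, which is what my files seem to contain. The function name `word_list_to_items` is my own invention, and real Word output has far more variants than this handles.

```python
import re

def word_list_to_items(html):
    """Convert Word's MsoListParagraph <p> blocks into a plain <ol> list."""
    items = []
    paragraphs = re.findall(
        r"<p class=MsoListParagraph[^>]*>(.*?)</p>",
        html,
        re.DOTALL | re.IGNORECASE,
    )
    for body in paragraphs:
        # Drop the hand-typed "1." and the spacer <span> full of &nbsp;.
        body = re.sub(r"^\s*\d+\.\s*<span[^>]*>.*?</span>", "", body, flags=re.DOTALL)
        # Collapse line breaks and runs of whitespace into single spaces.
        body = " ".join(body.split())
        items.append(f"<li>{body}</li>")
    return "<ol>\n" + "\n".join(items) + "\n</ol>"
```

Inline tags such as <sup> inside an item are left untouched, so ion charges survive. Everything outside the list paragraphs is simply ignored, which means this only covers one small corner of the problem.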

Saving Word to Rich Text isn’t much better. You lose all the images.

One of the first tasks I’m going to learn is a workflow to convert all my class worksheets to web pages. This is a pain in the patoot, but it’s gotta happen somehow, sometime. I’d like to be able to keep the images, at least some of them. Others may get replaced with interactive bits later, but just getting the basic stuff into HTML is Job 1.

A program called RapidWeaver allows simple editing and I can drag images into it from Word. But the code is just as icky.
There are Word-to-RTF-to-HTML converters out there, but they sacrifice images. I have a few programs like TextWrangler too, which can do some of the cleanup.

But it’s going to take a bit of work just to figure out a decent flow.

Why can’t those dips at MS just save your document with a style sheet? Grr….

Achievement Unlocked!

Here’s the big news I hinted at yesterday! I’m officially approved for my first sabbatical ever. I’ll be working on converting my class notes and worksheets into a coherent whole, adding in reflection and metacognitive questions, then making them into an e-book with interactive figures and (I hope) even machine-checkable exercises.

To do this I’ll be doing a lot of learning of software tools and web tools, probably integrating ChemDoodle into HTML and maybe iBooks Author. You can read all about my learning curve – and see betas of some of the tech stuff – by following posts tagged Sabbatical.

I’ll have to learn how to take my rudimentary HTML skills from “yay, I can make a link tag” to quite a bit higher. Fortunately my employer has a site license to so I can get plenty of tutorials from there.