Converting complex docs to decent HTML ain’t easy.


It’s well known (to anybody who has tried it) that Microsoft Word’s “save as HTML” or “save as web page” command generates a crap-ton of stuff that is intended to make the file act as a stand-alone, editable document that is identical when re-opened. And converting these to decent HTML is an odious chore, not easily done.
Especially if there are images. I have a huge number of Word files, with Chemdraw images in them, that I’m trying to convert to web pages.
One link I found, at this blog, suggested that viewing the document in Gmail’s previewer might help. Now, since that blog post is over three years old, I am not surprised that things have changed.
In fact, the HTML of a Google Doc is just as horrible.

Saving Word to Rich Text isn’t much better. You lose all the images.

One of the first things I need to work out is a workflow to convert all my class worksheets to web pages. This is a pain in the patoot, but it’s gotta happen somehow, sometime. I’d like to be able to keep the images, at least some of them. Others may get replaced with interactive bits later, but just getting the basic stuff into HTML is Job 1.
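For keeping the images, one thing that might be worth a try: a .docx file is really a zip archive, and the embedded pictures normally sit in its word/media/ folder. Below is a minimal sketch that copies them out using only Python's standard library. It assumes the worksheets are (or can be re-saved as) .docx rather than the older .doc, and the function name extract_docx_images is just made up for illustration. Embedded Chemdraw objects may come out as .emf or .wmf previews, which browsers won't show directly, so a conversion step could still be needed.

```python
import zipfile
from pathlib import Path


def extract_docx_images(docx_path, out_dir):
    """Copy every file under word/media/ in a .docx into out_dir.

    Assumes docx_path is a zip-based .docx file (not a legacy .doc).
    Returns the list of paths that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Skip directory entries; keep only files inside word/media/
            if name.startswith("word/media/") and not name.endswith("/"):
                target = out / Path(name).name
                target.write_bytes(archive.read(name))
                saved.append(target)
    return saved
```

Calling extract_docx_images("worksheet1.docx", "worksheet1_images") (both names hypothetical) would leave the pictures in a folder next to the worksheet, ready to be referenced from hand-written HTML.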

A program called RapidWeaver allows simple editing and I can drag images into it from Word. But the code is just as icky.
There are Word-to-RTF-to-HTML converters out there, but they sacrifice images. I also have a few programs like TextWrangler, which can do some of the cleanup.

But it’s going to take a bit of work just to figure out a decent flow.
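One possible first step in such a flow is a quick script that scrubs the worst of the Word-specific gunk out of a "save as web page" file. What follows is only a rough sketch using regular expressions from Python's standard library, not a real HTML parser, so it can trip over unusual markup (or over body text that happens to look like an attribute). The function name strip_word_cruft and the list of things it removes are my own guesses at what counts as cruft: conditional comments, embedded style blocks, Office-namespaced tags like o:p, the class/style/lang attributes, and span tags.

```python
import re

FLAGS = re.DOTALL | re.IGNORECASE


def strip_word_cruft(html):
    """Rough regex pass over Word-exported HTML. Keeps <img> tags."""
    # Hidden conditional comments: <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html, flags=FLAGS)
    # Visible conditional markers: drop <![if ...]> and <![endif]>,
    # but keep whatever sits between them (often the <img> fallback)
    html = re.sub(r"<!\[(?:if[^\]]*|endif)\]>", "", html, flags=FLAGS)
    # Embedded style sheets full of mso- rules
    html = re.sub(r"<style\b.*?</style>", "", html, flags=FLAGS)
    # Office-namespaced tags such as <o:p>, <v:shape>, <w:WordDocument>
    html = re.sub(r"</?[ovwm]:[^>]*>", "", html, flags=FLAGS)
    # class, style, and lang attributes (quoted or unquoted values)
    html = re.sub(
        r"""\s+(?:class|style|lang)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
        "",
        html,
        flags=FLAGS,
    )
    # Span tags, which Word uses almost entirely for inline styling
    html = re.sub(r"</?span\b[^>]*>", "", html, flags=FLAGS)
    return html
```

Something like strip_word_cruft(open("worksheet1.htm", encoding="cp1252").read()) would be the usual way in; the file name and the encoding are both assumptions, so check what Word actually wrote. The output still wants a look by eye and a proper style sheet afterward.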

Why can’t those dips at MS just save your document with a style sheet? Grr….