Anyone got a preferred program or package for this? I'd like a *good* one, and Word or OO.o's save as html in no way qualifies as other than amateur crap.
So far, with a little googling, I've found the wv package. wvHtml works, but I don't like the output - it insists on <div>, and on &rhquo instead of plain, simple ".
mark "what, ask for an opinion in this shy, diffident group?"
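wvHtml's wrapper <div>s can be dropped after conversion. Below is a minimal Python sketch, assuming the <div>s carry no meaning you want to keep. The function name strip_divs is hypothetical. A regex like this will also misfire on a > inside an attribute value, so treat it as a quick cleanup, not an HTML parser.

```python
import re

# Matches opening <div ...> and closing </div> tags, case-insensitively.
# \b stops it from matching longer tag names such as <divider>.
DIV_TAG = re.compile(r"</?div\b[^>]*>", re.IGNORECASE)

def strip_divs(html_text: str) -> str:
    """Remove <div> wrapper tags, leaving the enclosed markup in place."""
    return DIV_TAG.sub("", html_text)

if __name__ == "__main__":
    sample = '<div class="x"><p>Hello</p></div>'
    print(strip_divs(sample))  # prints <p>Hello</p>
```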
On Fri, 22 Jun 2012, m.roth@5-cent.us wrote:
> To: CentOS mailing list centos@centos.org
> From: m.roth@5-cent.us
> Subject: [CentOS] converting .doc to html
> Anyone got a preferred program or package for this? I'd like a *good* one, and Word or OO.o's save as html in no way qualifies as other than amateur crap.
> So far, with a little googling, I've found the wv package. wvHtml works, but I don't like the output - it insists on <div>, and on &rhquo instead of plain, simple ".
I think Abiword can read and write those formats.
[root@karsites ~]# rpm -qv abiword
abiword-2.6.6-1.el5.rf
HTH
Keith
Keith Roberts wrote:
> I think Abiword can read and write those formats.
Given that both Word and OO.o produce such lousy, uselessly cluttered HTML, I'm a tad loath to install another word processor... and I really just wanted a command-line conversion tool.
As a side note, I tried Quanta about six years ago, and it did lousy things to my HTML, too: going from edit to display and back (I think that was it), it unformatted the *whole* document, left-justifying everything, even when I had *told* it to leave the formatting alone. So I'm not wildly crazed about web editing programs.
As my own personal web page reads, "this page proudly built in vi"....
mark
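Keith's AbiWord suggestion can be driven from the command line without opening the word processor. A minimal Python sketch, assuming the installed AbiWord accepts the --to=FORMAT conversion option (check abiword --help on your build) and writes the result next to the input. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path

def abiword_cmd(doc_path: str, fmt: str = "html") -> list[str]:
    """Build an AbiWord command line that converts doc_path to fmt.

    Assumption: this AbiWord build supports --to=FORMAT for headless use.
    """
    return ["abiword", f"--to={fmt}", doc_path]

def convert(doc_path: str) -> Path:
    """Run the conversion and return the expected output path."""
    # check=True raises CalledProcessError if AbiWord exits with an error.
    subprocess.run(abiword_cmd(doc_path), check=True)
    # Assumption: AbiWord names the output after the input,
    # e.g. report.doc -> report.html in the same directory.
    return Path(doc_path).with_suffix(".html")
```

Under those assumptions, convert("report.doc") would leave report.html beside the original, ready for further cleanup.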
On Fri, Jun 22, 2012 at 9:40 AM, m.roth@5-cent.us wrote:
> Anyone got a preferred program or package for this? I'd like a *good* one, and Word or OO.o's save as html in no way qualifies as other than amateur crap.
> So far, with a little googling, I've found the wv package. wvHtml works, but I don't like the output - it insists on <div>, and on &rhquo instead of plain, simple ".
Mail it to yourself at a Gmail account, then 'view' the attachment instead of downloading the original. It is still going to have <div>s, though.
On 6/22/2012 8:40 AM, m.roth@5-cent.us wrote:
> wvHtml works, but I don't like the output - it insists on <div>, and on &rhquo instead of plain, simple ".
You mean &rdquo;?
What's wrong with that? You wanted HTML, and *any* browser will understand that HTML entity, even Lynx.
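As a quick illustration of that point, Python's standard html.unescape decodes named entities the same way a browser does. The sample string is made up.

```python
import html

# A made-up snippet in the style of wvHtml output, using named entities.
snippet = "He said &ldquo;hi&rdquo;."
decoded = html.unescape(snippet)

print(decoded)                # He said “hi”.
print(hex(ord(decoded[-2])))  # 0x201d, i.e. U+201D RIGHT DOUBLE QUOTATION MARK
```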
If you wanted "HTML I can read like an e-book", I'd say you should be converting to Markdown instead. One path from Word to Markdown would be to save the document as RTF, convert that to HTML with unrtf (https://www.gnu.org/software/unrtf/), then convert the HTML to Markdown with Pandoc (http://johnmacfarlane.net/pandoc/).
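A rough sketch of that pipeline in Python, assuming the Word file has already been saved as RTF (unrtf reads RTF, not .doc) and that unrtf and pandoc are both on the PATH. The function names are hypothetical, and depending on your unrtf version its HTML output encoding may need attention before pandoc sees it.

```python
import subprocess

def pipeline_cmds(rtf_path: str) -> tuple[list[str], list[str]]:
    """Return the two command lines of the RTF -> HTML -> Markdown pipeline."""
    # unrtf --html writes HTML to standard output.
    unrtf = ["unrtf", "--html", rtf_path]
    # pandoc reads HTML on standard input and writes Markdown to standard output.
    pandoc = ["pandoc", "-f", "html", "-t", "markdown"]
    return unrtf, pandoc

def rtf_to_markdown(rtf_path: str) -> str:
    """Run unrtf, feed its HTML to pandoc, and return the Markdown text."""
    unrtf, pandoc = pipeline_cmds(rtf_path)
    html_bytes = subprocess.run(unrtf, check=True, capture_output=True).stdout
    md_bytes = subprocess.run(
        pandoc, input=html_bytes, check=True, capture_output=True
    ).stdout
    # pandoc writes UTF-8 by default.
    return md_bytes.decode("utf-8")
```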
Warren Young wrote:
> You mean &rdquo;?
Yup.
> What's wrong with that? You wanted HTML, and *any* browser will understand that HTML entity, even Lynx.
Hate it. I think it's completely unnecessary. I've done web pages, including professional and corporate ones, and never needed it. I use special characters only when there's no other option.
> If you wanted "HTML I can read like an e-book", I'd say you should be converting to Markdown instead. One path from Word to Markdown would be unrtf (https://www.gnu.org/software/unrtf/) to HTML, then HTML to Markdown via Pandoc (http://johnmacfarlane.net/pandoc/).
How 'bout HTML I can read like WordPerfect's Reveal Codes (<alt-f3>)?
mark
On Fri, 22 Jun 2012 16:40:49 -0400 m.roth@5-cent.us wrote:
> Hate it. I think it's completely unnecessary. I've done web pages, including professional and corporate ones, and never needed it. I use special characters only when there's no other option.
Just use sed to change it to whatever you want it to be.
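A Python equivalent of that sed cleanup, for anyone who would rather keep it in a script. The mapping is a hypothetical choice that flattens both the curly-quote entities and the literal curly characters to ASCII. It is lossy, which is exactly what Warren objects to later in the thread.

```python
import re

# Hypothetical mapping: typographic quote entities and their literal
# Unicode characters are all flattened to plain ASCII quotes.
REPLACEMENTS = {
    "&ldquo;": '"', "&rdquo;": '"',
    "&lsquo;": "'", "&rsquo;": "'",
    "\u201c": '"', "\u201d": '"',
    "\u2018": "'", "\u2019": "'",
}
# One alternation over all keys, so the text is scanned in a single pass.
PATTERN = re.compile("|".join(re.escape(k) for k in REPLACEMENTS))

def flatten_quotes(text: str) -> str:
    """Replace each curly-quote entity or character with its ASCII form."""
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], text)

if __name__ == "__main__":
    print(flatten_quotes("&ldquo;hi&rdquo;"))  # prints "hi"
```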
On 6/22/2012 2:40 PM, m.roth@5-cent.us wrote:
> Hate it. I think it's completely unnecessary.
Five centuries of typographers would like to have a word with you.
&rdquo; and &quot; aren't the same thing. If the document includes curly quotes, the only correct choice available to the HTML converter is to emit Unicode character U+201D (RIGHT DOUBLE QUOTATION MARK), whether as the &rdquo; entity or as the literal character.
Now, if your converter were converting straight quotation marks to &quot;, you might have a point.
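The distinction shows up in Python's html module: escaping with quote=True turns straight double quotes into &quot;, while curly quotes pass through untouched because they are not markup characters. A small sketch; the two wrapper names are hypothetical.

```python
import html

def escape_body_text(text: str) -> str:
    # quote=False escapes &, < and > but leaves quote characters alone,
    # which is sufficient for text between tags.
    return html.escape(text, quote=False)

def escape_attribute(value: str) -> str:
    # Inside a quoted attribute value, straight quotes do need escaping.
    return html.escape(value, quote=True)

print(escape_body_text('say "hi"'))        # say "hi"
print(escape_attribute('say "hi"'))        # say &quot;hi&quot;
print(escape_body_text("\u201chi\u201d"))  # “hi” (curly quotes pass through)
```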
> I've done web pages, including professional and corporate ones, and never needed it.
IMO, web pages with straight quotation marks are unprofessional. :)
Let the ASCII go, Mark. Just let it go. Unicode became usable over a decade ago, and became solid in most programs years ago.
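To make the Unicode point concrete: in a page served as UTF-8, U+201D is simply three bytes, so neither an entity nor an ASCII substitute is required. A minimal Python check; the sample page is made up.

```python
# U+201D RIGHT DOUBLE QUOTATION MARK, written as a literal character.
rdquo = "\u201d"
print(rdquo.encode("utf-8"))  # b'\xe2\x80\x9d'

# A tiny made-up page that uses literal curly quotes instead of entities.
page = '<meta charset="utf-8"><p>\u201cquoted\u201d</p>'
data = page.encode("utf-8")

# Decoding the bytes as UTF-8 recovers the original text exactly.
print(data.decode("utf-8") == page)  # True
```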
On 6/22/2012 3:56 PM, Warren Young wrote:
> Unicode became usable over a decade ago, and became solid in most programs years ago.
You know, thinking about it, I believe I've sold the Unicode on Linux stability story short. It's about a decade since it became solid, so "usable" must be considerably farther back; 2000, maybe? That makes sense, since Plan9 switched to UTF-8 in 1999.
I use Perl as my benchmark for Unicode stability. RHEL 2.1 (March 2002) shipped Perl 5.6 (March 2000), which was usable but dodgy in some ways w.r.t. Unicode. RHEL 3 (October 2003) shipped Perl 5.8 (July 2002), which fixed almost everything with Unicode handling. Each Perl since then has had Unicode changes, but they've just been small bug fixes and updates to track new Unicode specs. The core mechanisms haven't changed since 5.8.