[CentOS-devel] catdoc: contributed RPM submission
Jon Peatfield
J.S.Peatfield at damtp.cam.ac.uk
Wed Jul 26 18:17:18 UTC 2006
On Tue, 25 Jul 2006, Durval Menezes wrote:
> Here's my submission for a new package: catdoc, a nice
> Word/Excel/Powerpoint -> plaintext conversion utility.
>
> The URL to the SRPM is:
> http://www.durval.com.br/RPMS/el4/catdoc/catdoc-0.94.2-2dm.el4.src.rpm
>
> I've attached the .spec file.
When we built catdoc for SL3x I found that there was a problem parsing
*some* XLS files -- it incorrectly guesses that some things are using
2-byte lengths, then treats the next byte as a charset descriptor, ends
up thinking it is using multi-byte chars, and wanders off the end of the
data. The code to 'guess' the header data format is really quite ugly.
[ incidentally at least some versions of gnumeric seem to have a very
similar issue, maybe that code is from a common source... ]
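To illustrate the failure mode, here is a minimal Python sketch. It is not
catdoc's code: the record layout (a length prefix, then one flags byte whose
low bit selects 2-byte chars) is a simplified assumption, and read_string and
len_size are hypothetical names.

```python
import struct


def read_string(buf, offset, len_size):
    """Parse one length-prefixed string and return (text, next_offset).

    Assumed layout (simplified, for illustration only):
      - a character count stored in len_size bytes (1 or 2, little-endian)
      - one flags byte; bit 0 set means 2 bytes per character
      - the character data itself
    """
    if len_size not in (1, 2):
        raise ValueError("len_size must be 1 or 2")
    header_end = offset + len_size + 1
    if header_end > len(buf):
        raise ValueError("truncated string header")

    fmt = "<B" if len_size == 1 else "<H"
    (nchars,) = struct.unpack_from(fmt, buf, offset)
    flags = buf[offset + len_size]
    wide = bool(flags & 0x01)

    nbytes = nchars * (2 if wide else 1)
    end = header_end + nbytes
    # Without this check a wrong guess for len_size reads past the record.
    if end > len(buf):
        raise ValueError(
            "string needs %d bytes but only %d remain"
            % (nbytes, len(buf) - header_end)
        )

    raw = buf[header_end:end]
    return raw.decode("utf-16-le" if wide else "latin-1"), end


# A record that really uses a 1-byte length: count=3, flags=0, then "abc".
data = b"\x03\x00abc"
```

With this sample record, read_string(data, 0, 1) recovers the string. Guessing
len_size=2 makes the parser take 'a' (0x61) as the flags byte, decide the
chars are 2 bytes wide, and ask for six bytes when only two remain; the bounds
check turns that into an error instead of a read past the end of the data.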
I applied a small patch which seems to work for us, though I got no reply
from the author when I offered it.
We also added a patch to make xls2csv quote even *null* values, which solved a
problem for us (though that is probably not what everyone wants...)
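As a rough sketch of the difference in output (not the actual patch), the
Python below formats one row both ways. It assumes every non-empty cell is
quoted; csv_row and quote_empty are hypothetical names.

```python
def csv_row(cells, quote_empty):
    """Format one row as CSV, optionally quoting empty (null) cells too."""
    out = []
    for cell in cells:
        text = "" if cell is None else str(cell)
        if text or quote_empty:
            # Double any embedded quotes, then wrap the field in quotes.
            text = '"' + text.replace('"', '""') + '"'
        out.append(text)
    return ",".join(out)


row = ["a", None, 3]
unpatched_style = csv_row(row, quote_empty=False)
patched_style = csv_row(row, quote_empty=True)
```

Here unpatched_style leaves the null cell as a bare empty field, while
patched_style emits it as an empty quoted string, so a consumer that expects
every field to be quoted sees a uniform row.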
Our rpm also installs wordview by a different name to avoid a clash with
another app of the same name (:-), but the srpm might (just) be worth
glancing at:
http://www.damtp.cam.ac.uk/user/jp107/sl3x-updates/SRPMS/catdoc-0.94.2-2.JSP.src.rpm
If you want I can possibly arrange to provide a .XLS file which upsets
catdoc (all my existing examples contain personal data, so I'd have to get
one made specially -- the ones which cause problems have been e-mailed
to us from another site).
--
Jon Peatfield, Computer Officer, DAMTP, University of Cambridge
Mail: jp107 at damtp.cam.ac.uk Web: http://www.damtp.cam.ac.uk/