[CentOS-devel] catdoc: contributed RPM submission

Wed Jul 26 18:17:18 UTC 2006
Jon Peatfield <J.S.Peatfield at damtp.cam.ac.uk>

On Tue, 25 Jul 2006, Durval Menezes wrote:

> Here's my submission for a new package: catdoc, a nice
> Word/Excel/Powerpoint -> plaintext conversion utility.
> The URL to the SRPM is:
> 	http://www.durval.com.br/RPMS/el4/catdoc/catdoc-0.94.2-2dm.el4.src.rpm
> I've attached the .spec file.

When we built catdoc for SL3x I found a problem parsing *some* XLS
files: it incorrectly guesses that some strings use 2-byte lengths, then
treats the next byte as a charset descriptor, ends up thinking the data
uses multi-byte chars, and wanders off the end of the data.  The code to
'guess' the header data format is really quite ugly.

[ incidentally at least some versions of gnumeric seem to have a very 
similar issue, maybe that code is from a common source... ]
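
To make the failure mode above concrete, here is a rough sketch in Python
(not catdoc's actual C code, and not our patch).  It assumes two candidate
string layouts: a 1-byte length followed directly by single-byte chars, and
a 2-byte little-endian length followed by a flags byte whose low bit means
"2 bytes per char".  The function name read_string and its parameters are
made up for illustration.  The point is only that a wrong guess yields a
huge byte count, and that a bounds check lets the parser reject the guess
instead of running off the end of the buffer.

```python
import struct


def read_string(buf, offset, len_size, has_flags):
    """Decode one string under a given layout guess.

    Returns None if the guess would read past the end of buf.
    """
    if offset + len_size > len(buf):
        return None
    if len_size == 1:
        count = buf[offset]
    else:
        count = struct.unpack_from("<H", buf, offset)[0]
    pos = offset + len_size

    wide = False
    if has_flags:
        if pos >= len(buf):
            return None
        # Assumed layout: low bit of the flags byte selects 2-byte chars.
        wide = bool(buf[pos] & 0x01)
        pos += 1

    nbytes = count * (2 if wide else 1)
    if pos + nbytes > len(buf):
        # The guessed layout claims more data than exists: reject it.
        return None
    raw = buf[pos:pos + nbytes]
    return raw.decode("utf-16-le" if wide else "latin-1", errors="replace")


# A string stored with a 1-byte length: 3, then "aca".
data = bytes([3]) + b"aca"
right = read_string(data, 0, len_size=1, has_flags=False)
# Misreading it as 2-byte length + flags gives count 0x6103 and flags 'c'
# (low bit set, so "wide"), which the bounds check refuses.
wrong = read_string(data, 0, len_size=2, has_flags=True)
```

With the check in place the caller can fall back to the other layout when
a guess returns None, rather than trusting the first guess.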

I applied a small patch which seems to work for us, though I got no reply 
from the author when I offered it.

We also added a patch to make xls2csv quote even *null* values, which
solved a problem for us (though that is probably not what everyone wants...)
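
For anyone unsure what "quote even null values" means in the output, the
following Python sketch shows the behaviour using the standard csv module.
It is only an illustration of the resulting CSV, not the xls2csv patch
itself (xls2csv is C); the helper name rows_to_csv is hypothetical, and
treating None as the "null" cell is an assumption.

```python
import csv
import io


def rows_to_csv(rows):
    """Render rows as CSV with every field quoted, including empty ones."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in rows:
        # Map None (our stand-in for a null cell) to an empty string,
        # which QUOTE_ALL then emits as "".
        writer.writerow(["" if cell is None else cell for cell in row])
    return out.getvalue()


text = rows_to_csv([["name", None, 42], ["", "x", None]])
```

Every column then appears as a quoted field, so a consumer that counts
quoted fields per line sees the same number on every row.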

Our rpm also installs wordview by a different name to avoid a clash with 
another app of the same name (:-), but the srpm might (just) be worth 
glancing at:


If you want, I can possibly arrange to provide an .XLS file which upsets
catdoc (all my existing examples contain personal data, so I'd have to get
one made specially; the ones which cause problems have been e-mailed
to us from another site).

Jon Peatfield,  Computer Officer,  DAMTP,  University of Cambridge
Mail:  jp107 at damtp.cam.ac.uk     Web:  http://www.damtp.cam.ac.uk/