On Tue, 25 Jul 2006, Durval Menezes wrote:

> Here's my submission for a new package: catdoc, a nice
> Word/Excel/Powerpoint -> plaintext conversion utility.
>
> The URL to the SRPM is:
> http://www.durval.com.br/RPMS/el4/catdoc/catdoc-0.94.2-2dm.el4.src.rpm
>
> I've attached the .spec file.

When we built catdoc for SL3x I found that there was a problem parsing
*some* XLS files -- it incorrectly guessed that some things were using
2-byte lengths, then treated the next byte as a charset descriptor, ended
up thinking the data was using multi-byte chars, and wandered off the end
of the data. The code to 'guess' the header data format is really quite
ugly.

[ incidentally at least some versions of gnumeric seem to have a very
similar issue, maybe that code is from a common source... ]

I applied a small patch which seems to work for us, though I got no reply
from the author when I offered it.

We also added a patch to make xls2csv quote even *null* values, which
solved a problem for us (though that is probably not what everyone
wants...)

Our rpm also installs wordview under a different name to avoid a clash
with another app of the same name (:-), but the srpm might (just) be worth
glancing at:

http://www.damtp.cam.ac.uk/user/jp107/sl3x-updates/SRPMS/catdoc-0.94.2-2.JSP.src.rpm

If you want, I can possibly arrange to provide a .XLS file which upsets
catdoc (all my existing examples contain personal data, so I'd have to get
one made specially -- the ones which cause problems have been e-mailed to
us from another site).

-- 
Jon Peatfield, Computer Officer, DAMTP, University of Cambridge
Mail: jp107 at damtp.cam.ac.uk  Web: http://www.damtp.cam.ac.uk/