[CentOS] OT: .doc,.xls,.pdf,.ppt (etc.) string parser/indexers
Rajagopal Swaminathan
raju.rajsand at gmail.com
Sun Aug 30 03:36:44 UTC 2009
Greetings,
On Fri, Aug 28, 2009 at 10:50 PM, Les Mikesell <lesmikesell at gmail.com> wrote:
> Does anyone have experience with linux tools to parse the text from
> common non-text file formats for searching? I'm trying to use the
> kinosearch add-on for twiki which is fine as far as the search goes, but
> it takes forever to generate the index.
I am not sure this answers your question directly.
But I have seen the Lucene.Net SDK (with extensions to scour .doc, .odt,
.pdf, etc.) used to very good effect, with pretty decent performance.
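For anyone curious what such an indexer does, here is a minimal Python sketch of the general idea. It is not how Lucene, Lucene.Net or KinoSearch actually work. It assumes the usual Linux command-line extractors are installed: pdftotext (poppler-utils), antiword, and xls2csv/catppt (catdoc). It runs the matching tool to get plain text, then builds a small in-memory inverted index (term -> set of document ids). The EXTRACTORS table and the function names are illustrative only.

```python
import os
import re
import subprocess
from collections import defaultdict

# Assumed mapping from file extension to a command-line text extractor.
# These tools must be installed separately (poppler-utils, antiword, catdoc).
EXTRACTORS = {
    ".pdf": ["pdftotext", "-q", "{path}", "-"],  # "-" sends text to stdout
    ".doc": ["antiword", "{path}"],
    ".xls": ["xls2csv", "{path}"],
    ".ppt": ["catppt", "{path}"],
}


def extractor_command(path):
    """Return the argv list for extracting text from path, or None if unsupported."""
    ext = os.path.splitext(path)[1].lower()
    template = EXTRACTORS.get(ext)
    if template is None:
        return None
    # Substitute the real file path for the placeholder argument.
    return [path if arg == "{path}" else arg for arg in template]


def extract_text(path):
    """Run the matching extractor and return its stdout ('' if unsupported)."""
    cmd = extractor_command(path)
    if cmd is None:
        return ""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs maps document id -> plain text; returns term -> set of ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is not modified.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

In practice you would call extract_text() on each file once, store the result keyed by path, and pass that dict to build_index(). The slow part is usually the extraction step, so caching its output and re-extracting only files that have changed is what keeps re-indexing fast.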
HTH
Thanks and Regards
Rajagopal