--- Paul subsolar@subsolar.com wrote:
On Tue, 2006-04-11 at 06:55 -0700, Mike Stankovic wrote:
I've got about 10,000 docs I'd like to set up a search/index for. I found a Perl script called Perlfect that can do that on an old P3, but at the astronomical time of 7 hours. Another script (CGI/Perl) at hotscripts can do the same but allows the "rm -rf /" exploit. D'oh!
Is there anything Perl/flat-file based that can search/index faster? This is a nice job for an aging P3 in the corner, so PHP/MySQL is not an option. Don't suggest Beagle/Windows solutions, as this is a CentOS 4.3 system.
Well, at work we have an archive of ~12K PDFs that engineering uses for process documentation, and I use Swish-e (http://swish-e.org/) to index it so that they can search it. The server it sits on is a PIII 733 with 512 MB RAM, and it takes about 90 minutes to re-index the archive every night.
It works well for us, as it supports AND and OR operators, phrase searches and other fairly advanced features.
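To make the AND, OR and phrase operators concrete, here is a minimal sketch of a positional inverted index in Python. It is a toy illustration of the general technique, not how Swish-e is implemented; the function names and the sample documents are made up for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to {doc_id: [positions of the word in that doc]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def search_and(index, words):
    """Documents that contain every word."""
    sets = [set(index.get(w.lower(), {})) for w in words]
    return set.intersection(*sets) if sets else set()

def search_or(index, words):
    """Documents that contain at least one word."""
    result = set()
    for w in words:
        result |= set(index.get(w.lower(), {}))
    return result

def search_phrase(index, phrase):
    """Documents where the words appear next to each other, in order."""
    words = tokenize(phrase)
    hits = set()
    for doc_id in search_and(index, words):
        for start in index[words[0]][doc_id]:
            if all(start + i in index[w][doc_id] for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits

# Hypothetical sample documents, only for illustration.
DOCS = {
    "a.txt": "Weld process for steel frames",
    "b.txt": "Paint process notes",
    "c.txt": "Steel frames and the process for weld inspection",
}
```

Storing word positions, not just document IDs, is what makes phrase matching possible: an AND query only needs to know which documents contain each word, while a phrase query also has to check that the words are adjacent.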
The main limitation is that you need a filter to convert each document to one of the following formats so it can be indexed: text, HTML or XML.
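As a rough idea of what such a filter step involves, here is a small Python sketch that picks a converter by file extension and returns plain text. It is not Swish-e's own filter mechanism; the CONVERTERS table and function names are hypothetical, and it assumes the pdftotext utility (from xpdf/poppler) is installed for PDFs.

```python
import subprocess
from pathlib import Path

# Hypothetical table: file extension -> converter command line.
# "{path}" is replaced by the document path; "-" makes pdftotext
# write the extracted text to standard output.
CONVERTERS = {
    ".pdf": ["pdftotext", "{path}", "-"],
}

# Formats the indexer can read directly, so no conversion is needed.
PASSTHROUGH = {".txt", ".html", ".htm", ".xml"}

def filter_command(path):
    """Return the converter argv for a document, or None if it needs no filter."""
    suffix = Path(path).suffix.lower()
    if suffix in PASSTHROUGH:
        return None
    template = CONVERTERS.get(suffix)
    if template is None:
        raise ValueError(f"no filter configured for {suffix!r}")
    return [str(path) if arg == "{path}" else arg for arg in template]

def extract_text(path):
    """Return the indexable text of a document, converting it if required."""
    command = filter_command(path)
    if command is None:
        return Path(path).read_text(errors="replace")
    # Passing a list (no shell) avoids command injection via odd file names.
    result = subprocess.run(command, capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")
```

Supporting another format would mean adding one more entry to the table, provided a command-line converter for that format exists on the system.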
Regards, Paul Berger
Improve the mailing list by performing a simple search before posting and reading the faq/etiquette. Thank you!!
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
Yes, Swish-e is in Dag's repo and appears to be very well supported upstream. I was right about htsearch: it is one of the components of htdig (also available in RPM format).
Does it have issues with charsets other than Latin-1 (ISO-8859-1) or plain 7-bit ASCII?
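Whatever the answer for the indexer turns out to be, it may help to know first which documents fall outside 7-bit ASCII. The sketch below is one possible pre-check in Python; the function name and category labels are made up here. Note that Latin-1 cannot be positively detected, because every byte sequence decodes as Latin-1, so anything that is neither ASCII nor valid UTF-8 is only reported as "other 8-bit".

```python
def classify_encoding(data: bytes) -> str:
    """Roughly classify raw document bytes by character encoding."""
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Could be Latin-1 or any other 8-bit charset; bytes alone cannot tell.
        return "other-8bit"
```

Running something like this over the text produced by the conversion step would show whether the charset question matters for a given archive at all.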