On Tue, 2006-04-11 at 06:55 -0700, Mike Stankovic wrote:
I've got about 10,000 docs I'd like to devise a search/index for. I found a Perl script called Perlfect that can do that on an old P3, but it takes an astronomical 7 hours. Another script (CGI/Perl) at hotscripts can do the same but allows the "rm -rf /" exploit. DoH!?
Is there anything Perl/flat-file that can search/index faster? This is a nice job for an aging P3 in the corner, so PHP/MySQL is not an option. Don't suggest beagle/Windows solutions, as this is a CentOS 4.3 system.
Well, at work we have an archive of ~12K PDFs that engineering uses for process documentation, and I use Swish-e (http://swish-e.org/) to index it so that they can search it. The server it sits on is a PIII 733 with 512MB RAM, and it takes about 90 minutes to re-index the whole archive every night.
It works well for us: it supports AND and OR operators, phrase searches, and other fairly advanced features.
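To make the AND, OR, and phrase searches concrete, here is a toy sketch in Python of how an index can answer those three kinds of query. It is not Swish-e's implementation or its query syntax; the function names (build_index, search_and, search_or, search_phrase) and the sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to {document name: [word positions]}."""
    index = defaultdict(dict)
    for name, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(name, []).append(pos)
    return index


def search_and(index, *words):
    """Documents that contain every one of the given words."""
    sets = [set(index.get(w.lower(), {})) for w in words]
    return set.intersection(*sets) if sets else set()


def search_or(index, *words):
    """Documents that contain at least one of the given words."""
    found = set()
    for w in words:
        found |= set(index.get(w.lower(), {}))
    return found


def search_phrase(index, phrase):
    """Documents where the words of the phrase appear consecutively."""
    words = tokenize(phrase)
    if not words:
        return set()
    hits = set()
    # Only documents containing all the words can contain the phrase.
    for name in search_and(index, *words):
        for start in index[words[0]][name]:
            if all(start + i in index[w][name] for i, w in enumerate(words)):
                hits.add(name)
                break
    return hits


# Hypothetical sample documents, standing in for already-converted text.
docs = {
    "a.txt": "Weld process for steel frames",
    "b.txt": "Paint process notes",
    "c.txt": "Steel weld inspection",
}
index = build_index(docs)
```

With these sample documents, search_and(index, "weld", "steel") matches a.txt and c.txt, while search_phrase(index, "steel weld") matches only c.txt, because the words must be adjacent and in order.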
The main limitation is that you need a filter to convert each document, whatever its format, into text, HTML, or XML before it can be indexed.
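As a rough illustration of what such a filter step does, here is a minimal Python sketch that picks a converter by file extension. It is not Swish-e's own filter mechanism; it assumes the pdftotext command-line tool is installed for PDFs, and the names CONVERTERS, filter_command, and to_text are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumed mapping from extension to converter command. None means the file
# is already text, HTML, or XML and can be handed to the indexer unchanged.
CONVERTERS = {
    ".pdf": ["pdftotext", "{path}", "-"],  # "-" sends the text to stdout
    ".txt": None,
    ".html": None,
    ".htm": None,
    ".xml": None,
}


def filter_command(path):
    """Return the converter command for a file, or None if none is needed."""
    suffix = Path(path).suffix.lower()
    if suffix not in CONVERTERS:
        raise ValueError(f"no filter configured for {suffix!r} files")
    template = CONVERTERS[suffix]
    if template is None:
        return None
    return [part.replace("{path}", str(path)) for part in template]


def to_text(path):
    """Return indexable content for a file, running a converter if needed."""
    command = filter_command(path)
    if command is None:
        return Path(path).read_text(errors="replace")
    # check=True raises CalledProcessError if the converter fails.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Any format without an entry in the table is rejected up front, which mirrors the limitation above: a document type is only searchable once a converter for it exists.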
Regards, Paul Berger