[CentOS] Robust Search Solution (with CentOS 4.3)

Wed Apr 12 02:47:36 UTC 2006
Paul <subsolar at subsolar.com>

On Tue, 2006-04-11 at 06:55 -0700, Mike Stankovic wrote:
> I've got about 10,000 docs I'd like to devise a
> search/index for. I found a perl script called
> Perlfect that can do that on an old P3 but at the
> astronomical time of 7 hours. Another script (cgi/perl)
> at hotscripts can do the same but allows the "rm -rf
> /" exploit. DoH!?
> Is there anything perl/flatfile that can search/index
> faster? This is a nice job for an aging P3 in the
> corner so php/MySQL is not an option. Don't suggest
> beagle/windows solutions as this is a CentOS 4.3 system.

Well, at work we have an archive of ~12K PDFs that engineering uses for
process documentation, and I use Swish-e (http://swish-e.org/) to index
it so that they can search it. The server it sits on is a PIII 733 with
512MB RAM, and re-indexing the whole archive takes about 90 minutes every night.

It works well for us: it supports AND and OR operators, phrase searches,
and other fairly advanced features.
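To make the AND, OR, and phrase operators concrete, here is a toy positional inverted index in Python. It only illustrates what those query types mean; it is not how Swish-e is implemented, and the class name `ToyIndex` and the sample document names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class ToyIndex:
    """Hypothetical positional inverted index: word -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, text):
        # Record every position at which each word occurs in the document.
        for pos, word in enumerate(tokenize(text)):
            self.postings[word][doc_id].append(pos)

    def docs(self, word):
        # .get() avoids creating empty entries for words not in the index.
        return set(self.postings.get(word.lower(), {}))

    def search_and(self, *words):
        """Documents containing every word."""
        if not words:
            return set()
        result = self.docs(words[0])
        for w in words[1:]:
            result &= self.docs(w)
        return result

    def search_or(self, *words):
        """Documents containing at least one word."""
        result = set()
        for w in words:
            result |= self.docs(w)
        return result

    def search_phrase(self, phrase):
        """Documents where the phrase's words appear at consecutive positions."""
        words = tokenize(phrase)
        hits = set()
        for doc_id in self.search_and(*words):
            for start in self.postings[words[0]][doc_id]:
                if all(start + i in self.postings[w][doc_id]
                       for i, w in enumerate(words)):
                    hits.add(doc_id)
                    break
        return hits


idx = ToyIndex()
idx.add("a.pdf", "Process documentation for the etching line")
idx.add("b.pdf", "Etching process checklist")
idx.add("c.pdf", "Cleanroom gowning procedure")
print(sorted(idx.search_and("etching", "process")))        # ['a.pdf', 'b.pdf']
print(sorted(idx.search_or("gowning", "checklist")))       # ['b.pdf', 'c.pdf']
print(sorted(idx.search_phrase("process documentation")))  # ['a.pdf']
```

Storing word positions, not just which documents contain a word, is what makes phrase search possible: "etching process" matches b.pdf but not a.pdf, even though both contain both words.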

The main limitation is that you need a filter to convert each document
into one of the formats Swish-e can parse (text, HTML or XML) so it can
be indexed.
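A minimal sketch of that conversion step in Python, assuming the `pdftotext` (xpdf/poppler) and `catdoc` command-line tools are installed. The extension-to-converter table and the function names `filter_command` and `extract_text` are hypothetical; Swish-e has its own filter configuration, which this does not reproduce.

```python
import os
import subprocess

# Formats Swish-e can parse directly (per the post: text, HTML or XML).
NATIVE = {".txt", ".html", ".htm", ".xml"}

# Hypothetical extension -> converter table. Assumes pdftotext and catdoc
# are on PATH; the "-" argument tells pdftotext to write to stdout.
CONVERTERS = {
    ".pdf": lambda path: ["pdftotext", path, "-"],
    ".doc": lambda path: ["catdoc", path],
}


def filter_command(path):
    """Return the converter argv for path, [] if no conversion is needed,
    or None if the format is unsupported and the file should be skipped."""
    ext = os.path.splitext(path)[1].lower()
    if ext in NATIVE:
        return []
    make = CONVERTERS.get(ext)
    return make(path) if make else None


def extract_text(path):
    """Return content Swish-e can parse, running a converter when needed."""
    cmd = filter_command(path)
    if cmd is None:
        raise ValueError(f"no filter for {path}")
    if not cmd:
        # Already text/HTML/XML: read it as-is.
        with open(path, encoding="utf-8", errors="replace") as f:
            return f.read()
    # check=True raises CalledProcessError if the converter fails.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Returning `None` for unknown formats lets a nightly indexing run skip files it cannot convert instead of aborting partway through.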

Paul Berger

> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos