[CentOS] Robust Search Solution (with CentOS 4.3)

Mike Stankovic mlists2006 at yahoo.com
Wed Apr 12 12:47:19 UTC 2006


--- Paul <subsolar at subsolar.com> wrote:

> On Tue, 2006-04-11 at 06:55 -0700, Mike Stankovic wrote:
> > I've got about 10,000 docs I'd like to devise a
> > search/index for. I found a perl script called
> > Perlfect that can do that on an old P3 but at the
> > astronomical time of 7 hours. Another script (cgi/perl)
> > at hotscripts can do the same but allows the "rm -rf /"
> > exploit. DoH!?
> >
> > Is there anything perl/flatfile that can search/index
> > faster? This is a nice job for an aging P3 in the
> > corner so php/MySQL is not an option. Don't suggest
> > beagle/windows solutions as this is a CentOS 4.3 system.
> 
> Well at work we have an archive of ~12K PDFs that engineering uses for
> process documentation and I use Swish-e (http://swish-e.org/) to index
> it so that they can search it. The server it sits on is a PIII 733 with
> 512MB RAM and it takes about 90 minutes to re-index them every night.
>
> It works well for us as it allows AND & OR operators, searches for
> phrases and other fairly advanced features.
>
> The main limitation is that you need a filter to convert whatever the
> document is to one of the following: text, html or xml so it can be
> indexed.
> 
> Regards,
> Paul Berger
> 

Yes, Swish-e is in Dag's repo and appears to be very
well supported upstream. I was right about htsearch:
it is one of the components of htdig (also available
in rpm format).

Does it have issues with charsets that are not Latin-1
(ISO-8859-1) or plain 7-bit ASCII?
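
Whatever the answer turns out to be, a quick check like the
one below can show which docs contain anything beyond 7-bit
ASCII in the first place. This is only a rough sketch in
Python (not perl, and not part of Swish-e or htdig); the
function name classify_encoding is made up for illustration,
and it cannot tell Latin-1 apart from any other 8-bit charset.

```python
def classify_encoding(data: bytes) -> str:
    """Return a rough label for the byte content of one document."""
    # Pure 7-bit ASCII: every byte is below 0x80.
    if all(b < 0x80 for b in data):
        return "ascii"
    # Non-ASCII bytes that form valid UTF-8 sequences.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Could be Latin-1 or any other 8-bit charset; bytes alone
        # are not enough to say which one.
        return "other-8bit"
```

Running it over each file's contents (opened in binary mode)
would give a count of how many docs are in each bucket before
deciding whether charset handling in the indexer matters.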

__________________________________________________
Improve the mailing list by performing a simple search 
before posting and reading the faq/etiquette. 
Thank you!!



