[CentOS] Robust Search Solution (with CentOS 4.3)
Mike Stankovic
mlists2006 at yahoo.com
Wed Apr 12 12:47:19 UTC 2006
--- Paul <subsolar at subsolar.com> wrote:
> On Tue, 2006-04-11 at 06:55 -0700, Mike Stankovic wrote:
> > I've got about 10,000 docs I'd like to devise a
> > search/index for. I found a Perl script called
> > Perlfect that can do that on an old P3, but it
> > takes an astronomical 7 hours. Another script
> > (cgi/perl) at hotscripts can do the same but
> > allows the "rm -rf /" exploit. DoH!?
> >
> > Is there anything perl/flatfile that can
> > search/index faster? This is a nice job for an
> > aging P3 in the corner, so php/MySQL is not an
> > option. Don't suggest beagle/windows solutions,
> > as this is a CentOS 4.3 system.
>
> Well, at work we have an archive of ~12K PDFs that
> engineering uses for process documentation, and I
> use Swish-e (http://swish-e.org/) to index it so
> that they can search it. The server it sits on is
> a PIII 733 with 512MB RAM, and it takes about 90
> minutes to re-index them every night.
>
> It works well for us, as it allows AND and OR
> operators, phrase searches and other fairly
> advanced features.
>
> The main limitation is that you need a filter to
> convert each document to one of the following
> formats so it can be indexed: text, html or xml.
>
> Regards,
> Paul Berger
>
> > __________________________________________________
> > Improve the mailing list by performing a simple search
> > before posting and reading the faq/etiquette.
> > Thank you!!
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam? Yahoo! Mail has the best spam protection around
> > http://mail.yahoo.com
> > _______________________________________________
> > CentOS mailing list
> > CentOS at centos.org
> > http://lists.centos.org/mailman/listinfo/centos
> >
>
Yes, Swish-e is in dag's repo and appears to be very
well supported upstream. I was right about htsearch:
it is one of the components of htdig (also available
in rpm format).
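
For the filter step Paul describes (converting each document to text,
html or xml before it is indexed), here is a minimal sketch in Python.
The thread does not say which filter Paul actually uses, so everything
in it is an assumption: the names to_text and CONVERTERS are made up
for illustration, and it assumes the pdftotext command-line tool
(xpdf/poppler) is installed for the PDF case.

```python
import subprocess
from pathlib import Path

# Hypothetical table: file extension -> function that builds an external
# command writing plain text to stdout. pdftotext is an assumption here;
# "-" as the output file tells it to write to stdout.
CONVERTERS = {
    ".pdf": lambda path: ["pdftotext", str(path), "-"],
}

# Formats an indexer that accepts text, html or xml can take unchanged.
PASSTHROUGH = (".txt", ".html", ".htm", ".xml")


def to_text(path, converters=CONVERTERS):
    """Return the document's content as bytes in an indexable format."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix in PASSTHROUGH:
        # Already text, html or xml: hand the bytes over as they are.
        return path.read_bytes()
    build = converters.get(suffix)
    if build is None:
        raise ValueError(f"no filter configured for {suffix!r}")
    # Run the external converter and capture what it prints.
    result = subprocess.run(build(path), stdout=subprocess.PIPE, check=True)
    return result.stdout
```

Other document types would need their own entry in the table, each
pointing at whatever converter is available on the box.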
Does it have issues with charsets other than Latin-1
(ISO-8859-1) or plain 7-bit ASCII?
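
Whatever the answer, it may help to know first how many of the docs
fall outside plain ASCII. A rough sketch in Python follows; classify
and scan are hypothetical helper names, not part of any indexer. Note
the limit of the approach: bytes alone cannot prove a file is Latin-1,
since every byte sequence decodes as ISO-8859-1, so anything that is
neither ASCII nor valid UTF-8 is only reported as "other-8bit".

```python
import os
from collections import Counter


def classify(data):
    """Classify raw bytes as 'ascii', 'utf-8' or 'other-8bit'."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Could be Latin-1 or any other 8-bit charset; the bytes
        # alone do not say which.
        return "other-8bit"


def scan(root):
    """Count the files under root by encoding class."""
    counts = Counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            with open(os.path.join(dirpath, name), "rb") as fh:
                counts[classify(fh.read())] += 1
    return counts
```

Running scan() over the directory of already-converted text files
gives a quick count per class, which shows whether the charset
question matters for this particular collection.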