[CentOS] UTF-8 support in PCRE

Mon Jul 7 12:11:50 UTC 2008
Amitava Shee <amitava.shee at gmail.com>

Please see my replies inline below.

On Fri, Jul 4, 2008 at 5:29 AM, Ralph Angenendt
<ra+centos at br-online.de> wrote:

> Amitava Shee wrote:
> > How do I get utf-8 support with PCRE?
> >
> > I am having problems building lucene index using Zend_Lucene. I get the
> > following error
> >
> >
> > PHP Notice:  iconv(): Detected an illegal character in input string in
> > /var/www/ZendFramework-1.5.2/library/Zend/Search/Lucene/Analysis/Analyzer/Common/Text.php
> > on line 56
>
> a) What does that have to do with pcre? (which can do UTF-8)


[Shee] The Zend Lucene search engine uses PCRE and requires PCRE to be
compiled with --enable-utf8. Please see
http://framework.zend.com/manual/en/zend.search.lucene.charset.html#zend.search.lucene.charset.utf_analyzer

UTF-8 support can either be compiled into PCRE at build time or provided
via a shared library, but whether the shared library includes it depends on
the distro. I believe upstream Red Hat does not include it, and I was hoping
to find a way to get it in CentOS. I have no idea whether other distros
support it; that is a research item for me.
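
One way to find out whether an installed PCRE shared library was built with UTF-8 support is to ask the library itself. The following is only a sketch in Python, under these assumptions: the classic libpcre (8.x API, not PCRE2) is installed as a shared library, and PCRE_CONFIG_UTF8 has the value 0 as in pcre.h. The function name pcre_utf8_support is made up for illustration. It queries the system libpcre, which is not necessarily the copy PHP is linked against.

```python
import ctypes
import ctypes.util

# Value of PCRE_CONFIG_UTF8 in pcre.h (classic PCRE API, assumed here).
PCRE_CONFIG_UTF8 = 0


def pcre_utf8_support():
    """Return True/False for UTF-8 support, or None if libpcre cannot be queried."""
    name = ctypes.util.find_library("pcre")
    if name is None:
        return None  # no classic libpcre shared library found
    try:
        lib = ctypes.CDLL(name)
        pcre_config = lib.pcre_config
    except (OSError, AttributeError):
        return None  # library could not be loaded or lacks pcre_config()
    pcre_config.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_int)]
    pcre_config.restype = ctypes.c_int
    flag = ctypes.c_int(0)
    # pcre_config() returns 0 on success and sets flag to 1 if UTF-8 is available.
    if pcre_config(PCRE_CONFIG_UTF8, ctypes.byref(flag)) != 0:
        return None
    return flag.value == 1


if __name__ == "__main__":
    messages = {
        True: "libpcre reports UTF-8 support",
        False: "libpcre reports NO UTF-8 support",
        None: "could not query libpcre",
    }
    print(messages[pcre_utf8_support()])
```

If the pcretest utility is installed, running "pcretest -C" should report the same build options without any code.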

>
>
> b) What is on line 56 in that file? Looks like iconv is choking on that.

[Shee] That is framework code; I don't know much about it.

>
>
> So try to process that file with iconv on the command line.
>
> Ralph
>
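
The suggestion above, running the input through iconv to see where it chokes, can be approximated without iconv. This is a minimal sketch in Python, assuming the text being indexed is supposed to be UTF-8; the function names are hypothetical and are not part of Zend Framework or iconv. It reports the byte offset of the first sequence that is not valid UTF-8, which is the kind of input that makes iconv() emit the "illegal character" notice.

```python
import sys


def find_invalid_utf8(data):
    """Return (offset, offending_bytes) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start:err.end]
    return None


def check_file(path):
    """Read a file as raw bytes and look for the first invalid UTF-8 sequence."""
    with open(path, "rb") as handle:
        return find_invalid_utf8(handle.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    problem = check_file(sys.argv[1])
    if problem is None:
        print("input is valid UTF-8")
    else:
        print("invalid byte sequence %r at offset %d" % (problem[1], problem[0]))
```

Run it with the path of a document being indexed as the only argument. If it reports an offset, the input is probably in another encoding such as ISO-8859-1 and needs converting before it is handed to the analyzer.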
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
>
>