How do I get UTF-8 support with PCRE?
I am having problems building a Lucene index using Zend_Search_Lucene. I get the following error:
PHP Notice: iconv(): Detected an illegal character in input string in /var/www/ZendFramework-1.5.2/library/Zend/Search/Lucene/Analysis/Analyzer/Common/Text.php on line 56
Thanks in advance. Regards, Amitava Shee
Amitava Shee wrote:
How do I get UTF-8 support with PCRE?
I am having problems building a Lucene index using Zend_Search_Lucene. I get the following error:
PHP Notice: iconv(): Detected an illegal character in input string in /var/www/ZendFramework-1.5.2/library/Zend/Search/Lucene/Analysis/Analyzer/Common/Text.php on line 56
a) What does that have to do with pcre? (which can do UTF-8)
b) What is on line 56 in that file? Looks like iconv is choking on that.
So try to process that file with iconv on the command line.
Ralph
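Ralph's suggestion of running the data through iconv by hand can be scripted. The following is a minimal sketch, assuming GNU (glibc) iconv as shipped with CentOS: converting from UTF-8 to UTF-8 changes nothing for valid input, but iconv stops with a non-zero exit status at the first illegal byte sequence and reports its position on stderr. The function name is_valid_utf8 is hypothetical.

```shell
# is_valid_utf8 FILE (hypothetical helper name)
# Succeeds only if FILE contains valid UTF-8. Assumes GNU iconv, which
# exits non-zero at the first illegal sequence and reports its byte
# position on stderr.
is_valid_utf8() {
  # UTF-8 -> UTF-8 is a no-op for valid input; only the exit status
  # matters, so the converted text is discarded.
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null
}
```

Usage: `is_valid_utf8 document.txt || echo "document.txt is not valid UTF-8"`.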
Please see my reply inline below
On Fri, Jul 4, 2008 at 5:29 AM, Ralph Angenendt <ra+centos@br-online.de> wrote:
Amitava Shee wrote:
How do I get UTF-8 support with PCRE?
I am having problems building a Lucene index using Zend_Search_Lucene. I get the following error:
PHP Notice: iconv(): Detected an illegal character in input string in /var/www/ZendFramework-1.5.2/library/Zend/Search/Lucene/Analysis/Analyzer/Common/Text.php on line 56
a) What does that have to do with pcre? (which can do UTF-8)
[Shee] The Zend Lucene search engine uses PCRE and requires PCRE to be compiled with --enable-utf8. Please see http://framework.zend.com/manual/en/zend.search.lucene.charset.html#zend.sea...
UTF-8 support can either be compiled into PCRE at build time or supported via a shared library. But whether shared library support is included depends on the distro. I believe upstream Red Hat does not include it. I was hoping to find a way in CentOS. I have no idea if other distros support it. That's a research item for me.
b) What is on line 56 in that file? Looks like iconv is choking on that.
[Shee] That's framework code; I don't know much about it.
So try to process that file with iconv on the command line.
Ralph
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Amitava Shee wrote:
On Fri, Jul 4, 2008 at 5:29 AM, Ralph Angenendt <ra+centos@br-online.de> wrote:
Amitava Shee wrote:
How do I get UTF-8 support with PCRE?
a) What does that have to do with pcre? (which can do UTF-8)
[Shee] The Zend Lucene search engine uses PCRE and requires PCRE to be compiled with --enable-utf8. Please see http://framework.zend.com/manual/en/zend.search.lucene.charset.html#zend.sea...
UTF-8 support can either be compiled into PCRE at build time or supported via a shared library. But whether shared library support is included depends on the distro. I believe upstream Red Hat does not include it. I was hoping to find a way in CentOS. I have no idea if other distros support it. That's a research item for me.
As I said: pcre can do UTF-8:
%build
%configure --enable-utf8
That's from the spec file. And again: It's not pcre, it is iconv which doesn't like a character in one of the framework's files.
Ralph
Yes, building from source will work. I just want to know if there is a package (in some yum repository) somewhere, so that updates, patches, etc. get applied with "yum update". It would be nice to do something like:
yum install pcre-utf8
-Amitava
On Mon, Jul 7, 2008 at 10:36 AM, Amitava Shee amitava.shee@gmail.com wrote:
Yes, building from source will work. I just want to know if there is a package (in some yum repository) somewhere, so that updates, patches, etc. get applied with "yum update". It would be nice to do something like:
yum install pcre-utf8
Okay, there's a disconnect somewhere that you aren't getting.
The pcre package included in CentOS does UTF-8 just fine. The problem you are seeing is related to another package. You need to look at the script to see what iconv (where the problem actually is) is having problems with.
My error log with iconv is misleading. Please ignore that portion and instead use this little PHP script to check for UTF-8 support in PCRE:
<?php
// '\pL' with /u only compiles if PCRE has both UTF-8 and Unicode property support.
if (@preg_match('/\pL/u', 'a') == 1) { echo "PCRE unicode support is turned on.\n"; }
else { echo "PCRE unicode support is turned off.\n"; }
?>
Also, please check out this thread (on the lack of PCRE UTF-8 support in RHEL):
http://marc.info/?l=php-i18n&m=118303425505336&w=2
-Amitava
sure thing <g>
MHR wrote:
Please stop top posting.
Thank you.
mhr
Amitava Shee wrote:
Yes, building from source will work. I just want to know if there is a package (in some yum repository) somewhere, so that updates, patches, etc. get applied with "yum update". It would be nice to do something like:
yum install pcre-utf8
Again - and I'm going to type this very slowly: The supplied pcre which is *IN* CentOS *IS* built with UTF-8 support.
And: Your problem has *nothing* to do with pcre, your problem lies *within* the iconv library.
Ralph
On Tue, Jul 8, 2008 at 6:44 AM, Ralph Angenendt ra+centos@br-online.de wrote:
Okay kids, for those following along, I'd like to take a moment to sum this thread up so far:
No, it isn't. Yes, it is. No, it isn't. Yes, it is. No, it isn't. Yes, it is.
Thank you. This has been a brief email summary. You may not return to your regularly scheduled insanity.
On Tue, Jul 08, 2008 at 08:23:59AM -0400, Jim Perrin enlightened us:
Okay kids, for those following along, I'd like to take a moment to sum this thread up so far:
No, it isn't. Yes, it is. No, it isn't. Yes, it is. No, it isn't. Yes, it is.
Thank you. This has been a brief email summary. You may not return to your regularly scheduled insanity.
What should I do instead, if I can't return to insanity?
Matt
Matt Hyclak wrote on Tue, 8 Jul 2008 08:59:51 -0400:
What should I do instead, if I can't return to insanity?
go forward to it!
Kai
I tried to resist, but ...
On Tue, 2008-07-08 at 18:31 +0200, Kai Schaetzl wrote:
Matt Hyclak wrote on Tue, 8 Jul 2008 08:59:51 -0400:
What should I do instead, if I can't return to insanity?
What convinces you that you ever left it? Insane folks don't know they're insane.
go forward to it!
Kai
On Tue, Jul 8, 2008 at 9:44 AM, William L. Maltby CentOS4Bill@triad.rr.com wrote:
On Tue, 2008-07-08 at 18:31 +0200, Kai Schaetzl wrote:
Matt Hyclak wrote on Tue, 8 Jul 2008 08:59:51 -0400:
What should I do instead, if I can't return to insanity?
What convinces you that you ever left it? Insane folks don't know they're insane.
Oh, yes, we do - that's the difference between us and sane folks. Sane folks don't know that they are sane....
CNR.
mhr
On Tue, Jul 8, 2008 at 5:23 AM, Jim Perrin jperrin@gmail.com wrote:
Okay kids, for those following along, I'd like to take a moment to sum this thread up so far:
No, it isn't. Yes, it is. No, it isn't. Yes, it is. No, it isn't. Yes, it is.
Thank you. This has been a brief email summary. You may not return to your regularly scheduled insanity.
We may NOT??? I happen to LIKE my regularly scheduled insanity - I need reality breaks from time to time. Gee, Jim, you really are a big meanie....
;^)
mhr
The issue is in CentOS 5. I ran the application successfully on Ubuntu 8.04.
PCRE in CentOS does not have "Unicode properties" support enabled. Please see the pcretest -C output from CentOS and Ubuntu:
CentOS 5
========
[ashee@foobar]$ pcretest -C
PCRE version 6.6 06-Feb-2006
Compiled with
  UTF-8 support
  No Unicode properties support
  Newline character is LF
  Internal link size = 2
  POSIX malloc threshold = 10
  Default match limit = 10000000
  Default recursion depth limit = 10000000
  Match recursion uses stack
Ubuntu
======
ashee@ubuntu:~$ pcretest -C
PCRE version 7.4 2007-09-21
Compiled with
  UTF-8 support
  Unicode properties support
  Newline sequence is LF
  \R matches all Unicode newlines
  Internal link size = 2
  POSIX malloc threshold = 10
  Default match limit = 10000000
  Default recursion depth limit = 10000000
  Match recursion uses stack
Is there a way to enable these options (without the usual ./configure and make)?
-Amitava
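The comparison above can be automated. This is a minimal sketch in POSIX shell that relies only on the "FEATURE support" versus "No FEATURE support" wording visible in the pcretest -C output above; the function name pcre_has is hypothetical. Keep in mind that pcretest reports on the libpcre it is linked against, which may or may not be the copy PHP uses.

```shell
# pcre_has FEATURE (hypothetical helper name)
# Reads `pcretest -C` output on stdin and succeeds only if it reports
# "FEATURE support" rather than "No FEATURE support".
pcre_has() {
  out=$(cat)
  # An explicit "No FEATURE support" line means the option is off.
  if printf '%s\n' "$out" | grep -qF "No $1 support"; then
    return 1
  fi
  # Otherwise require a positive "FEATURE support" line.
  printf '%s\n' "$out" | grep -qF "$1 support"
}
```

Usage: `pcretest -C | pcre_has "Unicode properties" && echo on || echo off`. Fixed-string matching (`grep -F`) is used so that feature names such as "UTF-8" are not treated as regular expressions.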
Amitava Shee wrote on Wed, 9 Jul 2008 13:27:35 -0400:
PCRE in CentOS does not have "unicode properties" enabled.
But that's different from what you claimed earlier!
Kai
Amitava Shee wrote:
The issue is in CentOS 5. I ran the application successfully in Ubuntu 8.04.
PCRE in CentOS does not have "unicode properties" enabled.
So it's not UTF-8 support that is missing.
Is there a way to enable these options (without the usual ./configure make)?
Rebuild the src.rpm with the correct features enabled and/or file a bug upstream at http://bugzilla.redhat.com/.
Ralph
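Rebuilding the src.rpm as Ralph suggests mainly means adding PCRE's --enable-unicode-properties configure option to the spec file. The following is a minimal sketch, assuming the spec contains the "%configure --enable-utf8" line quoted earlier in the thread and that GNU sed is available; the function name is hypothetical, and the surrounding download and rebuild commands are shown only as comments, not as a tested procedure.

```shell
# enable_unicode_properties SPECFILE (hypothetical helper name)
# Adds --enable-unicode-properties to the "%configure --enable-utf8" line
# of a pcre spec file. Assumes GNU sed (for -i). Safe to run twice.
enable_unicode_properties() {
  spec=$1
  # Already enabled: nothing to do.
  if grep -q -- '--enable-unicode-properties' "$spec"; then
    return 0
  fi
  # Only touch a %configure line that already carries --enable-utf8.
  sed -i '/^%configure/s/--enable-utf8/--enable-utf8 --enable-unicode-properties/' "$spec"
  # Fail if the expected line was not found and nothing changed.
  grep -q -- '--enable-unicode-properties' "$spec"
}

# Possible surrounding workflow (assumes yum-utils and rpm-build are installed):
#   yumdownloader --source pcre
#   rpm -ivh pcre-*.src.rpm
#   topdir=$(rpm --eval '%{_topdir}')
#   enable_unicode_properties "$topdir/SPECS/pcre.spec"
#   rpmbuild -ba "$topdir/SPECS/pcre.spec"
```

A later pcre update from the CentOS repositories will replace a locally rebuilt package, so the edit would have to be repeated after each update unless the change is made upstream via Bugzilla.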