lucene-solr-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Updated: (SOLR-1091) "phps" (serialized PHP) writer produces invalid output
Date Thu, 27 Aug 2009 17:06:59 GMT

     [ https://issues.apache.org/jira/browse/SOLR-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-1091:
-------------------------------

    Attachment: SOLR-1091.patch

Here's a patch that handles the modified UTF-8 that Jetty puts out and also speeds up the
normal UTF-8 case by using Lucene's UTF-8 encoding.

Modified UTF-8 support is switched on if the jetty.home property is set (Jetty does this by
default).
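
To illustrate why the two encodings give different lengths, here is a minimal Python sketch
(not the code in the attached patch). It assumes the only relevant difference is that modified
UTF-8 writes a supplementary character as a surrogate pair of two 3-byte sequences, where
standard UTF-8 writes one 4-byte sequence. The names utf8_len, php_serialize_str and the
modified flag are hypothetical and exist only for this example.

```python
def utf8_len(text: str, modified: bool = False) -> int:
    """Count the bytes `text` would occupy once encoded.

    Assumption for this sketch: with modified=True, every character above
    U+FFFF is written as two 3-byte surrogate sequences (6 bytes); with
    modified=False it is one 4-byte standard UTF-8 sequence. Any other
    differences between the two encodings are ignored here.
    """
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            total += 1          # ASCII
        elif cp < 0x800:
            total += 2
        elif cp < 0x10000:
            total += 3          # rest of the Basic Multilingual Plane
        else:
            total += 6 if modified else 4
    return total


def php_serialize_str(text: str, modified: bool = False) -> str:
    """Format a string the way PHP's serialize() does: s:<byte length>:"<value>";"""
    return f's:{utf8_len(text, modified)}:"{text}";'
```

Under these assumptions, php_serialize_str("\U00100078") starts with s:4: while
php_serialize_str("\U00100078", modified=True) starts with s:6:. The declared length has to
match the bytes that are actually sent, so the writer needs to know which encoding the
container uses.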

> "phps" (serialized PHP) writer produces invalid output
> ------------------------------------------------------
>
>                 Key: SOLR-1091
>                 URL: https://issues.apache.org/jira/browse/SOLR-1091
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 1.3
>         Environment: Sun JRE 1.6.0 on Centos 5
>            Reporter: frank farmer
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-1091.patch
>
>
> The serialized PHP output writer can output invalid string lengths for certain (unusual)
> input values.  Specifically, I had a document containing the following 6-byte character
> sequence: \xED\xAF\x80\xED\xB1\xB8
> I was able to create a document in the index containing this value without issue; however,
> when fetching the document back out using the serialized PHP writer, it returns a string
> like the following:
> s:4:"􀁸";
> Note that the string length specified is 4, while the string is actually 6 bytes long.
> When using PHP's native serialize() function, it correctly sets the length to 6:
> # php -r 'var_dump(serialize("\xED\xAF\x80\xED\xB1\xB8"));'
> string(13) "s:6:"􀁸";"
> The "wt=php" writer, which produces output to be parsed with eval(), doesn't have any
> trouble with this string.
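
One plausible reading of the 4-versus-6 mismatch can be checked with a short Python sketch
(an illustration only, not taken from the issue or the patch). The six reported bytes are two
3-byte sequences that each encode one half of a UTF-16 surrogate pair. Combined, the pair names
a single supplementary character, and that character takes 4 bytes in standard UTF-8. The
variable names are made up for this example.

```python
# The 6-byte sequence from the report.
raw = b"\xED\xAF\x80\xED\xB1\xB8"

# "surrogatepass" lets Python's UTF-8 codec accept encoded surrogate halves,
# which a strict decoder would reject.
halves = raw.decode("utf-8", "surrogatepass")
high, low = (ord(c) for c in halves)    # 0xDBC0, 0xDC78

# Standard UTF-16 arithmetic for combining a surrogate pair.
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# The same character in standard UTF-8.
standard = chr(code_point).encode("utf-8")

print(len(raw), hex(code_point), len(standard))   # 6 0x100078 4
```

So a length computed from the standard UTF-8 form (4) would not match the 6 bytes that are
actually written when the surrogate halves are encoded separately.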

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

