lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: multibyte file.encoding.
Date Wed, 04 Jul 2012 22:02:26 GMT
Definitely looks like MockPayloadFilter is buggy: surprised it hasn't bitten us
already.
On Jul 4, 2012 5:20 PM, "Dawid Weiss" <dawid.weiss@gmail.com> wrote:

> A few things came up, as always. Notably this one is a showstopper:
>
> https://github.com/carrotsearch/randomizedtesting/issues/116
>
> I'll fix it tomorrow and push this forward. In the meantime I already
> have bugs to fix for those who are still awake:
>
> (run with -Dfile.encoding=UTF-16 from Eclipse).
>
> 1)
>
> public class TestDocument extends LuceneTestCase {
> ...
>     IndexableField binaryFld = new StoredField("binary", binaryVal.getBytes());
>     IndexableField binaryFld2 = new StoredField("binary", binaryVal2.getBytes());
>
> Strings are converted to bytes in the default encoding, then converted
> back as if they were UTF-8 (a no-no), which results in:
>
> [22:56:54.156] FAILURE 0.05s J3 | TestDocument.testBinaryField
>    > Throwable #1: java.lang.AssertionError: b=254
>    >    at __randomizedtesting.SeedInfo.seed([B6175210365F8DBF:8BDDBF522ECEAEC1]:0)
>    >    at org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:591)
>    >    at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:165)
>    >    at org.apache.lucene.document.TestDocument.testBinaryField(TestDocument.java:66)
>
> 2)
>
> [22:56:55.326] FAILURE 0.12s J2 | TestPostingsOffsets.testPayloads
>    > Throwable #1: java.lang.AssertionError: b=254
>    >    at __randomizedtesting.SeedInfo.seed([B6175210365F8DBF:A45C72D3AD1F56B7]:0)
>    >    at org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:591)
>    >    at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:165)
>
> The bug is in MockPayloadFilter, which stores default-encoding
> bytes as the payload:
>
>       payloadAttr.setPayload(new BytesRef(("pos: " + pos).getBytes()));
>
> which then gets wrongly decoded as UTF-8 in
> TestPostingsOffsets.doTestNumbers:
>
>             assertTrue(payload.utf8ToString().startsWith("pos:"));
>
> I only ran Lucene's test-core... :)
>
> Dawid
>
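
For reference, a minimal sketch of the mismatch described above, using only
JDK classes (no Lucene). The class and method names are hypothetical, and
passing StandardCharsets.UTF_16 explicitly is an assumption standing in for
a JVM started with -Dfile.encoding=UTF-16. The JDK's UTF-16 encoder writes a
big-endian byte-order mark first, so the leading byte is 0xFE (254), which
appears consistent with the "b=254" in the assertion messages.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Lucene or its test framework.
public class EncodingRoundTrip {

    // Encode with one charset and decode with another. This mimics calling
    // getBytes() under a non-UTF-8 default and then decoding as UTF-8.
    public static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = s.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    // First encoded byte as an unsigned value (0..255).
    public static int firstUnsignedByte(String s, Charset encodeWith) {
        return s.getBytes(encodeWith)[0] & 0xFF;
    }

    public static void main(String[] args) {
        String payload = "pos: 1";

        // Assumed stand-in for the default charset under -Dfile.encoding=UTF-16.
        Charset simulatedDefault = StandardCharsets.UTF_16;

        String broken = roundTrip(payload, simulatedDefault, StandardCharsets.UTF_8);
        String fixed = roundTrip(payload, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println(firstUnsignedByte(payload, simulatedDefault)); // 254
        System.out.println(broken.startsWith("pos:")); // false
        System.out.println(fixed.startsWith("pos:"));  // true
    }
}
```

Passing an explicit charset, for example getBytes(StandardCharsets.UTF_8),
keeps the stored bytes independent of file.encoding, so a later UTF-8 decode
sees what it expects.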
