lucene-dev mailing list archives

From Dawid Weiss <dawid.we...@gmail.com>
Subject multibyte file.encoding.
Date Wed, 04 Jul 2012 21:19:21 GMT
A few things came up, as always. Notably, this one is a showstopper:

https://github.com/carrotsearch/randomizedtesting/issues/116

I'll fix it tomorrow and push this forward. In the meantime, I already
have bugs to fix for those who are still awake:

(Run with -Dfile.encoding=UTF-16 from Eclipse.)

1)

public class TestDocument extends LuceneTestCase {
...
    IndexableField binaryFld = new StoredField("binary", binaryVal.getBytes());
    IndexableField binaryFld2 = new StoredField("binary", binaryVal2.getBytes());

Strings are converted to bytes with the platform default encoding, then
decoded back as if they were UTF-8 (a no-no), which results in:

[22:56:54.156] FAILURE 0.05s J3 | TestDocument.testBinaryField
   > Throwable #1: java.lang.AssertionError: b=254
   > 	at __randomizedtesting.SeedInfo.seed([B6175210365F8DBF:8BDDBF522ECEAEC1]:0)
   > 	at org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:591)
   > 	at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:165)
   > 	at org.apache.lucene.document.TestDocument.testBinaryField(TestDocument.java:66)
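Here is a minimal, self-contained sketch of why the assertion reports b=254. It assumes the platform default charset is UTF-16, as with -Dfile.encoding=UTF-16. Instead of relying on the default, the sketch passes that charset explicitly, so it produces the same bytes on any JVM.

Java's "UTF-16" encoder writes a big-endian byte-order mark (0xFE 0xFF) first. 0xFE never occurs in well-formed UTF-8, which is presumably what UnicodeUtil.UTF8toUTF16 trips over. The class name and sample string are hypothetical. The JDK's own decoder substitutes U+FFFD for malformed input instead of failing, so here the symptom is a silent mismatch rather than an AssertionError.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultEncodingRoundTrip {

    // Encodes with an explicit charset; stands in for a bare String.getBytes()
    // running on a JVM started with -Dfile.encoding=UTF-16 (an assumption).
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    // Mirrors what the test assumes on the way back: the stored bytes are UTF-8.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String value = "binary value"; // hypothetical sample, not the test's binaryVal

        byte[] utf16 = encode(value, StandardCharsets.UTF_16);
        // The UTF-16 encoder emits a byte-order mark, so the first byte is 0xFE.
        System.out.println("first byte: " + (utf16[0] & 0xFF));
        System.out.println("UTF-16 bytes round-trip: " + value.equals(decodeAsUtf8(utf16)));

        // Encoding and decoding with the same explicit charset round-trips.
        byte[] utf8 = encode(value, StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes round-trip: " + value.equals(decodeAsUtf8(utf8)));
    }
}
```

The point is that the encode and decode sides must name the same charset. Passing UTF-8 explicitly to getBytes makes the test independent of file.encoding.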

2)

[22:56:55.326] FAILURE 0.12s J2 | TestPostingsOffsets.testPayloads
   > Throwable #1: java.lang.AssertionError: b=254
   > 	at __randomizedtesting.SeedInfo.seed([B6175210365F8DBF:A45C72D3AD1F56B7]:0)
   > 	at org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:591)
   > 	at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:165)

The bug is in MockPayloadFilter, which stores bytes in the platform default
encoding as the payload:

      payloadAttr.setPayload(new BytesRef(("pos: " + pos).getBytes()));

and those bytes are then wrongly decoded as UTF-8 in TestPostingsOffsets.doTestNumbers:

            assertTrue(payload.utf8ToString().startsWith("pos:"));
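A minimal sketch of one possible fix, assuming the goal is simply to make the payload's encoding match the UTF-8 decoding done by utf8ToString(). The helper names are hypothetical and the code uses only the JDK.

StandardCharsets is Java 7+. Code targeting Java 6 would need the "UTF-8" charset name or an equivalent constant instead. Inside Lucene, the BytesRef(CharSequence) constructor, which to my knowledge fills the bytes from the UTF-8 encoding of the text, would presumably give the same effect as `new BytesRef("pos: " + pos)`. That is an assumption, not something verified here.

```java
import java.nio.charset.StandardCharsets;

public class PayloadEncodingSketch {

    // Hypothetical helper: build the "pos: N" payload bytes with an explicit
    // UTF-8 encoding instead of the platform default (a bare String.getBytes()).
    static byte[] positionPayload(int pos) {
        return ("pos: " + pos).getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: decode the payload as UTF-8, which is what
    // BytesRef.utf8ToString() assumes on the test side.
    static String decodePayload(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        for (int pos = 0; pos < 3; pos++) {
            String decoded = decodePayload(positionPayload(pos));
            // Holds regardless of -Dfile.encoding, because both sides name UTF-8.
            System.out.println(decoded + " -> startsWith(\"pos:\") = " + decoded.startsWith("pos:"));
        }
    }
}
```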

I only ran Lucene's test-core... :)

Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

