lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl
Date Sun, 02 Dec 2012 15:35:58 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508292#comment-13508292 ]

Uwe Schindler commented on LUCENE-4584:
---------------------------------------

I agree with Robert here. We don't need to test random data; for Lucene, only two things
are important:
- When you compress random data and decompress it again, the exact same bytes must come back.
This should be tested and needs no external C code. This is the "doesn't corrupt™" that Robert
is talking about.
- The compressed content should never get significantly bigger than the input.
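
The two properties above can be checked without any external C code. The following is only a
minimal sketch: it uses java.util.zip's Deflater/Inflater as a stand-in compressor, not
Lucene's LZ4 classes, and the class name RoundTripCheck, the method names, and the size bound
(input length + 1/16 of it + 64 bytes) are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class RoundTripCheck {

  // Stand-in compressor from java.util.zip; this is NOT Lucene's LZ4.
  static byte[] compress(byte[] input) {
    Deflater deflater = new Deflater();
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  // Restores exactly originalLength bytes or fails loudly.
  static byte[] decompress(byte[] compressed, int originalLength) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    byte[] restored = new byte[originalLength];
    int off = 0;
    try {
      while (off < originalLength && !inflater.finished()) {
        int n = inflater.inflate(restored, off, originalLength - off);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          break; // no progress possible: the stream is truncated or unusable
        }
        off += n;
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt compressed stream", e);
    } finally {
      inflater.end();
    }
    if (off != originalLength) {
      throw new IllegalStateException("expected " + originalLength + " bytes, got " + off);
    }
    return restored;
  }

  // Hypothetical threshold for "not significantly bigger"; pick whatever the codec guarantees.
  static int maxAllowedSize(int inputLength) {
    return inputLength + inputLength / 16 + 64;
  }

  // True if the data survives a round trip and the compressed form stays within the bound.
  static boolean roundTripsWithinBound(byte[] input) {
    byte[] compressed = compress(input);
    byte[] restored = decompress(compressed, input.length);
    return Arrays.equals(input, restored) && compressed.length <= maxAllowedSize(input.length);
  }

  public static void main(String[] args) {
    Random random = new Random(42L); // fixed seed so a failure is reproducible
    for (int size : new int[] {0, 1, 1000, 65536}) {
      byte[] data = new byte[size];
      random.nextBytes(data);
      System.out.println(size + " bytes: " + (roundTripsWithinBound(data) ? "ok" : "FAILED"));
    }
  }
}
```

Swapping the two static compress/decompress methods for the codec under test keeps the
assertions unchanged, which is the point: neither check depends on what the compressed bytes
look like.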

There is no reason at all why Lucene's LZ4 must return the same compressed output. E.g., if
we find a better algorithm that performs better in HotSpot, although it compresses to a
different byte array, we are perfectly fine.

If we want to assert for now that both algorithms create the same compressed output, we should
have three random byte files of different sizes (e.g. generated from /dev/urandom) as test
resources and the C-compressed ones also as test resources, and then we can compare the
results. We should just document how the test data was created. But keep in mind: we may change
the algorithm to produce different bytes, so this is not mandatory. I think we may only assert
that the compression percentage of the random data is identical, not the actual bytes.
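
To make the difference between "same bytes" and "same compression percentage" concrete, here
is a small hedged sketch. The class name RatioCheck, its methods, and the tolerance parameter
are hypothetical; a tolerance of 0.0 corresponds to the "identical" percentage suggested above.
The reference array stands for a C-compressed test resource, which is not reproduced here.

```java
import java.util.Arrays;

final class RatioCheck {

  // Compressed size as a fraction of the original size.
  static double ratio(int originalLength, int compressedLength) {
    if (originalLength <= 0) {
      throw new IllegalArgumentException("originalLength must be positive");
    }
    return (double) compressedLength / originalLength;
  }

  // Strict check: both implementations produced exactly the same bytes.
  static boolean sameBytes(byte[] reference, byte[] candidate) {
    return Arrays.equals(reference, candidate);
  }

  // Looser check: only the compression ratio has to match, within a tolerance.
  static boolean sameRatio(int originalLength, byte[] reference, byte[] candidate,
                           double tolerance) {
    double expected = ratio(originalLength, reference.length);
    double actual = ratio(originalLength, candidate.length);
    return Math.abs(expected - actual) <= tolerance;
  }

  public static void main(String[] args) {
    // Made-up placeholder outputs for the same 8-byte input: same length, different bytes.
    byte[] reference = {1, 2, 3, 4};
    byte[] candidate = {9, 8, 7, 6};
    System.out.println("same bytes: " + sameBytes(reference, candidate));
    System.out.println("same ratio: " + sameRatio(8, reference, candidate, 0.0));
  }
}
```

A test built on sameRatio keeps passing if the algorithm later changes to emit different
bytes of the same size, whereas one built on sameBytes would have to be regenerated.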
                
> Compare the LZ4 implementation in Lucene against the original impl
> ------------------------------------------------------------------
>
>                 Key: LUCENE-4584
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4584
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>             Fix For: 4.1
>
>
> We should add tests to make sure that the LZ4 impl in Lucene compresses data the exact
> same way as the original impl.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

