lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <>
Subject [jira] Commented: (LUCENE-2622) Random Test Failure org.apache.lucene.TestExternalCodecs.testPerFieldCodec (from TestExternalCodecs)
Date Wed, 15 Sep 2010 10:21:33 GMT


Simon Willnauer commented on LUCENE-2622:

It seems that we figured out what's going on here. The problem seems to be the optimization
done in LUCENE-2588, where we strip off the non-distinguishing suffix to save RAM in the loaded
terms index. The problem with this optimization is that it is not safe for all comparators.
The testcase runs with a reverse unicode comparator, which causes terms to appear in reverse
order during indexing.
This is not a problem until we run into a situation where the stripped suffix
is required due to the nature of the comparator. In this case we index the numbers from 0
to 173, and with the randomly chosen termIndexInterval of 54 we run into a situation where
the indexing code is wrong about the prefix. It sees the term "49" with prior term "5",
thinks it can strip off the "9", and uses "4" as the indexed term.
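
To make the stripping step concrete, here is a minimal sketch of the idea. It is not the
actual LUCENE-2588 code: it uses plain Strings instead of BytesRef, and the class and method
names (PrefixStripSketch, shortestDistinguishingPrefix, reverseSorted) are made up for
illustration. It only shows that, under a reverse ordering, "5" precedes "49" and the
shortest prefix telling "49" apart from "5" is "4".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class PrefixStripSketch {

    // Hypothetical stand-in for the suffix stripping: keep only as much of
    // `term` as is needed to tell it apart from the term written before it.
    static String shortestDistinguishingPrefix(String prior, String term) {
        int limit = Math.min(prior.length(), term.length());
        for (int i = 0; i < limit; i++) {
            if (prior.charAt(i) != term.charAt(i)) {
                return term.substring(0, i + 1);
            }
        }
        // One term is a prefix of the other: keep one extra char if available.
        return term.substring(0, Math.min(term.length(), limit + 1));
    }

    // Sorts a copy of the terms in reverse String order (an assumption that
    // approximates the test's reverse unicode comparator for ASCII digits).
    static List<String> reverseSorted(List<String> terms) {
        List<String> copy = new ArrayList<>(terms);
        copy.sort(Comparator.reverseOrder());
        return copy;
    }

    public static void main(String[] args) {
        // "5" sorts directly before "49" here.
        System.out.println(reverseSorted(Arrays.asList("49", "5", "50", "48")));
        // Prints "4": the first character already differs from "5".
        System.out.println(shortestDistinguishingPrefix("5", "49"));
    }
}
```

The prefix only distinguishes the two terms by inequality. Whether it also keeps the same
position relative to other terms depends on the comparator, which this function never sees.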

When we seek on the terms dictionary, the binary search in CoreFieldIndex#getIndexOffset
tries to find the indexed term prior to term "44". It compares against "4", which returns -1,
while comparing against "49" would have yielded 1. We end up with the wrong offset and the
assert trips.
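
The flipped comparison can be reproduced outside Lucene. The sketch below is not
CoreFieldIndex#getIndexOffset itself: it uses Strings, Comparator.reverseOrder() as a
stand-in for the reverse unicode comparator, and a two-entry index whose first entry "99"
is invented for illustration. It shows the same binary search landing on a different offset
once "49" has been cut down to "4".

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ReverseSeekSketch {

    // Assumed stand-in for the test's reverse unicode comparator.
    static final Comparator<String> REVERSE = Comparator.reverseOrder();

    // Position of the last index term that sorts at or before `target`
    // under REVERSE, or -1 if every index term sorts after it.
    static int indexOffset(List<String> indexTerms, String target) {
        int lo = 0;
        int hi = indexTerms.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = REVERSE.compare(target, indexTerms.get(mid));
            if (cmp < 0) {
                hi = mid - 1;
            } else if (cmp > 0) {
                lo = mid + 1;
            } else {
                return mid;
            }
        }
        return hi;
    }

    public static void main(String[] args) {
        List<String> full = Arrays.asList("99", "49");     // untruncated index terms
        List<String> stripped = Arrays.asList("99", "4");  // "49" stored as "4"

        System.out.println(Integer.signum(REVERSE.compare("44", "49"))); // 1
        System.out.println(Integer.signum(REVERSE.compare("44", "4")));  // -1
        System.out.println(indexOffset(full, "44"));     // 1
        System.out.println(indexOffset(stripped, "44")); // 0
    }
}
```

With the full term the search places "44" after "49"; with the stripped term it places
"44" before "4" and falls back to the previous index entry.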

We somehow need access to the comparator actually in use while building the indexed
terms to fix that - I will reopen LUCENE-2588.

> Random Test Failure org.apache.lucene.TestExternalCodecs.testPerFieldCodec (from TestExternalCodecs)
> ----------------------------------------------------------------------------------------------------
>                 Key: LUCENE-2622
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Tests
>            Reporter: Mark Miller
>            Priority: Minor
> Error Message
> state.ord=54 startOrd=0 ir.isIndexTerm=true state.docFreq=1
> Stacktrace
> junit.framework.AssertionFailedError: state.ord=54 startOrd=0 ir.isIndexTerm=true state.docFreq=1
> 	at org.apache.lucene.index.codecs.standard.StandardTermsDictReader$FieldReader$
> 	at org.apache.lucene.index.DocumentsWriter.applyDeletes(
> 	at org.apache.lucene.index.DocumentsWriter.applyDeletes(
> 	at org.apache.lucene.index.IndexWriter.applyDeletes(
> 	at org.apache.lucene.index.IndexWriter.doFlushInternal(
> 	at org.apache.lucene.index.IndexWriter.doFlush(
> 	at org.apache.lucene.index.IndexWriter.flush(
> 	at org.apache.lucene.index.IndexWriter.optimize(
> 	at org.apache.lucene.index.IndexWriter.optimize(
> 	at org.apache.lucene.index.IndexWriter.optimize(
> 	at org.apache.lucene.TestExternalCodecs.testPerFieldCodec(
> 	at org.apache.lucene.util.LuceneTestCase.runBare(
> 	at
> Standard Output
> NOTE: random codec of testcase 'testPerFieldCodec' was: MockFixedIntBlock(blockSize=1327)
> NOTE: random locale of testcase 'testPerFieldCodec' was: lt_LT
> NOTE: random timezone of testcase 'testPerFieldCodec' was: Africa/Lusaka
> NOTE: random seed of testcase 'testPerFieldCodec' was: 812019387131615618

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.