lucene-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-6192) Long overflow in LuceneXXSkipWriter can corrupt skip data
Date Wed, 21 Jan 2015 16:33:35 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285842#comment-14285842 ]

ASF subversion and git services commented on LUCENE-6192:
---------------------------------------------------------

Commit 1653588 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1653588 ]

LUCENE-6192: don't overflow int when writing skip data for high freq terms in extremely large indices

> Long overflow in LuceneXXSkipWriter can corrupt skip data
> ---------------------------------------------------------
>
>                 Key: LUCENE-6192
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6192
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, Trunk, 4.x
>
>         Attachments: LUCENE-6192.patch
>
>
> I've been iterating with Tom on this corruption that CheckIndex detects in his rather large index (720 GB in a single segment):
> {noformat}
>  java -Xmx16G -Xms16G -cp $JAR -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /XXXX/shards/4/core-1/data/test_index -verbose 2>&1 |tee -a shard4_reoptimizedNewJava
> Opening index @ /htsolr/lss-reindex/shards/4/core-1/data/test_index
> Segments file=segments_e numSegments=1 version=4.10.2 format= userData={commitTimeMSec=1421479358825}
>   1 of 1: name=_8m8 docCount=1130856
>     version=4.10.2
>     codec=Lucene410
>     compound=false
>     numFiles=10
>     size (MB)=719,967.32
>     diagnostics = {timestamp=1421437320935, os=Linux, os.version=2.6.18-400.1.1.el5, mergeFactor=2, source=merge, lucene.version=4.10.2, os.arch=amd64, mergeMaxNumSegments=1, java.version=1.7.0_71, java.vendor=Oracle Corporation}
>     no deletions
>     test: open reader.........OK
>     test: check integrity.....OK
>     test: check live docs.....OK
>     test: fields..............OK [80 fields]
>     test: field norms.........OK [23 fields]
>     test: terms, freq, prox...ERROR: java.lang.AssertionError: -96
> java.lang.AssertionError: -96
>         at org.apache.lucene.codecs.lucene41.ForUtil.skipBlock(ForUtil.java:228)
>         at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.skipPositions(Lucene41PostingsReader.java:925)
>         at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.nextPosition(Lucene41PostingsReader.java:955)
>         at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1100)
>         at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1357)
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:655)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096)
>     test: stored fields.......OK [67472796 total field count; avg 59.665 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>     test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET]
> FAILED
>     WARNING: fixIndex() would remove reference to this segment; full exception:
> java.lang.RuntimeException: Term Index test failed
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:670)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096)
> WARNING: 1 broken segments (containing 1130856 documents) detected
> WARNING: would write new segments file, and 1130856 documents would be lost, if -fix were specified
> {noformat}
> And Rob spotted long -> int casts in our skip list writers that look like they could cause such corruption if a single high-freq term with many positions required > 2.1 GB to write its positions into .pos.
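
For illustration only, here is a minimal Java sketch of the kind of long -> int narrowing described above. It is not the Lucene skip writer code: the class SkipDeltaDemo and its methods are hypothetical names, and the 3,000,000,000-byte delta is an assumed value chosen to exceed Integer.MAX_VALUE (2,147,483,647, roughly the 2.1 GB mentioned above).

```java
public class SkipDeltaDemo {

    // Hypothetical helper: narrows a file-pointer delta to int, the pattern
    // suspected in the issue. Deltas above Integer.MAX_VALUE wrap silently.
    public static int narrowedDelta(long curPointer, long lastPointer) {
        return (int) (curPointer - lastPointer);
    }

    // Keeps the delta as a long, so no information is lost.
    public static long wideDelta(long curPointer, long lastPointer) {
        return curPointer - lastPointer;
    }

    // Fails loudly instead of wrapping: Math.toIntExact throws
    // ArithmeticException when the value does not fit in an int.
    public static int checkedDelta(long curPointer, long lastPointer) {
        return Math.toIntExact(curPointer - lastPointer);
    }

    public static void main(String[] args) {
        long last = 0L;
        long cur = 3_000_000_000L; // assumed: one term's positions span ~3 GB

        System.out.println(narrowedDelta(cur, last)); // prints -1294967296
        System.out.println(wideDelta(cur, last));     // prints 3000000000

        try {
            checkedDelta(cur, last);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

A delta that wraps to a negative value this way would be consistent with a reader later tripping over a nonsensical value, although the sketch makes no claim about how the -96 in the assertion above was produced.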



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


