lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <>
Subject [jira] [Commented] (LUCENE-7440) Document skipping on large indexes is broken
Date Fri, 09 Sep 2016 14:13:20 GMT


Yonik Seeley commented on LUCENE-7440:

Regarding the 1.8B docs number: at least in my tests, I saw a top-level skip distance of
~268M with the default codec. Subtracting this from MAX_INT (2^31-1) gives around 1.88B, which
is roughly the number I saw prior to the overflow. To hit the bug, one also needs to be doing
*large* skips toward the end of the index, so that the top level(s) of the multi-level skip
list are used. A conjunction of a highly unique term (or clause) with a common term has a good
chance of triggering it (example: +timestamp:39520928456494 +doctype:common).
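As a rough check of the arithmetic above, the following sketch computes per-level skip distances. It assumes a level-0 skip interval of 128 docs (one postings block) and a multiplier of 8 per level; both are assumptions about the default codec, not figures stated in this message. Level 7 is the level whose distance matches the ~268M top-level figure reported above. The class and method names are hypothetical and this is not Lucene code.

```java
// Sketch only: per-level skip distances of a multi-level skip list,
// ASSUMING a 128-doc level-0 interval and a multiplier of 8 per level.
public class SkipDistances {
    static final int BLOCK_SIZE = 128;     // assumed level-0 skip interval (docs)
    static final int SKIP_MULTIPLIER = 8;  // assumed growth factor between levels

    // Number of docs spanned by one skip entry at the given level,
    // computed in long so the result itself cannot overflow.
    static long skipDistance(int level) {
        long d = BLOCK_SIZE;
        for (int i = 0; i < level; i++) {
            d *= SKIP_MULTIPLIER;
        }
        return d;
    }

    public static void main(String[] args) {
        for (int level = 0; level <= 7; level++) {
            System.out.println("level " + level + ": " + skipDistance(level) + " docs");
        }
        long top = skipDistance(7);                // 268435456 (2^28)
        long threshold = Integer.MAX_VALUE - top;  // 1879048191
        System.out.println("MAX_INT - top-level skip = " + threshold);
    }
}
```

Under these assumptions the threshold works out to 1,879,048,191 docs, consistent with the "around 1.8B" observation.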

> Document skipping on large indexes is broken
> --------------------------------------------
>                 Key: LUCENE-7440
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 2.2
>            Reporter: Yonik Seeley
>            Priority: Critical
>         Attachments: LUCENE-7440.patch
> Large skips on large indexes fail.
> Anything that uses skips (such as boolean queries, filtered queries, faceted queries,
> join queries, etc.) can trigger this bug on a sufficiently large index.
> The bug is a numeric overflow in MultiLevelSkipList that has been present since inception
> (Lucene 2.2). It may not manifest until one has a single segment with more than ~1.8B documents
> and a large skip is performed on that segment.
> Typical stack trace on Lucene7-dev:
> {code}
> java.lang.ArrayIndexOutOfBoundsException: 110
> 	at org.apache.lucene.codecs.MultiLevelSkipListReader$SkipBuffer.readByte(
> 	at
> 	at org.apache.lucene.codecs.lucene50.Lucene50SkipReader.readSkipData(
> 	at org.apache.lucene.codecs.MultiLevelSkipListReader.loadNextSkip(
> 	at org.apache.lucene.codecs.MultiLevelSkipListReader.skipTo(
> 	at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.advance(
> 	at YCS_skip7$1.testSkip(
> {code}
> Typical stack trace on Lucene4.10.3:
> {code}
> 6-08-31 18:57:17,460 ERROR org.apache.solr.servlet.SolrDispatchFilter: null:java.lang.ArrayIndexOutOfBoundsException:
>  at org.apache.lucene.codecs.MultiLevelSkipListReader$SkipBuffer.readByte(
>  at
>  at org.apache.lucene.codecs.lucene41.Lucene41SkipReader.readSkipData(
>  at org.apache.lucene.codecs.MultiLevelSkipListReader.loadNextSkip(
>  at org.apache.lucene.codecs.MultiLevelSkipListReader.skipTo(
>  at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.advance(
>  at
> [...]
>  at
> [...]
>  at org.apache.solr.core.SolrCore.execute(
> {code}
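The issue attributes the failure to a numeric overflow in MultiLevelSkipList. The sketch below does not reproduce Lucene's code or the attached patch; it only illustrates, with hypothetical method names, how a doc-ID sum that passes Integer.MAX_VALUE silently wraps negative in 32-bit Java arithmetic, and two common ways to avoid that: widening to long, or using Math.addExact to fail loudly. The doc ID and skip distance values are illustrative, chosen to sit just past the ~1.88B threshold discussed above.

```java
// Simplified illustration (hypothetical names, NOT Lucene's implementation):
// adding a large skip distance to a doc ID near the end of a huge segment.
public class SkipOverflow {
    // 32-bit arithmetic: wraps to a negative value once the sum
    // exceeds Integer.MAX_VALUE.
    static int nextSkipDocInt(int lastDoc, int skipDistance) {
        return lastDoc + skipDistance;
    }

    // 64-bit arithmetic: the same sum without wrap-around.
    static long nextSkipDocLong(int lastDoc, int skipDistance) {
        return (long) lastDoc + skipDistance;
    }

    public static void main(String[] args) {
        int topLevelSkip = 1 << 28;   // 268435456 docs
        int lastDoc = 1_900_000_000;  // beyond MAX_INT - topLevelSkip
        System.out.println(nextSkipDocInt(lastDoc, topLevelSkip));   // -2126531840
        System.out.println(nextSkipDocLong(lastDoc, topLevelSkip));  // 2168435456
        try {
            // Math.addExact throws ArithmeticException instead of wrapping.
            Math.addExact(lastDoc, topLevelSkip);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

A wrapped, negative value like this is the kind of result that can later send a reader to the wrong position; which variables the real fix changed is not stated here.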

This message was sent by Atlassian JIRA
