Date: Mon, 12 Sep 2016 16:59:21 +0000 (UTC)
From: "Mike Drob (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Commented] (LUCENE-7440) Document skipping on large indexes is broken

    [ https://issues.apache.org/jira/browse/LUCENE-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484634#comment-15484634 ]

Mike Drob commented on LUCENE-7440:
-----------------------------------

bq. Regarding the 1.8B docs number...
at least in my tests I saw a top-level skip distance of ~268M w/ the default codec. Subtracting this from MAX_INT gives around 1.8B, which is around the number I saw prior to the overflow. To hit the bug, one also needs to be doing large skips toward the end of the index, in order to use the top level(s) of the multi-level skip list. A conjunction query that combines a highly unique term (or clause) with a common term has a good chance of triggering it (example: +timestamp:39520928456494 +doctype:common)

Would this be faster to test if we configured a larger top-level skip distance? i.e. set up a skip distance of ~1B, and then we'd only need to get to ~1.1B docs indexed (40% fewer docs, theoretically 40% faster?). Or even set up a skip distance of ~2B, so that we only need to index very few documents?

Maybe this idea should be split into a separate issue to focus on improving the test?

> Document skipping on large indexes is broken
> --------------------------------------------
>
>                 Key: LUCENE-7440
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7440
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 2.2
>            Reporter: Yonik Seeley
>            Assignee: Yonik Seeley
>            Priority: Critical
>             Fix For: master (7.0), 6.3, 6.2.1
>
>         Attachments: LUCENE-7440.patch, LUCENE-7440.patch
>
>
> Large skips on large indexes fail.
> Anything that uses skips (such as a boolean query, filtered queries, faceted queries, join queries, etc) can trigger this bug on a sufficiently large index.
> The bug is a numeric overflow in MultiLevelSkipList that has been present since inception (Lucene 2.2). It may not manifest until one has a single segment with more than ~1.8B documents, and a large skip is performed on that segment.
> Typical stack trace on Lucene7-dev:
> {code}
> java.lang.ArrayIndexOutOfBoundsException: 110
>     at org.apache.lucene.codecs.MultiLevelSkipListReader$SkipBuffer.readByte(MultiLevelSkipListReader.java:297)
>     at org.apache.lucene.store.DataInput.readVInt(DataInput.java:125)
>     at org.apache.lucene.codecs.lucene50.Lucene50SkipReader.readSkipData(Lucene50SkipReader.java:180)
>     at org.apache.lucene.codecs.MultiLevelSkipListReader.loadNextSkip(MultiLevelSkipListReader.java:163)
>     at org.apache.lucene.codecs.MultiLevelSkipListReader.skipTo(MultiLevelSkipListReader.java:133)
>     at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.advance(Lucene50PostingsReader.java:421)
>     at YCS_skip7$1.testSkip(YCS_skip7.java:307)
> {code}
> Typical stack trace on Lucene4.10.3:
> {code}
> 6-08-31 18:57:17,460 ERROR org.apache.solr.servlet.SolrDispatchFilter: null:java.lang.ArrayIndexOutOfBoundsException: 75
>     at org.apache.lucene.codecs.MultiLevelSkipListReader$SkipBuffer.readByte(MultiLevelSkipListReader.java:301)
>     at org.apache.lucene.store.DataInput.readVInt(DataInput.java:122)
>     at org.apache.lucene.codecs.lucene41.Lucene41SkipReader.readSkipData(Lucene41SkipReader.java:194)
>     at org.apache.lucene.codecs.MultiLevelSkipListReader.loadNextSkip(MultiLevelSkipListReader.java:168)
>     at org.apache.lucene.codecs.MultiLevelSkipListReader.skipTo(MultiLevelSkipListReader.java:138)
>     at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.advance(Lucene41PostingsReader.java:506)
>     at org.apache.lucene.search.TermScorer.advance(TermScorer.java:85)
>     [...]
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
>     [...]
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:2004)
> {code}
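
The arithmetic behind the ~1.8B figure, and behind the suggestion to test with a larger skip distance, can be sketched in plain Java. This is only an illustration under stated assumptions, not the actual MultiLevelSkipListReader code: the class SkipOverflowSketch, its methods, and the constant TOP_LEVEL_SKIP (set to 2^28 to match the ~268M quoted above) are hypothetical names introduced here. It shows a signed 32-bit counter wrapping negative once a skip of that size is added past MAX_INT minus the skip distance, and how far the threshold drops with a ~1B skip distance.

```java
// Illustrative sketch only; not the actual Lucene skip-list implementation.
class SkipOverflowSketch {
    // Assumed top-level skip distance: 2^28 = 268,435,456 docs (the ~268M quoted above).
    static final int TOP_LEVEL_SKIP = 1 << 28;

    // Largest counter value that can absorb one more skip without overflowing an int.
    static int overflowThreshold(int skipDistance) {
        return Integer.MAX_VALUE - skipDistance;
    }

    // Signed 32-bit addition: silently wraps to a negative value on overflow.
    static int advanceInt(int numSkipped, int skipDistance) {
        return numSkipped + skipDistance;
    }

    // 64-bit addition: one way to avoid the wraparound (not necessarily the committed fix).
    static long advanceLong(long numSkipped, int skipDistance) {
        return numSkipped + skipDistance;
    }

    public static void main(String[] args) {
        int threshold = overflowThreshold(TOP_LEVEL_SKIP);
        System.out.println("threshold with ~268M skip: " + threshold);
        System.out.println("int at threshold + 1:  " + advanceInt(threshold + 1, TOP_LEVEL_SKIP));
        System.out.println("long at threshold + 1: " + advanceLong(threshold + 1L, TOP_LEVEL_SKIP));
        System.out.println("threshold with ~1B skip: " + overflowThreshold(1_000_000_000));
    }
}
```

With the assumed 2^28 skip distance the threshold comes out to 1,879,048,191, which lines up with the "around 1.8B" observation. With a hypothetical skip distance of 1,000,000,000 it drops to 1,147,483,647, in line with the ~1.1B estimate for a faster test.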