lucene-java-user mailing list archives

From Michael McCandless
Subject Re: Issues with SortedSetDocValuesAccumulator when index has multiple segments?
Date Wed, 03 Jul 2013 20:48:02 GMT
I opened

Kaze, if you could try out that patch and see if it throws a better
exception in your case that would be great ...

Mike McCandless

On Wed, Jul 3, 2013 at 4:16 PM, Michael McCandless wrote:
> Hmm, not good.
> One tricky thing about SortedSetDocValuesAccumulator (SSDVA) is that you
> must create a new SortedSetDocValuesReaderState every time you open a new
> IndexReader.  If you don't, e.g. you reuse the SortedSetDocValuesReaderState
> from an old reader, it can lead to exceptions like this.
> Is it possible that's happening in your case?
> We should add a check for this in the code so you get a better
> exception ... I'll open an issue.
> Mike McCandless
> On Wed, Jul 3, 2013 at 2:52 PM, Kaze wrote:
>> Hello,
>> I'm a novice Lucene user and just started using it to do some prototyping
>> for my project.
>> I noticed that SortedSetDocValues, introduced in 4.3.0, allows faceted
>> search without a dedicated taxonomy index.  I've successfully used it to
>> perform faceting on a small index (~3000 documents, ~400 bytes per doc).
>> But when I loaded a bigger index (~50000 documents), I started getting an
>> ArrayIndexOutOfBoundsException when SortedSetDocValuesAccumulator performs
>> aggregation.
>> Specifically, it errors out on line 139, where it tries to map segment
>> ordinals to global ordinals.  I poked around and did some debugging; these
>> are my findings.
>> The smaller index only had one segment when initially loaded, while the
>> bigger one had multiple.  My test suite consists of some searches on the
>> index with occasional updates to the index.  The error only happens when I
>> do a faceted search immediately following an update to the index.
>> Then I tried forcing a merge of the segments for the larger index as the
>> final step of initial indexing.  So when I initially loaded the index
>> afterwards, there was only one segment.  This time there were no errors,
>> even though it was the same set of documents.  Interestingly, even though
>> segments are created as I do updates on the index as part of my test suite,
>> no errors crop up afterwards.  I can add that I've only seen issues with 3
>> or more segments, while 2 seems to work.  I don't know why this would be
>> the case but these are my observations.
>> Let me know if there is some standard way to report bugs that I should
>> follow.  I've checked out the JIRA page for Lucene, but it looked more like
>> a "find bug, create issue, fix it, upload patch" workflow, where the issue
>> creator fixes the bug.  I have a long way to go before I understand the
>> low-level implementation well enough to apply a fix :(
>> Thanks
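
The advice quoted above (build a new SortedSetDocValuesReaderState whenever a new IndexReader is opened, and fail with a clearer exception when a stale state is used) can be sketched roughly as follows. This is only an illustration of the pattern, not Lucene code: FakeReader, FakeFacetState, ReaderAndState and FacetSearcher are hypothetical stand-ins, so the example runs without Lucene on the classpath. In a real application the pair would presumably hold the IndexReader and the SortedSetDocValuesReaderState created from it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for an opened index reader (NOT Lucene's IndexReader).
final class FakeReader {
    final int segmentCount;
    FakeReader(int segmentCount) { this.segmentCount = segmentCount; }
}

// Hypothetical stand-in for per-reader facet state
// (NOT Lucene's SortedSetDocValuesReaderState).
final class FakeFacetState {
    final FakeReader origin; // the reader this state was computed from
    FakeFacetState(FakeReader origin) { this.origin = origin; }
}

// Immutable pair, so a reader and its state are always replaced together.
final class ReaderAndState {
    final FakeReader reader;
    final FakeFacetState state;
    ReaderAndState(FakeReader reader) {
        this.reader = reader;
        this.state = new FakeFacetState(reader); // rebuilt for every new reader
    }
}

final class FacetSearcher {
    private final AtomicReference<ReaderAndState> current;

    FacetSearcher(FakeReader initial) {
        current = new AtomicReference<>(new ReaderAndState(initial));
    }

    // Call after the index has changed and a new reader has been opened.
    void reopen(FakeReader newReader) {
        current.set(new ReaderAndState(newReader));
    }

    // Searches should take the reader and the state from the same snapshot.
    ReaderAndState acquire() {
        return current.get();
    }

    // Guard in the spirit of the "better exception" mentioned above:
    // fail fast with a clear message instead of an array index error later.
    static void checkMatches(FakeReader reader, FakeFacetState state) {
        if (state.origin != reader) {
            throw new IllegalStateException(
                "facet state was built from a different reader; rebuild it after reopening");
        }
    }
}
```

Keeping the reader and its state in one immutable object means a search can never pick up a new reader together with an old state; the identity check is a cheap safety net for code paths that still pass the two separately.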
