lucene-java-user mailing list archives

From Michael McCandless <>
Subject Re: Issues with SortedSetDocValuesAccumulator when index has multiple segments?
Date Wed, 03 Jul 2013 20:16:30 GMT
Hmm, not good.

One tricky aspect of SSDVA (SortedSetDocValuesAccumulator) is that you must
create a new SortedSetDocValuesReaderState every time you open a new IndexReader.

If you don't do this correctly, e.g. you reuse the SortedSetDocValuesReaderState
from an old reader, it can lead to exceptions like this.
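A toy model in plain Java shows why state built for one reader can break aggregation against another. This is not Lucene's implementation. The names OrdinalMap, countAll and failsWithStaleMap are hypothetical. A "reader" here is just a list of segments, and each segment is a sorted list of distinct facet labels whose index is its segment ordinal. The map from segment ordinals to global ordinals is built once per reader. If an update adds a segment and the old map is reused, the lookup for the new segment runs past the end of the old array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

final class StaleOrdinalMapDemo {

    // Hypothetical stand-in for per-reader state: for each segment, an array
    // mapping segment ordinal -> ordinal in the merged, sorted label space.
    static final class OrdinalMap {
        final int[][] segToGlobal;
        final int globalCount;

        OrdinalMap(List<List<String>> segments) {
            TreeSet<String> merged = new TreeSet<>();
            for (List<String> seg : segments) {
                merged.addAll(seg);
            }
            List<String> global = new ArrayList<>(merged); // sorted, distinct
            segToGlobal = new int[segments.size()][];
            for (int s = 0; s < segments.size(); s++) {
                List<String> seg = segments.get(s);
                segToGlobal[s] = new int[seg.size()];
                for (int ord = 0; ord < seg.size(); ord++) {
                    segToGlobal[s][ord] = Collections.binarySearch(global, seg.get(ord));
                }
            }
            globalCount = global.size();
        }
    }

    // Aggregation step: visit every segment ordinal and bump the counter
    // of the global ordinal it maps to.
    static int[] countAll(List<List<String>> segments, OrdinalMap map) {
        int[] counts = new int[map.globalCount];
        for (int s = 0; s < segments.size(); s++) {
            for (int ord = 0; ord < segments.get(s).size(); ord++) {
                counts[map.segToGlobal[s][ord]]++;
            }
        }
        return counts;
    }

    // True if aggregating the new segments with a map built for an older
    // reader blows up with an out-of-bounds array access.
    static boolean failsWithStaleMap(List<List<String>> newSegments, OrdinalMap staleMap) {
        try {
            countAll(newSegments, staleMap);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<List<String>> before = Arrays.asList(
                Arrays.asList("color/blue", "color/red"),
                Arrays.asList("color/green", "color/red"));
        // An update flushes a new segment, so the reopened reader has one more.
        List<List<String>> after = new ArrayList<>(before);
        after.add(Arrays.asList("color/yellow"));

        OrdinalMap staleMap = new OrdinalMap(before);
        System.out.println("stale map fails: " + failsWithStaleMap(after, staleMap)); // true
        // Rebuilding the state from the new reader fixes the aggregation.
        System.out.println(Arrays.toString(countAll(after, new OrdinalMap(after)))); // [1, 1, 2, 1]
    }
}
```

The model only illustrates the general failure mode of mismatched per-reader state. It does not attempt to explain which segment counts trigger the bug in Lucene itself.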

Is it possible that's happening in your case?

We should add a check for this in the code so you get a better
exception ... I'll open an issue.
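The check proposed here might look something like the following sketch. It is generic and hypothetical, not Lucene code. ReaderBoundState remembers the exact reader object its state was built from. It refuses to hand that state out for any other reader, so misuse fails immediately with a descriptive message instead of an obscure array exception deep in aggregation. In a Lucene 4.3 application, R would presumably be the IndexReader (for example, one returned by DirectoryReader.openIfChanged), and the builder would construct a new SortedSetDocValuesReaderState from it.

```java
import java.util.Objects;
import java.util.function.Function;

// Hypothetical helper: derived state S built from reader R, bound to that reader.
final class ReaderBoundState<R, S> {
    private final R reader;
    private final S state;

    ReaderBoundState(R reader, Function<R, S> builder) {
        this.reader = Objects.requireNonNull(reader);
        this.state = builder.apply(reader);
    }

    // Fail fast with a clear message when used with a different reader.
    S stateFor(R currentReader) {
        if (currentReader != reader) {
            throw new IllegalStateException(
                    "state was built for a different reader; rebuild it after reopening");
        }
        return state;
    }

    // Refresh pattern: rebuild only when the reader object actually changed.
    ReaderBoundState<R, S> refresh(R newReader, Function<R, S> builder) {
        if (newReader == reader) {
            return this;
        }
        return new ReaderBoundState<R, S>(newReader, builder);
    }
}

final class ReaderBoundStateDemo {
    public static void main(String[] args) {
        Object oldReader = new Object(); // stands in for the first reader
        ReaderBoundState<Object, String> bound =
                new ReaderBoundState<>(oldReader, r -> "state for old reader");
        System.out.println(bound.stateFor(oldReader));

        Object newReader = new Object(); // stands in for the reopened reader
        try {
            bound.stateFor(newReader);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }

        bound = bound.refresh(newReader, r -> "state for new reader");
        System.out.println(bound.stateFor(newReader));
    }
}
```

The comparison is by identity (`!=`) on purpose: a reopened reader is a distinct object even if much of its content is unchanged, and that is exactly the case where stale state must be rebuilt.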

Mike McCandless

On Wed, Jul 3, 2013 at 2:52 PM, Kaze <> wrote:
> Hello,
> I'm a novice Lucene user and just started using it to do some prototyping
> for my project.
> I noticed SortedSetDocValues was introduced in 4.3.0, which allows faceted
> search without a dedicated taxonomy index.  I've successfully used it to
> perform faceting on a small index (~3000 documents, ~400 bytes per doc).
> But when I loaded a bigger index (~50000 documents), I started getting an
> ArrayIndexOutOfBoundsException when SortedSetDocValuesAccumulator performs
> aggregation.
> Specifically, it errors out on line 139, where it tries to convert segment
> ordinals to global ordinals.  I've poked around and done some debugging;
> here is what I found.
> The smaller index only had one segment when initially loaded, while the
> bigger one had multiple.  My test suite consists of some searches on the
> index with occasional updates to the index.  The error only happens when I
> do a faceted search immediately following an update to the index.
> Then I tried forcing a merge of the segments for the larger index as the
> final step of initial indexing.  So when I initially loaded the index
> afterwards, there was only one segment.  This time there were no errors,
> even though it was the same set of documents.  Interestingly, even though
> segments are created as I do updates on the index as part of my test suite,
> no errors crop up afterwards.  I should add that I've only seen issues with 3
> or more segments, while 2 seems to work.  I don't know why this would be
> the case, but these are my observations.
> Let me know if there is some standard way to report bugs that I should
> follow.  I've checked out the JIRA page for Lucene, but it looked more like
> a "find bug, create issue, fix it, upload patch" workflow, where the issue
> creator fixes the bug.  I have a long way to go before I understand the
> low-level implementation well enough to apply a fix :(
> Thanks
