lucene-java-user mailing list archives

From Erick Erickson <>
Subject Re: Increase number of available positions?
Date Mon, 15 Mar 2010 12:36:52 GMT
Is your entire corpus a single document? Because I'm having trouble
imagining a single document where this would be a problem, unless
your increment gap is huge. The term positions are relative to
a single document...

You say that your levels have fewer than 1,000 elements each. With
an increment gap of 100, you're only talking about a total of 300,000
positions lost to increment gap "holes", so you've got room for,
uhhhhmm, a lot more tokens per document. If you're running over that
limit, the increment gap is the least of your problems <G>...
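
To make that arithmetic concrete, here is a rough sketch in plain Java. It
assumes the 300,000 figure comes from 3 levels x 1,000 elements x a gap of
100, which is one reading of the numbers above. PositionBudget and its
methods are hypothetical names for illustration, not part of Lucene.

```java
// Hypothetical helper, not part of Lucene: estimates how much of the
// int position space the increment gaps consume in one document.
class PositionBudget {

    // Positions skipped by gaps, assuming one gap per element on every level.
    static long gapPositions(int levels, int elementsPerLevel, int incrementGap) {
        // Widen to long before multiplying so the estimate itself cannot overflow.
        return (long) levels * elementsPerLevel * incrementGap;
    }

    // Positions still available for real tokens below Integer.MAX_VALUE.
    static long remainingPositions(long gapPositions) {
        return Integer.MAX_VALUE - gapPositions;
    }

    public static void main(String[] args) {
        long gaps = gapPositions(3, 1000, 100);
        System.out.println("positions lost to gaps: " + gaps);
        System.out.println("positions left for tokens: " + remainingPositions(gaps));
    }
}
```

Under those assumptions the gaps use up well under a thousandth of the
positions a single document can address.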

Of course I may be missing the point completely...


On Mon, Mar 15, 2010 at 5:03 AM, Rene Hackl-Sommer <> wrote:

> Hello,
> I am working on a use case that is very demanding regarding the number of
> token positions. For one special field in the index, I need to represent
> different hierarchy levels, like this:
> <MyField>
> <Level_1>
> <Level_2>
> <Level_3>
> Please note that I need to do this with Lucene, not an XML search engine.
> Now, on Level_3 there are hundreds of tokens, Level_2 also has hundreds of
> entries and Level_1 is in there with a low 3-digit figure. For those who
> wish to know: this is an intricate system of chemical entities and some
> of their properties.
> I need this information to be searchable in all conceivable ways. What I am
> doing right now is using position increment gaps to separate the levels and
> searching with SpanQueries. It works like a charm for a setup with limited
> entries. But Integer.MAX_VALUE puts a cap on the approach, of course. Would
> it be feasible to replace the current integer counting system with a
> long-based system? What issues should I consider?
> Thanks,
> Rene