lucene-java-user mailing list archives

From Doug Cutting
Subject Re: Sentence Endings: IndexWriter.maxFieldLength and Token.setPositionIncrement()
Date Sat, 20 Dec 2003 00:53:36 GMT

Someone else recently made a similar, reasonable complaint.  I agree 
that this should be fixed.  The fastest way to get it fixed would be to 
submit a patch to lucene-dev, with a test case, etc.


Jochen Frey wrote:
> Hi!
> I hope this is the right forum for this post.
> I was wondering if other people would consider this a bug (it might be a
> feature and I am missing the point of it):
> * The default IndexWriter.maxFieldLength is 10,000.
> * The point of maxFieldLength is to limit memory usage.
> * The current position (which is compared against maxFieldLength) is
>   essentially determined by the sum of the PositionIncrements of all Tokens
>   added to the index.
> Why does this matter? If you have setPositionIncrement(1000) for sentence
> ending tokens, only the first 10 sentences of your document will be indexed,
> the rest will not be searchable (since position will be greater than
> 10,000).
> Why I think this is a bug: If you skip 1000 positions, no memory is required
> by the DocumentWriter for the 999 empty positions, so maxFieldLength ends up
> limiting the available positions rather than memory usage.
> I suggest that there be a counter in DocumentWriter that counts the actual
> number of tokens in the postingTable (probably in
> DocumentWriter.addPosition), so that maxFieldLength is compared against the
> number of actual entries, not the number of actual entries plus the number
> of skipped entries.
> Best,
> 	Jochen
> PS: Please let me know if this is the wrong forum for this so I'll post to
> the right one next time.
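
For anyone who wants to see the effect in numbers, here is a rough sketch of the two counting policies described in the quoted message. It is a simplified model only, not the actual DocumentWriter code: the class and method names are made up for illustration, the exact boundary check in Lucene may differ, and the document shape (50 sentences of 20 tokens each, with an increment of 1000 on the sentence-ending token) is an assumption.

```java
// Simplified model of the two limits discussed in this thread.
// NOT Lucene code: all names here are hypothetical, and the exact
// comparison used by DocumentWriter may differ from this sketch.
class MaxFieldLengthSketch {

    static final int MAX_FIELD_LENGTH = 10_000; // default cited in the post

    // Builds position increments for a document in which every token has
    // increment 1, except the last token of each sentence, which gets
    // sentenceGap (e.g. 1000) as described in the post.
    static int[] sentenceIncrements(int sentences, int tokensPerSentence, int sentenceGap) {
        int[] increments = new int[sentences * tokensPerSentence];
        for (int i = 0; i < increments.length; i++) {
            boolean lastInSentence = (i + 1) % tokensPerSentence == 0;
            increments[i] = lastInSentence ? sentenceGap : 1;
        }
        return increments;
    }

    // Behaviour described in the post: the limit is compared against the
    // running position, i.e. the sum of all position increments so far.
    static int indexedByPosition(int[] increments, int maxFieldLength) {
        int position = 0;
        int indexed = 0;
        for (int increment : increments) {
            position += increment;
            if (position > maxFieldLength) {
                break; // skipped positions count against the limit
            }
            indexed++;
        }
        return indexed;
    }

    // Proposed behaviour: the limit is compared against the number of
    // tokens actually added, so skipped positions cost nothing.
    static int indexedByTokenCount(int[] increments, int maxFieldLength) {
        int indexed = 0;
        for (int i = 0; i < increments.length; i++) {
            if (indexed >= maxFieldLength) {
                break;
            }
            indexed++;
        }
        return indexed;
    }

    public static void main(String[] args) {
        int[] increments = sentenceIncrements(50, 20, 1000);
        System.out.println("tokens in document:      " + increments.length);
        System.out.println("indexed, position limit: "
                + indexedByPosition(increments, MAX_FIELD_LENGTH));
        System.out.println("indexed, token limit:    "
                + indexedByTokenCount(increments, MAX_FIELD_LENGTH));
    }
}
```

Under these assumptions the position-based limit stops partway through the tenth sentence, while the token-count limit indexes the whole document, since 1000 tokens is well under 10,000. A test case along these lines, written against the real DocumentWriter, is probably what a patch to lucene-dev would need.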
