lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <cutt...@lucene.com>
Subject Re: Sentence Endings: IndexWriter.maxFieldLength and Token.setPositionIncrement()
Date Sat, 20 Dec 2003 00:53:36 GMT
Jochen,

Someone else recently made a similar, reasonable complaint.  I agree 
that this should be fixed.  The fastest way to get it fixed would be to 
submit a patch to lucene-dev, with a test case, etc.

Doug

Jochen Frey wrote:
> Hi!
> 
> I hope this is the right forum for this post.
> 
> I was wondering if other people would consider this a bug (it might be a
> feature and I am missing the point of it):
>  
> .The default IndexWriter.maxFieldLength is 10,000.
> .The point of maxFieldLength is to limit memory usage.
> .The current position (which is compared against maxFieldLength) is
> essentially determined by the sum of the PositionIncrements of all Tokens
> added to the index.
> 
> Why does this matter? If you have setPositionIncrement(1000) for sentence
> ending tokens, only the first 10 sentences of your document will be indexed,
> the rest will not be searchable (since position will be greater than
> 10,000).
> 
> Why I think this is a bug: If you skip 1000 positions, no memory is required
> by the DocumentWriter for the empty 999 positions, thus not using
> maxFieldLength to limit memory but simply available positions.
> 
> I suggest that there be a counter in DocumentWriter, that counts the actual
> number of tokens in the postingTable (probably in
> DocumentWriter.addPosition), so that maxFieldLength is compared against the
> number actual entries, not the number of actual entries and the number
> skipped entries.
> 
> Best,
> 	Jochen
> 
> PS: Please let me know if this is the wrong forum for this so I'll post to
> the right one next time.
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message