lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: question about IndexWriter.maxFieldLength
Date Tue, 17 May 2005 17:25:28 GMT

On May 17, 2005, at 5:33 AM, Pablo Gomes Ludermir wrote:

> Dear all,
>
> I would like to know about the maxFieldLength. It says on the Javadocs
> that it limits "The maximum number of terms that will be indexed for a
> single field in a document." So, for instance, in my "contents" field,
> I would have it limited by default to 10.000 terms. But which terms
> are those? The first 10.000 to be indexed?
> Or is there a feature selection approach? For example, are the most
> frequent 10.000 terms indexed and the rest discarded? Does anyone know?
> If this is not the case, is it possible to implement?

It's the first 10,000 terms. One possible way to choose which terms are
indexed is to implement an analyzer that buffers the tokens and emits
only the most frequent ones. There may be other ways to accomplish this
by hacking Lucene itself.
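
A rough sketch of the buffering idea follows, in plain Java with no Lucene classes. The class name FrequentTermFilter, the method keepMostFrequent, and the alphabetical tie-break are all made up for illustration. It also assumes the whole field fits in memory. Wiring this into a real analyzer would mean wrapping the same two passes in a token filter, which is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: keeps only the tokens whose
// term is among the maxTerms most frequent distinct terms of one field.
final class FrequentTermFilter {

    static List<String> keepMostFrequent(List<String> tokens, int maxTerms) {
        // Pass 1: the field is fully buffered in 'tokens'; count each term.
        Map<String, Integer> counts = new HashMap<>();
        for (String term : tokens) {
            counts.merge(term, 1, Integer::sum);
        }

        // Rank distinct terms by descending frequency. Ties are broken
        // alphabetically (an arbitrary choice) so the result is deterministic.
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });

        int limit = Math.min(Math.max(maxTerms, 0), ranked.size());
        Set<String> kept = new HashSet<>(ranked.subList(0, limit));

        // Pass 2: emit the surviving tokens in their original order, so
        // every occurrence of a kept term is still passed along.
        List<String> out = new ArrayList<>();
        for (String term : tokens) {
            if (kept.contains(term)) {
                out.add(term);
            }
        }
        return out;
    }
}
```

Note that this limits the number of distinct terms, not the number of tokens emitted. If maxFieldLength counts every token occurrence, the filtered stream could still run past the limit, so the setting might need to be raised as well.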

     Erik

