lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: question about IndexWriter.maxFieldLength
Date Tue, 17 May 2005 18:53:31 GMT
On Tuesday 17 May 2005 11:33, Pablo Gomes Ludermir wrote:
> Dear all,
> 
> I would like to know about the maxFieldLength. It says on the Javadocs
> that it limits "The maximum number of terms that will be indexed for a
> single field in a document." So, for instance, in my "contents" field,
> I would have it limited by default to 10.000 terms. But which terms
> are those? The first 10.000 to be indexed?

For every field in the document, the first 10.000 tokens returned
from the analyzer are indexed.
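
To make the "first N tokens" behaviour concrete, here is a minimal sketch in plain Java. It does not call Lucene at all: FirstTokensSketch and firstTokens are hypothetical names made up for illustration, and the list of strings merely stands in for the tokens an analyzer would return for one field.

```java
import java.util.ArrayList;
import java.util.List;

class FirstTokensSketch {
    // Hypothetical helper, not a Lucene API: keeps only the first
    // maxFieldLength tokens of a field, in their original order.
    // Everything after that position is simply dropped.
    static List<String> firstTokens(List<String> tokens, int maxFieldLength) {
        int end = Math.min(tokens.size(), maxFieldLength);
        return new ArrayList<>(tokens.subList(0, end));
    }
}
```

With a limit of 3, the tokens "a b c d" would be cut down to "a b c"; the choice depends only on position, not on how often a term occurs.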

> Or is there any feature selection approach? Like, the most frequent
> 10.000 terms are indexed and the rest are discarded? Does anyone know?
> If this is not the case, is it possible to implement?

There is no feature selection, except for stopword removal by
some of the analyzers.
Other feature selection mechanisms are up to you.
The reason for the 10.000 terms limitation is to have an upper bound
on the memory used for indexing a single document.
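
If you want to select the most frequent terms yourself, one possible approach is to filter the tokens before handing the text to the indexer. The following is only a sketch under that assumption, in plain Java without any Lucene calls; FrequentTermsSketch and keepMostFrequent are hypothetical names, and ties in frequency are broken by first occurrence.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FrequentTermsSketch {
    // Hypothetical helper, not a Lucene API: keeps every occurrence of
    // the maxTerms most frequent distinct terms and drops all others.
    static List<String> keepMostFrequent(List<String> tokens, int maxTerms) {
        // Count occurrences; LinkedHashMap remembers first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        // Sort by descending count; List.sort is stable, so terms with
        // equal counts stay in first-seen order.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        Set<String> keep = new HashSet<>();
        int limit = Math.min(maxTerms, entries.size());
        for (int i = 0; i < limit; i++) {
            keep.add(entries.get(i).getKey());
        }
        // Second pass: emit the surviving tokens in their original order.
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (keep.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }
}
```

Note that this needs all tokens of the field in memory at once to count them, so it gives up exactly the memory bound that the limit is there to provide.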

Regards,
Paul Elschot



