lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Lamprecht <clampre...@gmail.com>
Subject Re: Indexing terms limit
Date Wed, 10 Aug 2005 19:58:57 GMT
See IndexWriter.setMaxFieldLength(), I think it's what you want:

from javadocs:

public void setMaxFieldLength(int maxFieldLength)

The maximum number of terms that will be indexed for a single field in
a document. This limits the amount of memory required for indexing, so
that collections with very large files will not crash the indexing
process by running out of memory.
Note that this effectively truncates large documents, excluding from
the index terms that occur further in the document. If you know your
source documents are large, be sure to set this value high enough to
accomodate the expected size. If you set it to Integer.MAX_VALUE, then
the only limit is your memory, but you should anticipate an
OutOfMemoryError.

By default, no more than 10,000 terms will be indexed for a field.

On 8/10/05, Tim Johnson <timothy.w.johnson@saic.com> wrote:
> I'm currently attempting to index the distinct list of terms found in a
> Lucene index using the TermEnum.  I'm creating a document with each list
> and indexing the document of terms.  It appears there's a limit of
> 10,000 distinct terms within a given document.  Can this be overcome??
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message