lucene-java-user mailing list archives

From <marigol...@yahoo.com>
Subject Re: TermFrequencies vector limits?
Date Mon, 21 Nov 2005 13:28:02 GMT
Just to make sure that I understand this correctly,
the docs say: 

" By default, no more than 10,000 terms will be
indexed for a field."

Given your note, the docs do not mean that no more
than 10,000 distinct terms will be indexed, but rather
that some smaller number of distinct terms will be
indexed, because only the first 10,000 occurrences
are tallied.

Is that correct?

Thanks
-MG
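The distinction in the question (a cap on occurrences rather than on distinct terms) can be modelled in a few lines of plain Java. This is a minimal sketch, not Lucene's indexing code. It assumes, as Paul's answer implies, that the limit counts token occurrences produced by the analyzer rather than distinct terms. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MaxFieldLengthSketch {

    // Hypothetical helper: tallies term frequencies for one field, but only over
    // the first maxFieldLength tokens (assumed semantics of the per-field cap).
    public static Map<String, Integer> truncatedTermFrequencies(List<String> tokens, int maxFieldLength) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        int limit = Math.min(tokens.size(), maxFieldLength);
        for (int i = 0; i < limit; i++) {
            // Every occurrence counts against the cap, whether or not the term repeats.
            freqs.merge(tokens.get(i), 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "b", "a", "c", "a", "d", "b");
        Map<String, Integer> freqs = truncatedTermFrequencies(tokens, 5);
        int total = freqs.values().stream().mapToInt(Integer::intValue).sum();
        System.out.println(freqs);        // {a=3, b=1, c=1}
        System.out.println(total);        // 5
        System.out.println(freqs.size()); // 3
    }
}
```

With a cap of 5, the seven tokens yield five tallied occurrences spread over only three distinct terms, and "d" is never indexed at all.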

------ Original Message ------
Received: Mon, 21 Nov 2005 03:04:42 AM EST
From: Paul Elschot <paul.elschot@xs4all.nl>
To: java-user@lucene.apache.org
Subject: Re: TermFrequencies vector limits?

> On Monday 21 November 2005 02:16, marigoldcc@yahoo.com wrote:
> > Hi.  I was wondering if anyone else has seen this
> > before.  I'm using  lucene 1.4.3 and have indexed
> > about 3000 text documents using the statement:
> > 
> > doc.add(Field.Text("contents", new FileReader(f), true));
> > 
> > When I go and retrieve the term frequency vectors, for
> > any document under about 90k, everything looks as
> > expected.  However for larger documents (I haven't
> > found the exact point, but I know that those over 128k
> > qualify) the sum of the term frequencies in the vector
> > seems to max out at 10001.
> ..
> 
> That's correct, have a look here for IndexWriter.maxFieldLength:
> http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71
> 
> Regards,
> Paul Elschot
> 
> 
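To check whether a particular document hit the cap, one can sum the frequencies from its term vector, as the original poster did. Below is a minimal plain-Java sketch. It assumes the int[] comes from something like TermFreqVector.getTermFrequencies() on the vector returned by IndexReader.getTermFreqVector(docNumber, "contents") in Lucene 1.4.x. The helper names totalOccurrences and looksTruncated are hypothetical.

```java
public class TruncationCheck {

    // Hypothetical helper: sums the per-term frequencies of one document's field,
    // i.e. the number of token occurrences that actually made it into the index.
    public static int totalOccurrences(int[] termFrequencies) {
        int total = 0;
        for (int f : termFrequencies) {
            total += f;
        }
        return total;
    }

    // Heuristic: if the indexed occurrences reach the configured cap, the field was
    // probably cut off. ">=" rather than "==" because the thread reports sums of 10001
    // against a 10,000 limit. A document with exactly that many tokens is also flagged.
    public static boolean looksTruncated(int[] termFrequencies, int maxFieldLength) {
        return totalOccurrences(termFrequencies) >= maxFieldLength;
    }

    public static void main(String[] args) {
        int[] small = {3, 1, 1};
        int[] capped = {6000, 4000, 1};
        System.out.println(looksTruncated(small, 10000));  // false
        System.out.println(looksTruncated(capped, 10000)); // true
    }
}
```

If truncation is confirmed, the FAQ entry points to raising IndexWriter.maxFieldLength before the documents are added. In the 1.4.x line this is, as far as I know, a public field rather than a setter, so check the javadoc for your release.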


	
		


