lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Curtin" <m...@curtin.com>
Subject Re: TermFrequencies vector limits?
Date Mon, 21 Nov 2005 13:37:31 GMT
> When I go and retrieve the term frequency vectors, for
> any document under about 90k, everything looks as
> expected.  However for larger documents (I haven't
> found the exact point, but I know that those over 128k
> qualify) the sum of the term frequencies in the vector
> seems to max out at 10001.  Here's the code snippet
> that I'm using when I see this:

That's probably because there is a limit built into Lucene where it ignores any tokens in
a field past the first 10,000.  There is a property you can set to increase this limit.  I
dont' have the source in front of me right now, but if you go into the index subdirectory
of the Lucene source and grep for 10000, you should find it.  Let's say for purpose of argument
that the name of the property is "maxTokens".  Then you could just do this:

java -Dorg.apache.lucene.maxTokens=100000" yourapp ...

To get a higher limit.  Of course, you could also change the Lucene source file and recompile
it.  Note that you CANNOT just set the property in your code, in general, as the Lucene class
puts it into a static final int, meaning it examines the value of the property (once) at class
load time.

Good luck!

--MDC

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message