lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Johnson" <timothy.w.john...@saic.com>
Subject RE: Indexing terms limit
Date Wed, 10 Aug 2005 22:33:21 GMT
I posted before with issues about searching multiple indexes to get a
total number of docs that match a query with a given category.  The best
performance I found was to have a separate index for each index and
iterate over each category and do a hits.length() to get the total hits.


Well each query had to iterate over about 500 categories and as you can
assume, not every search has hits within every category.  So some of the
iteration can be reduced by determining which categories have terms that
match a search and only iterate over those indexes to get a total hit
count.  

I've indexed the list of terms within each index and search that index
first to get a distinct list of categories where the terms match and
thus reducing (in theory) the number on indexes needed to iterate.

If you have any suggestions on getting at the total hits faster, please
feel free to offer and suggestions.

Thanks
Tim

-----Original Message-----
From: hossman@hal.rescomp.berkeley.edu
[mailto:hossman@hal.rescomp.berkeley.edu] On Behalf Of Chris Hostetter
Sent: Wednesday, August 10, 2005 4:41 PM
To: java-user@lucene.apache.org
Subject: Re: Indexing terms limit


: I'm currently attempting to index the distinct list of terms found in
a
: Lucene index using the TermEnum.  I'm creating a document with each
list
: and indexing the document of terms.  It appears there's a limit of
: 10,000 distinct terms within a given document.  Can this be overcome??

Out of curiousity: why are you doing this?



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message