lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Get all terms of a specific field
Date Tue, 27 Jul 2010 15:17:32 GMT


On Jul 27, 2010, at 8:50 AM, Philippe wrote:

> Hi,
> 
> what would be the fastest way to get all terms for all documents matching a specific query?
> 
> So far I:
> 
> 1.) Query the index
> 2.) Retrieve all scoreDocs
> 3.) Iterate the scoreDocs and retrieve all terms using the getValues method and a customised "FieldSelector"
> 
> However, retrieving and iterating the scoreDocs is quite costly.  So is there a better/faster way to perform this?


If you can afford to store TermVectors (disk is cheap, right?), they will give you back
the terms post-analysis, so you won't have to split the stored text again, which you would
have to do with the getValues() approach.  You might also hook into a Collector (HitCollector
in older releases) and build the term set as you go, assuming you don't need the ScoreDocs
structure.

-Grant
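
A minimal sketch of the two suggestions combined, written in plain Java so it runs without Lucene on the classpath. Everything named here (MatchingTermsSketch, TermVectorSource, TermCollector, demoSource) is a hypothetical stand-in, not Lucene's API. In a real Lucene 2.9/3.x setup, the lookup would presumably be IndexReader.getTermFreqVector(int, String) followed by getTerms(), and the accumulation would live in the collect(int) method of a Collector subclass, with the segment's docBase handled in setNextReader.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MatchingTermsSketch {

    /** Hypothetical stand-in for a stored term-vector lookup (not a Lucene type). */
    interface TermVectorSource {
        /** Returns the analyzed terms of the field for one document, or null if none are stored. */
        String[] termsFor(int docId, String field);
    }

    /**
     * Hypothetical collector: called once per matching document.
     * It keeps no scores and builds no ScoreDoc array, only the set of terms.
     */
    static final class TermCollector {
        private final TermVectorSource source;
        private final String field;
        private final Set<String> terms = new TreeSet<String>();

        TermCollector(TermVectorSource source, String field) {
            this.source = source;
            this.field = field;
        }

        /** Adds the stored terms of one matching document to the running set. */
        void collect(int docId) {
            String[] docTerms = source.termsFor(docId, field);
            if (docTerms != null) {
                terms.addAll(Arrays.asList(docTerms));
            }
        }

        Set<String> terms() {
            return terms;
        }
    }

    /** Toy in-memory source with made-up data, used only for the demo below. */
    static TermVectorSource demoSource() {
        final Map<Integer, String[]> vectors = new HashMap<Integer, String[]>();
        vectors.put(0, new String[] {"lucene", "term", "vector"});
        vectors.put(1, new String[] {"collector", "lucene"});
        vectors.put(2, new String[] {"unrelated"});
        return new TermVectorSource() {
            public String[] termsFor(int docId, String field) {
                return "body".equals(field) ? vectors.get(docId) : null;
            }
        };
    }

    public static void main(String[] args) {
        TermCollector collector = new TermCollector(demoSource(), "body");
        // Pretend documents 0 and 1 matched the query.
        for (int docId : new int[] {0, 1}) {
            collector.collect(docId);
        }
        System.out.println(collector.terms()); // prints [collector, lucene, term, vector]
    }
}
```

The point of the design is that terms are gathered while hits stream past, so no ScoreDoc array is materialized and no stored field has to be loaded and re-analyzed. Whether this is actually faster than the getValues() approach depends on the index and would need measuring.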


