lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ramana Jelda" <ramana.je...@ciao-group.com>
Subject RE: HitCollector and Sort Objects
Date Fri, 30 Jun 2006 08:00:46 GMT
Yeah!! There are no methods that you mentioned. But there are some ways to
do this.

TopFieldDocs:search(Query query, Filter filter, int n, Sort sort) 

If above method does not solve your purpose, then

My suggestion is to use method search(Query query, Filter filter,
HitCollector results)  and pass your HitCollector which extends
TopFieldDocCollector .

Jelda


> -----Original Message-----
> From: James Pine [mailto:general_nogi@yahoo.com] 
> Sent: Friday, June 30, 2006 1:54 AM
> To: java-user@lucene.apache.org
> Subject: HitCollector and Sort Objects
> 
> Hey,
> 
> I've looked at the documentation for:
> 
> org.apache.lucene.search.Searchable
> org.apache.lucene.search.Searcher
> org.apache.lucene.search.IndexSearcher
> 
> and it struck me that there are no search methods with these 
> signatures:
> 
> void search(Query query, Filter filter, HitCollector results, 
> Sort sort)
> 
> void search(Query query, HitCollector results, Sort
> sort)
> 
> I have one type of search where I pass in a Query and a Sort 
> (built with a SortField and Decompresses) and deal with the 
> Hits object, and another which takes a Query and a 
> HitCollector, which I then run my own sorting on the results.
> 
> I'd like to combine both into one call so that I can get the 
> results sorted and have the collected data available. Is that 
> crazy talk? Does the way document ids stream through the 
> HitCollector prevent sorting them first?
> 
> Would it be best for me to convert my Decompresses logic into 
> a Comparator to pass into Arrays.sort() or something like 
> that and just forget about using the built in Lucene functionality?
> 
> I currently use the HitCollector to count occurrences of 
> words in the result set, basically trying compute the 
> TermFrequencies in the result of a search vs.
> across the entire index. I've tried using bitsets but because 
> I can't constrain the number of unique words people use 
> (currently in the millions in our data), and the number of 
> document (also currently in the millions), keeping a cache of 
> bitsets is not feasible on our hardware (I can save about 
> 10,000 but the cache hit ratio is very low). If someone 
> thinks my use of HitCollector is flawed, I'm certainly open 
> to suggestions. Thanx.
> 
> JAMES
> 
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection 
> around http://mail.yahoo.com 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message