lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Brisbart Franck <>
Subject Re: how to get an extra count
Date Tue, 08 Apr 2003 08:42:43 GMT
Ype Kingma wrote:
> Franck,
> On Monday 07 April 2003 05:59, Brisbart Franck wrote:
>>Ype Kingma wrote:
>>>On Tuesday 04 February 2003 09:12, you wrote:
> <flashing back>
>>>>Hi all,
>>>>I'm trying to gather information about my non-searched (ie not used for
>>>>the search) fields.
>>>>Let's take an index with 2 fields: 'artist' (for the artist name) an
>>>>'type' (for his type of music).
>>>>I need to perform a search on the 'artist' field to retrieve a list of
>>>>artists matching the query. But I also need to have the number of
>>>>artists per 'type'.
>>>>My first solution was to write a HitCollector which do the work:
>>>>    public void collect(int doc, float score) {
>>>>    Document document = indexReader_.document(doc);
>>>>    if(document.get("type").equals("Rock"))
>>>>        nbRock++;
>>>>    ...
>>>>    }
>>>>But, as I first get the document to analyze my non-searched field, the
>>>>treatment can be very long for searches with lots of results.
>>>>Is there any better(=faster) method to have this extra info ??
>>>It's best not to retrieve document fields during search.
>>>You can do this after collecting, possibly in the order
>>>that you collected the documents.
>>You're right. Using document fields during a search wasn't a good idea.
>>In fact, I would like to avoid using document fields at all. Because, it
>>takes too much time. If a search returns 10000 results, I need to
>>analyse all of them to get my extra info (nb of artist per type). And
>>even if I have only 10 results to display, my search is very slow.
>>It'll be great to treat my 'type' field the same way as the 'freq' is
>>used. I think this will be much faster.
>>How can I store the content of this field in a lucene file (such as
>><segment>.frq) to use it during my search ??
> Couldn't you extend your query with
>  +type:rock
> and use the high level search API and ask for the size of the results?
> You can do that when you index the 'type' field.
If I do that, I should run 1 query for each 'type'. And, as I have 
something like 2000 types, I don't think it is better to have 2000 fast 
requests than 1 big request.
I could do 1 query for each type found by the first query. But it only 
shifts the problem, because I should first retrieve all these types, and 
to do that I have to analyse all my results.
My real problem is to find the best way to use extra data (not search 
fields) to compute some meta data on a given query.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message