lucene-general mailing list archives

From BorisCallens <boris.call...@gmail.com>
Subject get distinct values of one field from query
Date Tue, 17 Mar 2009 12:47:46 GMT

In my project I have a query that can return several million documents.
From these documents I always want the unique values of one particular field.
For the sake of clarity, take the "id" field as an example.

Currently I pull out all the values of the id field, deduplicate them in my
application (C# in my case, but it could be any language of course) and then
return the distinct values.
When the query returns only a few hundred rows, this is fast enough. But
pulling out several million values and deduplicating them can take quite
some time.

Is there a more performant way to do this?

--Example code (C#)
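    // Run the query.
    Hits hits = searcher.Search(query);
    List<string> idStrings = new List<string>();
    int count = hits.Length();
    for (int i = 0; i < count; i++)
    {
        // Load the stored document for hit i and read its "id" field.
        idStrings.Add(hits.Doc(i).Get("id"));
    }
    // Deduplicate in memory (requires System.Linq).
    idStrings = idStrings.Distinct().ToList();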
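
For reference, here is the deduplication step on its own, as a minimal sketch without any Lucene dependency. It adds each value to a HashSet while iterating instead of building the full list first and calling Distinct() afterwards. The class name DistinctSketch, the helper DistinctValues and the sample ids are hypothetical, made up for illustration only. It only avoids the intermediate list of millions of entries; it does not avoid loading every matching document, which is probably where most of the time goes.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctSketch
{
    // Hypothetical helper: returns each value once, in first-seen order.
    public static List<string> DistinctValues(IEnumerable<string> values)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (string value in values)
        {
            // HashSet.Add returns false when the value is already present,
            // so duplicates are never appended to the result.
            if (seen.Add(value))
            {
                result.Add(value);
            }
        }
        return result;
    }

    public static void Main()
    {
        // Stand-in for the values read via hits.Doc(i).Get("id").
        var ids = new[] { "a1", "b2", "a1", "c3", "b2" };
        Console.WriteLine(string.Join(",", DistinctValues(ids)));
        // prints "a1,b2,c3"
    }
}
```

In the loop above, the same idea would amount to replacing the List<string> with a HashSet<string> and dropping the final Distinct() call.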