lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dror Matalon <d...@zapatec.com>
Subject Re: How to get list of unique field values for a subset of documents
Date Fri, 12 Dec 2003 18:17:47 GMT
Some details about number of documents, size of documents, hardware
used and what you consider fast would help answering, but my gut
feeling is that this problem would work faster on a database
than using full text search.

On Fri, Dec 12, 2003 at 09:45:53AM -0400, Karl Penney wrote:
> I'm looking for a fast way (execution wise) to get a list of unique values for a field
called "partno" for all documents which have a given value for a field called "type".  This
is for adding values to a drop-down list.
> 
> What I have done so far is to build a list of document numbers for each value of the
"type" field (using TermEnum and TermDocs).  I then use TermEnum to get a list of values for
the "partno" field.  This returns values for all of the documents, so for each "partno" value
I use TermDocs to get a list of document numbers for the "partno" and then cross check this
list against the document list for "type".  If at least one of the "partno" documents is in
the "type" list then I know that the "partno" value is valid for that "type" value.
> 
> This works ok, but is not as fast as I would like it to be.  Any ideas on how to improve
it?
> 
> Here is some sample code:
> 
> // build list of document numbers for each value of the type field
> TermEnum te = reader.terms(new Term("type", ""));
> typeMap = new HashMap();
> while (te.term() != null && "type".equals(te.term().field())) {
>   String type = te.term().text();
>   TermDocs docs = reader.termDocs(new Term("type", type));
>   ArrayList typelist = new ArrayList();
>   while (docs.next()) {
>     String docnum = String.valueOf(docs.doc());
>     typelist.add(docnum);
>   }
>   typeMap.put(type, doclist);
>   te.next(); 
> }
> 
> // add partno values to dropdown for type-1
> ArrayList doclist = (ArrayList)typeMap.get("type-1");
> if (doclist != null) {
>     // get unique for values for the partno field from the index
>     TermEnum te = reader.terms(new Term("partno", ""));
>     while (te.term() != null && "partno".equals(te.term().field())) {
>         String value = te.term().text();
>         // check that at least one document with the "partno" value is in the list for
type-1
>         TermDocs docs = reader.termDocs(new Term("partno", value));
>         while (docs.next()) {
>             String docnum = String.valueOf(docs.doc());
>             if (doclist.contains(docnum)) {
>                 // add value to drop down
>                 cbo.add(value.toUpperCase());
>                 break;
>             }
>         }
>         te.next();            
>     }
> }
> 
> 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message