lucene-java-user mailing list archives

From "Roy Klein" <kl...@sitescape.com>
Subject RE: Indexing multiple instances of the same field for each document
Date Fri, 27 Feb 2004 14:49:27 GMT
Suppose one adds multiple terms to a single field name in one of two ways:

    1) Adding a single field with all the content 
    2) Adding that same field several times, each time with a different
term.

E.g.
   doc1.add(Field.indexed("field","the"));
   doc1.add(Field.indexed("field","quick"));
   doc1.add(Field.indexed("field","brown"));
   doc1.add(Field.indexed("field","fox"));
   doc1.add(Field.indexed("field","jumped"));
   writer.addDocument(doc1);
Vs.
   doc2.add(Field.indexed("field","the quick brown fox jumped"));
   writer.addDocument(doc2);


Is there a difference in query performance when I query on fields that
have been added multiple times vs fields which were added with the
entire field contents at once?

Also FYI, I found that phrase queries don't work against a field that's
been added multiple times. If I query the phrase "brown fox" against
the two example docs above, only the second (doc2) matches.
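
One way to reason about the phrase-query observation: a phrase match needs the two terms at adjacent positions within the field. The toy model below is not Lucene code; the class `PhrasePositionSketch` and its `continuePositions` flag are hypothetical names made up for illustration. It only shows that the outcome depends on whether token positions carry on across repeated adds of the same field or restart at zero. Which of these a given Lucene version actually does is an assumption to check, not something established in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy positional index for ONE field of ONE document (hypothetical, not Lucene).
final class PhrasePositionSketch {
    // term -> positions at which the term occurs in the field
    private final Map<String, List<Integer>> positions = new HashMap<>();
    // true: positions continue across add() calls; false: each add() restarts at 0
    private final boolean continuePositions;
    private int next = 0;

    PhrasePositionSketch(boolean continuePositions) {
        this.continuePositions = continuePositions;
    }

    // Models adding one instance of the field; text is split on whitespace.
    void add(String text) {
        if (!continuePositions) {
            next = 0;
        }
        for (String token : text.trim().split("\\s+")) {
            positions.computeIfAbsent(token, k -> new ArrayList<>()).add(next++);
        }
    }

    // Two-term phrase match: 'second' must occur exactly one position after 'first'.
    boolean matchesPhrase(String first, String second) {
        List<Integer> a = positions.getOrDefault(first, Collections.<Integer>emptyList());
        List<Integer> b = positions.getOrDefault(second, Collections.<Integer>emptyList());
        for (int p : a) {
            if (b.contains(p + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PhrasePositionSketch whole = new PhrasePositionSketch(false);
        whole.add("the quick brown fox jumped");

        PhrasePositionSketch restarted = new PhrasePositionSketch(false);
        PhrasePositionSketch continued = new PhrasePositionSketch(true);
        for (String word : new String[] {"the", "quick", "brown", "fox", "jumped"}) {
            restarted.add(word);
            continued.add(word);
        }

        System.out.println("single add:          " + whole.matchesPhrase("brown", "fox"));
        System.out.println("multi, restarted:    " + restarted.matchesPhrase("brown", "fox"));
        System.out.println("multi, continued:    " + continued.matchesPhrase("brown", "fox"));
    }
}
```

If the real index behaves like the "restarted" case, single-term queries would still find every word, which fits what I see: only the phrase query is affected.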


   Roy

-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com] 
Sent: Friday, February 27, 2004 5:29 AM
To: Lucene Users List
Subject: Re: Indexing multiple instances of the same field for each
document


On Feb 27, 2004, at 5:16 AM, Moray McConnachie wrote:
> I note from previous entries on the mailing list and my own experiments
> that you can add many entries to the same field for each document.
> Example: a given document belongs to more than one product, ergo I index
> the product field with values "PROD_A" and "PROD_B".
>
> If I don't tokenise the fields when adding them to the document, then
> when storing the values and printing them out before adding them to the
> index, so I can see what the index is recording, I do indeed get
>
> Keyword<product:PROD_A> Keyword<product:PROD_B>
>
> However, a query on product:PROD_A returns no results, neither does a
> query on product:PROD_B.

Are you using QueryParser?  Try using a
new TermQuery(new Term("product", "PROD_A")) against the field indexed
as a Keyword and see what you get.  If that finds it, then you are
suffering from analysis paralysis: QueryParser runs the query text
through the analyzer, but Keyword fields are indexed without analysis.
QueryParser, Keyword fields, and analyzers are a very "interesting"
combination.
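
To make the mismatch concrete, here is a minimal sketch in plain Java. It is a toy model, not the Lucene API: `KeywordMismatchSketch`, `addKeyword`, `termLookup`, and `parsedLookup` are hypothetical names, and the "analyzer" is assumed to do nothing more than lowercase the query text. The point is only that a value stored verbatim can be found by an exact term lookup but missed once the query side is lowercased.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model only: none of these names are Lucene classes or methods.
final class KeywordMismatchSketch {
    // Terms held for the "product" field, exactly as they were added.
    private final Set<String> productTerms = new HashSet<>();

    // Models indexing an untokenized (Keyword-style) value: stored verbatim.
    void addKeyword(String value) {
        productTerms.add(value);
    }

    // Models a direct term query: the query text is compared as-is.
    boolean termLookup(String text) {
        return productTerms.contains(text);
    }

    // Models a query parser whose analyzer lowercases the text (assumed behavior).
    boolean parsedLookup(String text) {
        return productTerms.contains(text.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        KeywordMismatchSketch index = new KeywordMismatchSketch();
        index.addKeyword("PROD_A");
        index.addKeyword("PROD_B");

        // Exact lookup sees the stored term; the lowercased lookup asks for "prod_a".
        System.out.println("term lookup:   " + index.termLookup("PROD_A"));
        System.out.println("parsed lookup: " + index.parsedLookup("PROD_A"));
    }
}
```

If your direct term query hits and the parsed one misses, the indexing side is fine and the fix belongs on the query side.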

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org



