lucene-java-user mailing list archives

From Moray McConnachie
Subject Indexing multiple instances of the same field for each document
Date Fri, 27 Feb 2004 10:16:37 GMT
I note from previous entries on the mailing list and my own experiments that
you can add many entries to the same field for each document. Example: a
given document belongs to more than one product, ergo I index the product
field with values "PROD_A" and "PROD_B".

If I don't tokenise the fields when adding them to the document, and I print
the stored values before adding the document to the index so that I can see
what the index is recording, I do indeed get

Keyword<product:PROD_A> Keyword<product:PROD_B>

However, a query on product:PROD_A returns no results, and neither does a
query on product:PROD_B.

If I tokenise the fields (i.e. the document content reads
Text<product:PROD_A> Text<product:PROD_B>), then it works correctly.

[n.b. I am using the .NET implementation of Lucene, but its behaviour is
said to be identical to the Java Lucene.]

1) Is this expected behaviour? 

If so, are multiple fields of the same name added to a document silently
converted to a string/array representation of some kind?

2) Is it sensible behaviour?

I ask because it seems to me contrary to instinct, and also because my guess
would be that a Keyword index would be faster to add (and faster to query?)
than a Text index.
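
The scenario above can be modelled with a small toy inverted index. This is
only a sketch in plain Java, not Lucene's code or API: the class
MultiValueFieldDemo, its methods, and the lower-casing "analyser" are all
hypothetical names introduced for illustration. It rests on one assumption
that this message does not confirm: that the query text is analysed (here,
simply lower-cased) before lookup, while untokenised values are indexed
verbatim. Under that assumption, each value added under the same field name
is indexed as its own term, and the mismatch arises on the query side.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy inverted index: a hypothetical model, not Lucene's implementation. */
public class MultiValueFieldDemo {
    // Postings: "field:term" -> ids of the documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Hypothetical analyser (assumption): lower-cases the text, nothing more. */
    public static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    /** Untokenised ("Keyword"-style) value: indexed verbatim as a single term. */
    public void addKeyword(int docId, String field, String value) {
        postings.computeIfAbsent(field + ":" + value, k -> new HashSet<>()).add(docId);
    }

    /** Tokenised ("Text"-style) value: passed through the analyser before indexing. */
    public void addText(int docId, String field, String value) {
        addKeyword(docId, field, analyze(value));
    }

    /** Exact term lookup; the query term is not analysed. */
    public Set<Integer> searchTerm(String field, String term) {
        return postings.getOrDefault(field + ":" + term, new HashSet<>());
    }

    /** Lookup that analyses the query text first (assumed query-parser behaviour). */
    public Set<Integer> searchParsed(String field, String queryText) {
        return searchTerm(field, analyze(queryText));
    }

    public static void main(String[] args) {
        // Same field name added twice to document 1, untokenised.
        MultiValueFieldDemo keywordIndex = new MultiValueFieldDemo();
        keywordIndex.addKeyword(1, "product", "PROD_A");
        keywordIndex.addKeyword(1, "product", "PROD_B");

        // Same field name added twice to document 1, tokenised.
        MultiValueFieldDemo textIndex = new MultiValueFieldDemo();
        textIndex.addText(1, "product", "PROD_A");
        textIndex.addText(1, "product", "PROD_B");

        System.out.println(keywordIndex.searchParsed("product", "PROD_A")); // prints []
        System.out.println(keywordIndex.searchTerm("product", "PROD_A"));   // prints [1]
        System.out.println(textIndex.searchParsed("product", "PROD_A"));    // prints [1]
    }
}
```

In this model both PROD_A and PROD_B remain separately searchable in the
untokenised index, provided the query term is looked up exactly as it was
indexed. Whether the same holds for a given Lucene version and analyser would
need to be checked against that version's documentation.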

Moray McConnachie
Moray McConnachie, IT Manager
Oxford Analytica 
