lucene-java-user mailing list archives

From Moray McConnachie <mmcco...@oxford-analytica.com>
Subject Indexing multiple instances of the same field for each document
Date Fri, 27 Feb 2004 10:16:37 GMT
I note from previous posts on the mailing list, and from my own experiments,
that you can add several values to the same field of a document. Example: a
given document belongs to more than one product, ergo I index the product
field with the values "PROD_A" and "PROD_B".
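
The multi-valued setup above can be pictured with a toy inverted index. This is a minimal sketch, not Lucene's real data structures: the class and method names are hypothetical, and the model assumes, as in a conventional inverted index, that each value added under the same field name becomes its own term rather than being joined into one string. Lookups in the toy are exact and case-sensitive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model (hypothetical, NOT Lucene's implementation) of a multi-valued field. */
public class MultiValuedFieldSketch {
    // field name -> term -> ids of the documents containing that term
    private final Map<String, Map<String, TreeSet<Integer>>> postings = new TreeMap<>();

    /** Adds one untokenised (keyword-style) value: the whole value is a single term. */
    public void addKeyword(int docId, String field, String value) {
        postings.computeIfAbsent(field, f -> new TreeMap<>())
                .computeIfAbsent(value, t -> new TreeSet<>())
                .add(docId);
    }

    /** Exact, case-sensitive lookup of one term in one field. */
    public List<Integer> search(String field, String term) {
        Map<String, TreeSet<Integer>> terms = postings.get(field);
        if (terms == null || !terms.containsKey(term)) {
            return new ArrayList<>();
        }
        return new ArrayList<>(terms.get(term));
    }

    public static void main(String[] args) {
        MultiValuedFieldSketch index = new MultiValuedFieldSketch();
        // Document 0 belongs to two products, so "product" is added twice.
        index.addKeyword(0, "product", "PROD_A");
        index.addKeyword(0, "product", "PROD_B");
        index.addKeyword(1, "product", "PROD_B");

        System.out.println(index.search("product", "PROD_A")); // [0]
        System.out.println(index.search("product", "PROD_B")); // [0, 1]
        System.out.println(index.search("product", "prod_a")); // [] (case differs)
    }
}
```

In this model document 0 is found under either product value, because each value has its own posting list.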

If I don't tokenise the fields when adding them to the document, and I print
the document out before adding it to the index (so I can see what the index
will record), I do indeed get

Keyword<product:PROD_A> Keyword<product:PROD_B>

However, a query on product:PROD_A returns no results, neither does a query
on product:PROD_B.

If I tokenise the fields (i.e. the document content reads
Text<product:PROD_A> Text<product:PROD_B>), then it works correctly.
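
To compare the two cases, it can help to list the terms each kind of field would contribute. The sketch below is hypothetical: a Keyword-style field is modelled as indexing the value verbatim as one term, and a Text-style field as passing it through an assumed analyser that lowercases and splits on non-alphanumeric characters. Lucene's real analysers may tokenise differently, so treat the output as illustrative only. The same comparison can be applied to whatever terms the query side produces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical comparison of keyword-style and text-style indexing. */
public class KeywordVersusTextSketch {
    /** Keyword-style field: the value is indexed verbatim as a single term. */
    public static List<String> keywordTerms(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    /**
     * Text-style field under an ASSUMED analyser: lowercase the value, then
     * split on runs of characters that are not letters or digits.
     */
    public static List<String> textTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) { // skip the empty token a leading separator would produce
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        for (String value : new String[] {"PROD_A", "PROD_B"}) {
            System.out.println(value + " keyword=" + keywordTerms(value)
                    + " text=" + textTerms(value));
        }
        // PROD_A keyword=[PROD_A] text=[prod, a]
        // PROD_B keyword=[PROD_B] text=[prod, b]
    }
}
```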

[n.b. I am using the .NET implementation of Lucene, but its behaviour is
said to be identical to that of Java Lucene.]

1) Is this expected behaviour? 

If so, are multiple fields with the same name in one document silently
converted to a string or array representation of some kind?

2) Is it sensible behaviour?

I ask because it seems counter-intuitive to me, and also because my guess
would be that a Keyword field would be faster to index (and perhaps faster
to query) than a Text field.

Yours,
Moray McConnachie
------------------------------------
Moray McConnachie, IT Manager
Oxford Analytica http://www.oxan.com 


