lucene-java-user mailing list archives

From Erik Hatcher
Subject Re: multiple keyword fields vs. multiple-token field
Date Tue, 28 Nov 2006 21:39:43 GMT

On Nov 28, 2006, at 4:31 PM, Michael Rusch wrote:

> I have documents that can be referred to by multiple identifiers (and I
> want to store the identifiers separate from the main indexed content).
> I'm wondering if I should put each identifier in its own keyword field,
> or have one tokenized field with all of the identifiers in it.  What I'm
> talking about is something like this:
>
> "Identifier" is a keyword field
> Add field Identifier="ABCD"
> Add field Identifier="WXYZ"
>
> Or
>
> Identifiers is a tokenized, indexed, unstored field
> Add field Identifiers="ABCD WXYZ"
>
> It would seem that either would work, but I was wondering if there was a
> "standard" way or if anybody had thoughts on relative advantages or
> disadvantages (or is it half of one/six dozen of the other and I should
> just pick one and go with it).

There really is no difference between these approaches for identifiers.
Multiple instances of the same field can have a positional gap between
them, but that gap only matters for position-sensitive queries such as
phrase queries, so it wouldn't be a consideration for TermQuery lookups
on these identifiers.  I recommend the multiple keyword field approach,
to avoid having to deal with analysis (in case identifiers have special
characters, etc.).
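
To make the analysis concern concrete, here is a rough sketch in plain
Java.  It is a toy inverted index, not Lucene's API: the class ToyIndex,
its method names, and the "split on anything non-alphanumeric" rule are
all assumptions made up for illustration, standing in for whatever your
analyzer would actually do.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index (hypothetical, NOT Lucene): field -> term -> doc ids. */
final class ToyIndex {
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    /** Keyword-style field: the whole value is indexed as a single term, untouched. */
    void addKeyword(int docId, String field, String value) {
        addTerm(docId, field, value);
    }

    /**
     * Tokenized-style field: lowercase the value and split on runs of
     * non-alphanumeric characters. This is an assumed stand-in for an analyzer.
     */
    void addTokenized(int docId, String field, String value) {
        for (String token : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                addTerm(docId, field, token);
            }
        }
    }

    /** Exact term lookup; the query text is not analyzed, similar in spirit to a TermQuery. */
    Set<Integer> termQuery(String field, String term) {
        return postings
                .getOrDefault(field, Collections.emptyMap())
                .getOrDefault(term, Collections.emptySet());
    }

    private void addTerm(int docId, String field, String term) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new TreeSet<>())
                .add(docId);
    }
}
```

With this sketch, adding "AB-CD" twice over as keyword values of
Identifier keeps each one findable by its exact text, while adding
"AB-CD WXYZ" to a tokenized Identifiers field leaves only the pieces
"ab", "cd" and "wxyz" in the index, so an exact lookup for "AB-CD"
comes back empty.  For plain identifiers like ABCD the two approaches
only differ in case handling here, which is why it mostly comes down to
whether you want analysis in the picture at all.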
