lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Owen Densmore <>
Subject Multiple Keywords/Keyphrases fields
Date Sat, 12 Feb 2005 20:08:21 GMT
I'm getting a bit more serious about the final form of our lucene 
index.  Each document has DocNumber, Authors, Title, Abstract, and 
Keywords.  By Keywords, I mean a comma separated list, each entry 
having possibly many terms in a phrase like:
	temporal infomax, finite state automata, Markov chains,
	conditional entropy, neural information processing

I presume I should be using a field "Keywords" which have many 
"entries" or "instances" per document (one per comma separated phrase). 
  But I'm not sure the right way to handle all this.  My assumption is 
that I should analyze them individually, just as we do for free text 
(the Abstract, for example), thus in the example above having 5 entries 
of the nature
	doc.add(Field.Text("Keywords", "finite state automata"));
etc, analyzing them because these are author-supplied strings with no 
canonical form.

For guidance, I looked in the archive and found the attached email, but 
I didn't see the answer.  (I'm not concerned about the dups, I presume 
that is equivalent to a boos of some sort) Does this seem right?

Thanks once again.


> From: <>
> Subject: Multiple equal Fields?
> Date: Tue, 17 Feb 2004 12:47:58 +0100
> Hi!
> What happens if I do this:
> doc.add(Field.Text("foo", "bar"));
> doc.add(Field.Text("foo", "blah"));
> Is there a field "foo" with value "blah" or are there two "foo"s 
> (actually not
> possible) or is there one "foo" with the values "bar" and "blah"?
> And what does happen in this case:
> doc.add(Field.Text("foo", "bar"));
> doc.add(Field.Text("foo", "bar"));
> doc.add(Field.Text("foo", "bar"));
> Does lucene store this only once?
> Timo

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message