lucene-java-user mailing list archives

From Doron Cohen <DOR...@il.ibm.com>
Subject Re: Different boost values for different terms in a field.
Date Thu, 05 Oct 2006 19:43:13 GMT
Frode Bjerkholt <fb@mtouch.no> wrote on 05/10/2006 01:10:43:
> My intention is to give different terms in a field different boost values.
> From a user's perspective, the query will be a single full-text input field.
> The following code illustrates this:
>
> Field f1 = new Field("name", "John", Field.Store.NO,
>                      Field.Index.TOKENIZED);
> Field f2 = new Field("name", "Doe", Field.Store.NO,
>                      Field.Index.TOKENIZED);
>
> f1.setBoost(1.0f);
> f2.setBoost(2.0f);
>
> doc.add(f1);
> doc.add(f2);
>
> In the current version of Lucene, as far as I know, this does not work,
> although it would be a very powerful feature.

To support this, additional info would need to be stored along with each
index token, i.e. along with each occurrence of each index term in each
indexed document. There are discussions on adding token "payloads" (in a
future "flexible" index structure). If/when this is added, and if it is as
flexible and general as desired, such a per-token boost could be stored
there and then used at scoring time. For more info, search for "payloads"
in the dev mailing list.

Notice however that even then, without separating into distinct fields, a
search for "Doe" would collect both its occurrences as "first name" and as
"last name", and there would be no way to look only for matches of it as,
say, "last name".

>
> The current solution is to make a firstname field and a lastname field,
> and then make a complex query like this:
>
> Input: Eric Doe
>
> (firstname:Eric OR lastname:Eric^2) AND (firstname:Doe OR lastname:Doe^2)
>
> The performance of such a query is quite slow, and it becomes even worse
> when you have more than two fields and/or more words in the input string.
>
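To make the growth of that query concrete, here is a minimal sketch of how
such a query string could be assembled from the input. It is plain Java and
does not call Lucene; the class name BoostedQueryBuilder and its build method
are hypothetical names used only for illustration, and the resulting string
is assumed to be handed to a query parser afterwards. The number of term
clauses is (words in input) x (fields), which is why it gets expensive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: builds the query string only.
public class BoostedQueryBuilder {

    // fieldBoosts maps field name -> boost; a boost of 1 is left implicit.
    // Note: input words are NOT escaped for query-syntax characters here.
    public static String build(String input, Map<String, Integer> fieldBoosts) {
        List<String> clauses = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty input yields no clauses
            }
            List<String> alternatives = new ArrayList<>();
            for (Map.Entry<String, Integer> e : fieldBoosts.entrySet()) {
                String term = e.getKey() + ":" + word;
                if (e.getValue() != 1) {
                    term += "^" + e.getValue();
                }
                alternatives.add(term);
            }
            // Each word may match in any field ...
            clauses.add("(" + String.join(" OR ", alternatives) + ")");
        }
        // ... but every word must match somewhere.
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("firstname", 1);
        boosts.put("lastname", 2);
        System.out.println(build("Eric Doe", boosts));
    }
}
```

With the two fields above and the input "Eric Doe", this produces the same
four-clause query as quoted; a third field or a third word adds clauses
multiplicatively.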
> My questions:
>
> 1. Is there a better/faster solution to accomplish such a query?
>

I think one way (which I don't like, but you may think otherwise) would be
to insert two tokens for a boosted one at indexing time, so that your
indexing code would look like:
  Field f1 = new Field("mixed", "John Doe", Field.Store.NO,
                       Field.Index.TOKENIZED);
  doc.add(f1);
  Field f2 = new Field("mixed", "Doe", Field.Store.NO,
                       Field.Index.TOKENIZED);
  doc.add(f2);
This would enlarge the index, since the boosted term is indexed twice.
You might need to adjust the position gap (between f1 and f2) to avoid
false phrase matches.
But your query would be simpler and should be faster.
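The effect of this trick, and of the gap, can be illustrated with a small
model. This is only a sketch in plain Java, not Lucene code: the class
MixedFieldSketch and its methods are hypothetical, tokenization is assumed
to be whitespace splitting plus lowercasing, and the way the gap is added to
positions is a simplification of what a real index does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one document's "mixed" field; not a Lucene API.
public class MixedFieldSketch {

    // Returns term -> token positions when several values are added under
    // the same field name, leaving `gap` unused positions between values.
    public static Map<String, List<Integer>> positions(List<String> values, int gap) {
        Map<String, List<Integer>> index = new LinkedHashMap<>();
        int next = 0;
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                next += gap; // simplified model of a position gap
            }
            for (String token : values.get(i).trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                index.computeIfAbsent(token.toLowerCase(), k -> new ArrayList<>()).add(next);
                next++;
            }
        }
        return index;
    }

    // True if `second` occurs at the position right after some `first`.
    public static boolean phraseMatches(Map<String, List<Integer>> index,
                                        String first, String second) {
        List<Integer> secondPositions =
                index.getOrDefault(second, Collections.<Integer>emptyList());
        for (int p : index.getOrDefault(first, Collections.<Integer>emptyList())) {
            if (secondPositions.contains(p + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        values.add("John Doe"); // f1
        values.add("Doe");      // f2: the repeated, "boosted" term
        System.out.println(positions(values, 0));
        System.out.println(positions(values, 100));
    }
}
```

In this model "doe" ends up with two occurrences and "john" with one, which
is what gives "Doe" the higher score. Without a gap the two "doe" tokens are
adjacent, so a phrase query for "doe doe" would match even though that phrase
never appeared in the source text; with a gap it would not.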

> 2. Would it be possible to implement the described feature in a future
> version of Lucene?

