lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Nadav Har'El" <...@il.ibm.com>
Subject Re: Lucene Seaches VS. Relational database Queries
Date Wed, 12 Apr 2006 07:06:39 GMT
Chris Hostetter <hossman_lucene@fucit.org> wrote on 12/04/2006 01:41:37 AM:
> : them in one field).  One of the problems I see would be with values
that
> : over lap (Example, name where one name is Jason Bateman, and one is
Jason
> : Bateman Black, and it would be hard to replicate the Discrete Search
for
>
> they way field values are "analyzed" is extremely configurable -- down to
> the individual field level.  Which means that while you can have an actor
> field where you can do loose text searching for "bateman" and get back
> movies staring "Jason Bateman" and "Jason Bateman Black" (and even Guido
> Batemans" if you use stemming) you can also have another field using a
> KeywordAnalyzer such that a record with teh values "Jason Bateman" and
> "Jack Black" will only be matched if hte user searches for "Jason
Bateman"
> or "Jack Black" ... searching for "Bateman Jack" or "Black Jason" will
not
> work.

Another possible trick is to have one field, but mark its end with special
tokens, say "^" and "$", so that "Jason Bateman" gets indexed as four
tokens:
      ^ Jason Bateman $
Then, if you want to search for the name Jason Bateman and that name only,
just search for the phrase "^ Jason Bateman $" - and only this entry will
match. (you can also continue to search this field normally)

If you'll think about this, you'll notice that you don't actually need
the beginning-of-field marker ("^") because it's easy to recognize the
beginning of a field because the position there is 0. Unfortunately,
I don't know how to match position 0 using the standard QueryParser,
but you can do it with the SpanFirstQuery: for example if we index
Jason Bateman as the three tokens
      Jason Bateman $
then we can search for it using something like
      SpanQuery[] terms = {
            new SpanTermQuery(new Term("actor", "Jason")),
            new SpanTermQuery(new Term("actor", "Bateman")),
            new SpanTermQuery(new Term("actor", "$")) };
      new SpanFirstQuery(new SpanNearQuery(terms, 0, true), 3);
(or something like that... I didn't test this)


--
Nadav Har'El


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message