lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ananth T. Sarathy" <ananth.t.sara...@gmail.com>
Subject Re: Lucene Seaches VS. Relational database Queries
Date Thu, 13 Apr 2006 15:33:44 GMT
Ok,
 Some of the stuff makes  some sense. I was a little loopy from lack of
sleep and some of these solutions don't really cover my concerns....


Let's take this movie example. If each member of a production Crew can have
multiple titles that come from a lookup table of Distinct Jobs

Titles
Assistant Producer
Producer
Executive Producer
Director
Director Trainee
Stunt Director

In the Database there would be a Assocation Table Linking each Crew member
the titles they had

Crew_Titles
Crew_ID   Title
1             Producer
1

On 4/12/06, Nadav Har'El <NYH@il.ibm.com> wrote:
>
> Chris Hostetter <hossman_lucene@fucit.org> wrote on 12/04/2006 01:41:37
> AM:
> > : them in one field).  One of the problems I see would be with values
> that
> > : over lap (Example, name where one name is Jason Bateman, and one is
> Jason
> > : Bateman Black, and it would be hard to replicate the Discrete Search
> for
> >
> > they way field values are "analyzed" is extremely configurable -- down
> to
> > the individual field level.  Which means that while you can have an
> actor
> > field where you can do loose text searching for "bateman" and get back
> > movies staring "Jason Bateman" and "Jason Bateman Black" (and even Guido
> > Batemans" if you use stemming) you can also have another field using a
> > KeywordAnalyzer such that a record with teh values "Jason Bateman" and
> > "Jack Black" will only be matched if hte user searches for "Jason
> Bateman"
> > or "Jack Black" ... searching for "Bateman Jack" or "Black Jason" will
> not
> > work.
>
> Another possible trick is to have one field, but mark its end with special
> tokens, say "^" and "$", so that "Jason Bateman" gets indexed as four
> tokens:
>      ^ Jason Bateman $
> Then, if you want to search for the name Jason Bateman and that name only,
> just search for the phrase "^ Jason Bateman $" - and only this entry will
> match. (you can also continue to search this field normally)
>
> If you'll think about this, you'll notice that you don't actually need
> the beginning-of-field marker ("^") because it's easy to recognize the
> beginning of a field because the position there is 0. Unfortunately,
> I don't know how to match position 0 using the standard QueryParser,
> but you can do it with the SpanFirstQuery: for example if we index
> Jason Bateman as the three tokens
>      Jason Bateman $
> then we can search for it using something like
>      SpanQuery[] terms = {
>            new SpanTermQuery(new Term("actor", "Jason")),
>            new SpanTermQuery(new Term("actor", "Bateman")),
>            new SpanTermQuery(new Term("actor", "$")) };
>      new SpanFirstQuery(new SpanNearQuery(terms, 0, true), 3);
> (or something like that... I didn't test this)
>
>
> --
> Nadav Har'El
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


--
Ananth T Sarathy

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message