lucene-java-user mailing list archives

From Cameron Leach <cameron.develo...@gmail.com>
Subject Re: Use of PrefixQuery to create multi-word queries
Date Wed, 05 Jan 2011 21:49:27 GMT
L -

I faced the exact same problem you're having. I ended up writing a custom
Analyzer to tokenize the terms the way I wanted. If memory serves me
correctly, the WhitespaceAnalyzer will do this:

"the brown dog" ->
the
brown
dog

I think what you want is something like this:

"the brown dog" ->
the brown dog
brown dog
dog
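For illustration only, here is a small plain-Java sketch of that token stream. It uses no Lucene classes. The class and method names (WordSuffixes, suffixes) are hypothetical, and it assumes the text is split on whitespace the same way WhitespaceAnalyzer splits it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows the tokens a custom
// analyzer could emit by dropping words from the front one at a time.
class WordSuffixes {

    // Returns every suffix of the text that starts at a word boundary,
    // e.g. "the brown dog" -> ["the brown dog", "brown dog", "dog"].
    static List<String> suffixes(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            // Join words i..end back together with single spaces.
            out.add(String.join(" ", Arrays.copyOfRange(words, i, words.length)));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : suffixes("the brown dog")) {
            System.out.println(token);
        }
    }
}
```

Each suffix would then be indexed as a single term (like a keyword field), so a PrefixQuery on "brown d" can hit "brown dog" even though the title starts with "the".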

If you write your custom analyzer to drop words from the beginning of the
text one at a time, as above, and then pass the result through the
NGramTokenFilter, you should get your real-time search results back the way
you expect. One small caveat is that spans won't work here (e.g. 'the do'
won't match 'the brown dog'), which might actually be what you want. I was
never able to figure out a way to do this with WhitespaceAnalyzer and a
tricky query.
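A rough simulation of the indexing and lookup side, again in plain Java with hypothetical names (EdgeNGramIndex, add, matches) and no Lucene classes. It assumes each word-boundary suffix ("the brown dog", "brown dog", "dog") is kept as one term. It only generates leading-edge n-grams (what Lucene's EdgeNGramTokenFilter produces) rather than every substring that NGramTokenFilter would emit, because a prefix-style lookup only needs the leading ones.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation: index edge n-grams of each suffix token,
// then answer a type-ahead query with an exact term lookup.
class EdgeNGramIndex {

    private final Set<String> grams = new HashSet<>();

    // Store every leading substring of each token, from minGram
    // characters up to the full token length.
    void add(List<String> tokens, int minGram) {
        for (String token : tokens) {
            for (int len = minGram; len <= token.length(); len++) {
                grams.add(token.substring(0, len));
            }
        }
    }

    // The query matches only if it is a prefix of some stored suffix.
    boolean matches(String query) {
        return grams.contains(query);
    }

    public static void main(String[] args) {
        EdgeNGramIndex index = new EdgeNGramIndex();
        index.add(List.of("the brown dog", "brown dog", "dog"), 1);
        System.out.println(index.matches("brown d")); // true
        System.out.println(index.matches("the br"));  // true
        System.out.println(index.matches("the do"));  // false: "the" and "do" are not adjacent
    }
}
```

The last lookup fails because "the do" is not a prefix of any stored suffix, which is exactly the spans caveat: only contiguous words, starting at a word boundary, can match.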

Hope that helps a little.


On Wed, Jan 5, 2011 at 12:07 PM, L Duperval <duperval@videotron.com> wrote:

> Philip Puffinburger <ppuffinburger <at> tlcdelivers.com> writes:
> > We only do the PrefixQuery, which is against the keyword field ("brown dog"
> > is a single term, as is "the brown dog"). We don't have a BooleanQuery
> > like you do, but I don't see why it wouldn't work.
> >
>
> Ahh. OK, so you probably aren't using a whitespace analyzer like we are. We
> chose whitespace because we wanted to be able to search for multiple words,
> no matter where they occurred in the text. That way, we could (wanted to?)
> match "brown dog" with "the brown dog" or "the horse has a brown dog". We
> had thought of breaking up our data into multiple pieces like you are doing
> but were worried about memory and performance (we're storing the index in
> RAM). I'll think about this.
>
> Thanks for all the information. I'll do some testing on my end to see if I
> can do better than what I've got. I may also have to rethink some of our
> features (i.e. matching from the start of the title instead of matching
> anywhere, as we are currently doing).
>
> Thanks for your generosity,
>
> L
