lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Best approach for exact Prefix Field Query
Date Tue, 14 Nov 2006 13:50:53 GMT
What Erik said <G>...

But I thought I'd add that I was pleasantly surprised by how very fast the
regex contribution went when creating a filter. And you can cache the
filters. Don't be afraid <G>.

But in this case I don't think that would help either. Your basic problem is
probably that you're indexing discrete elements because the analyzer breaks
them up. i.e. "Lucene in action" gets indexed as three tokens, then you have
a hard time searching them other than as individual tokens. Regex wouldn't
help you here, since there's really no concept of a "line" after things are
tokenized (except that you can do interesting things with the offsets of the
tokens in the field, which is what I believe SpanFirst is doing for you).

But don't go there, use Erik's suggestion instead. And if that doesn't do
exactly what you want, consider indexing a separate field that only
contains, say, the first word and doing your "prefix field search" on that
field. You wouldn't even have to store that field......

Best
Erick@ICanTypeMoreThanErikAndBeLessHelp.....

On 11/14/06, Martin Braun <mbraun@uni-hd.de> wrote:
>
> hi,
>
> i would like to provide a exact "PrefixField Search", i.e. a search for
> exactly the first words in a field.
> I think I can't use a PrefixQuery because it would find also substrings
> inside the field, e.g.
> action* would find titles like "Action and knowledge" but also (that's
> what i don't want it to find)
> "Lucene in Action"
>
> As a regex it would be sth. like /^Action and.*/
>
> Now the question for me is how to implement this functionality, I see to
> ways:
>
> 1) Some kind of TermEnum over all Docs (or the prefixquery results?) and
> string comparison
> 2) Using the regex contribution
> 3) a super -fast lucene function I have overseen :)
>
> with 2) I am worrying about performance, anybody have experiences with
> regex-queries?
>
> .. but same for 1) anybody already impolemented this already and could
> give some code samples / hints ?
>
> tia,
>
>
> martin
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message