lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: querying multi-value fields
Date Mon, 12 Oct 2009 20:47:40 GMT
<<<I think Lucene sees all these values as one long
value for the field "option">>>

Not quite. Starting with the second add, a call will be made to
getPositionIncrementGap in your analyzer. If you return a positive
number, then the positions of the last term of the preceding add and
the first term of this add will be separated by roughly that number.
If you do nothing with getPositionIncrementGap, then Lucene does,
indeed, see all the terms as one long value, since the default
implementation returns 0 and the first term of each add directly
follows the last term of the previous one.

Here's an illustration in which getPositionIncrementGap returns
10, using your example adds.

Note, these numbers may be off by the infamous 1 or even 2.

term      position
value1    0
aaa       1
value2    11
bbb       12
value3    22
ccc       23
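
To make the arithmetic concrete, here is a small plain-Java sketch (no Lucene classes involved) that assigns positions to the example adds. It is a simplified model, not Lucene's indexing code: it assumes each term advances the position by one and that the gap is added on top of that at the start of every add after the first. The class and method names (GapPositions, positions) are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GapPositions {

    /**
     * Simplified model of position assignment for a multi-valued field.
     * Assumption: every term advances the position by one, and `gap` is
     * added once before the first term of each value after the first.
     * Terms are assumed to be unique, as they are in the example.
     */
    static Map<String, Integer> positions(List<String> values, int gap) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int next = 0; // position the next term will receive
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                next += gap; // the hole between two adds
            }
            for (String term : values.get(v).trim().split("\\s+")) {
                result.put(term, next++);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values =
            Arrays.asList("value1 aaa", "value2 bbb", "value3 ccc");
        System.out.println("gap 0:  " + positions(values, 0));
        System.out.println("gap 10: " + positions(values, 10));
    }
}
```

Under these assumptions a gap of 10 puts value2 at 12 and value3 at 24, which is one and two higher than the table above, i.e. within the "off by 1 or even 2" caveat. With a gap of 0 the six terms simply occupy positions 0 through 5.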


So, you really don't need to know the exact slop: any slop smaller than
the magic number you return from getPositionIncrementGap can't match
across two values. BTW, slop indicates holes, not total terms. So with a
slop of 0 all the words need to be next to each other, regardless of
whether there are two words or 20. But you still have to do the trick with
getPositionIncrementGap in order to fail to match on something like
"bbb value3", where the last term of one value is next to the first term
of the next value......
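
A rough way to see why the gap matters is to model the phrase check in plain Java. This is only a sketch under stated assumptions, not Lucene's PhraseQuery: every term occurs once, positions are assigned as "one per term plus the gap between adds", and a phrase matches when the spread of (position minus index in the phrase) is no larger than the slop. The names SlopSketch, index and matches are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SlopSketch {

    /** Assign positions: one per term, plus `gap` between consecutive values. */
    static Map<String, Integer> index(List<String> values, int gap) {
        Map<String, Integer> positions = new HashMap<>();
        int next = 0;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                next += gap;
            }
            for (String term : values.get(v).trim().split("\\s+")) {
                positions.put(term, next++);
            }
        }
        return positions;
    }

    /**
     * Simplified sloppy-phrase check. Each term's position is shifted back
     * by its index in the phrase; an exact phrase then has all shifted
     * positions equal. The phrase matches when the spread of the shifted
     * positions (the number of "holes") does not exceed `slop`.
     */
    static boolean matches(Map<String, Integer> positions, String phrase, int slop) {
        String[] terms = phrase.trim().split("\\s+");
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < terms.length; i++) {
            Integer pos = positions.get(terms[i]);
            if (pos == null) {
                return false; // term is not in the document at all
            }
            int shifted = pos - i;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min <= slop;
    }

    public static void main(String[] args) {
        List<String> values =
            Arrays.asList("value1 aaa", "value2 bbb", "value3 ccc");
        Map<String, Integer> noGap = index(values, 0);
        Map<String, Integer> gap10 = index(values, 10);

        System.out.println(matches(gap10, "value2 bbb", 0)); // inside one value
        System.out.println(matches(noGap, "bbb value3", 0)); // crosses values, no gap
        System.out.println(matches(gap10, "bbb value3", 0)); // crosses values, gap 10
        System.out.println(matches(gap10, "value1 ccc", 5)); // slop below the gap
    }
}
```

In this model "bbb value3" matches with slop 0 when there is no gap and stops matching once the gap is 10, while "value1 ccc" with a slop of 5 is rejected because 5 is smaller than the gap.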

HTH
Erick



On Mon, Oct 12, 2009 at 4:31 PM, Angel, Eric <eangel@business.com> wrote:

> I need to analyze these values since I also want the benefits of
> porterStemmer.  The problem with using PhraseQuery is that I don't
> always know the slop.  I may have values like "value4 ddd aaa".  It's a
> tricky problem because I think Lucene sees all these values as one long
> value for the field "option".
>
> -----Original Message-----
> From: Jake Mannix [mailto:jake.mannix@gmail.com]
> Sent: Monday, October 12, 2009 1:25 PM
> To: java-user@lucene.apache.org
> Subject: Re: querying multi-value fields
>
> Or else just make sure that you use PhraseQuery to hit this field when you
> want "value1 aaa".  If you don't tokenize these pairs, then you will have to
> do prefix/wildcard matching to hit just "value1" by itself (if this is
> allowed by your business logic).
>
>  -jake
>
> On Mon, Oct 12, 2009 at 1:21 PM, Adriano Crestani <adrianocrestani@gmail.com> wrote:
>
> > Hi Eric,
> >
> > To achieve what you want, do not tokenize the values you query/add to
> > this field.
> >
> > On Mon, Oct 12, 2009 at 4:05 PM, Angel, Eric <eangel@business.com> wrote:
> >
> > > I have documents that store multiple values in some fields (using
> > > document.add(new Field()) with the same field name).  Here's what a
> > > typical document looks like:
> > >
> > > doc.option="value1 aaa"
> > > doc.option="value2 bbb"
> > > doc.option="value3 ccc"
> > >
> > > I want my queries to only match individual values, for example, a query
> > > for "value2 bbb" would return the above document, but a query for
> > > "value1 ccc" should not.  Is this at all possible in lucene at query
> > > time?  Could payloads be used for this?
