lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Positions in SpanFirst
Date Thu, 22 Feb 2007 00:40:38 GMT
I really think you need to stop obsessing on SpanFirst <G>. I suspect that
this is leading you down an unrewarding path.

So I don't see why using a SpanNear that respects order and a large
IncrementGap won't solve your problem...... Although it would return "odd"
matches. Let's say you indexed "first second third" as one name and then
searched on a SpanNear of second and third with a slop of 100. You'd get a
match on a middle and last name rather than a first and last name..... But I
wonder if this can be tolerated given all the new capabilities you'll
doubtless be adding <G>.

Best
Erick

On 2/21/07, Antony Bowesman <adb@teamware.com> wrote:
>
> Hi Erick,
>
> > What this does is allow you to put gaps between successive sets of terms
> > indexed in the same field. For instance...
> > doc.add("field", "some stuff");
> > doc.add("field", "bunch hooey");
> > doc.add("field", "what is this");
> > writer.add(doc);
> >
> > In this case, there would be the following positions, assuming that the
> > IncrementGap was 1000....
> > some 0
> > stuff 1
> > bunch 1002
> > hooey 1003
> > what 2004
> > is 2005
> > this 2006
>
> So, if you can add 1000, shouldn't setting 0 each time cause it to start
> at 0
> each time?  The default Analyzer.getPositionIncrementGap always returns 0.
>
> >> That's a good point.  The field is used to index mail recipients and
> >> currently
> >> has a "starts with" search (non Lucene implementation).  Unless I can
> set
> >> the
> >> position increment gap, it is only ever possible to search for the
> first
> >> indexed
> >> recipient with proxity queries.\
> >
> >
> > This is confusing me. You can easily use proximity queries with the
> above
> > scenario. For instance, searching for "bunch hooey"~4 would generate a
> hit.
> > As would "bunch hooey"~10000. But "some this"~10 would not generate a
> hit.
> > Whether that does what you need is another question <G>... So it's time
> to
> > ask "what are you really trying to do?" In other words, what behavior
> are
> > you trying to mimic from the old code? It's not clear to me what the
> > behavior you need is. It'd help if you gave a concrete example of the
> raw
> > data, and what you want returned...
>
> You example is good enough, just assume they are people's names :)  I know
> I had
> a mail from Mrs Bunch Ogilvy, so I want to do a "starts with", i.e.
> SpanFirst
> for bunch, so I find all the first name bunches.
>
> > In your first example, using the above scheme, you'd get hits (using
> > SpanNear rather than SpanFirst) if you searched on
> > "first bit" in a SpanNear query with a slop of 2. You'd also get a hit
> if
> > you searched on
> > "second part" in a SpanNear with a slop of 2. Does this mimic the
> behavior
> > you need?
>
> No, SpanNear is fine, but SpanFirst will not work as there always has to
> be a
> starting offset.  I can't search "bunch hooey" as SpanFirst unless I know
> that
> it was indexed as the second 'group' and therefore set the starting span
> position as 1002.
>
> Using Lucene has added a whole world of new search possibilities to the
> product,
> but when people have been using something a certain way for 15 years, it
> can be
> difficult to shift their expectations :)  There's always someone who will
> shout...
>
> Antony
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message