lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Analyzer.getPositionIncrementGap question
Date Thu, 26 Oct 2006 17:02:57 GMT
OK, how about injecting a special token in your input stream, then having
your analyzer record that token and set the position increment of the next
"real" token? Something like

d.add("field", "veryspecialtokenincrementnext100 value1"...);
d.add("field", "veryspecialtokenincrementnext10 value2"...);
d.add("field", "veryspecialtokenincrementnext1000 value3...");

then in your synonym-like analyzer, just do whatever your
veryspecialtokenincrmentnext### indicates? In this case, discard the special
token and set the positionincrement of the next token as indicated?

or inject VerySpecialIncrementNextBasedOnValue and have your analyzer figure
out what the correct increment is for the next token it sees...

If you do elect to include numbers in your veryspecialincre.... make sure
your analyzer pases them through and doesn't strip them.

You could also include the VerySpecial..... in multiple places within
value1, say.


On 10/26/06, qaz zaq <fortques@yahoo.com> wrote:
>
>   Thanks Erick,
>   Since value1, value2, value3, itself can also include multiply tokens, I
> am not sure  the token based postion increment, d.add(new Field("value1
> value2 value3"),  will actually work. I was trying to use
>   getPositionIncrementGap, but there appears to no way to set non constant
> gap. say, gap between "value1" and "value2" is 10, but gap between "value2"
> and "value3" is 100.
>
> Erick Erickson <erickerickson@gmail.com> wrote:
>   See the SynonymFilter in LIA for how to create your very own analyzer
> that
> gives you total control over the increment between terms. Essentially,
> that
> allows you to set the position increment for each and every token. I
> suspect
> that this would be easier, but what do I know?
>
> The difference is that you can set the position increment for each token,
> there's no reason to add the field multiple times. That is, for the call
>
> d.add(new Field("value1 value2 value3")
>
> and have the opportunity to set the position increment for all three
> terms.
> Of course, you could make three calls to d.add as well and this technique
> should still work.
>
> But if you really need the gap instead, you can simply override the
> getPositionIncrementGap method. But I don't know how you get the current
> token when it's called.
>
> One warning if you really *do* need getPositionIncrementGap. There's a bug
> in 2.0 that if you use a PerFieldAnalyzerWrapper, getPostitionIncrementGap
> isn't called properly. There's a fix in the source and the nightly build.
> This doesn't (as far as I know) apply to the SynonymFilter strategy.
>
> Hope this helps
> Erick
>
> On 10/25/06, qaz zaq wrote:
> >
> > I have multiple values want to add to the same FIELD, and I also want to
> > add non-zero but NON CONSTANT position increment gap among those values.
> > e.g., gap between "value1" and "value2" is 10, but gap between "value2"
> > and "value3" is 100. is there any how can I achieve that?
> >
> > d.add(new Field ("fld", "value1");
> > d.add(new Field ("fld", "value2");
> > d.add(new Field ("fld", "value3");
> > indexWriter.addDocument(d);
> >
> >
> > ---------------------------------
> > All-new Yahoo! Mail - Fire up a more powerful email and get things done
> > faster.
> >
>
>
> ---------------------------------
>   All-new Yahoo! Mail - Fire up a more powerful email and get things done
> faster.
>
>
> ---------------------------------
> How low will we go? Check out Yahoo! Messenger's low  PC-to-Phone call
> rates.
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message