lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Design questions
Date Fri, 15 Feb 2008 15:57:40 GMT
You need to watch the position increment gap (which, as I
remember, gets added for each new field of the same name you
add to the document). Make it 0 rather than whatever it is
currently. You may have to create a new analyzer by subclassing
your favorite analyzer and overriding getPositionIncrementGap (?).
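
To make the effect of the gap concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class PositionGapSketch and its positions method are hypothetical names, tokenizing is just a whitespace split, and the assumption is that the gap is added once between consecutive values of the same field while every token advances the position by 1.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (NOT Lucene code): assigns token positions across several
// values of one field name, adding a configurable gap between values.
final class PositionGapSketch {

    // Returns the position of every whitespace-separated token, in order.
    static List<Integer> positions(String[] values, int gap) {
        List<Integer> out = new ArrayList<>();
        int position = -1;      // position of the last token emitted so far
        boolean first = true;
        for (String value : values) {
            String trimmed = value.trim();
            if (trimmed.isEmpty()) {
                continue;       // simplification: empty values are ignored
            }
            if (!first) {
                position += gap;    // gap applied between field instances
            }
            first = false;
            for (String token : trimmed.split("\\s+")) {
                position += 1;      // assumed increment of 1 per token
                out.add(position);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] pages = {"first page", "second page"};
        System.out.println(positions(pages, 0));    // gap of 0
        System.out.println(positions(pages, 100));  // gap of 100
    }
}
```

With a gap of 0 the positions run on continuously from one value to the next; with a larger gap the second value starts further away, which is what keeps phrase queries from matching across values.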

Also, I'm not sure whether the per-token position increment (see
Token's get/setPositionIncrement) needs to be taken into account.
See the SynonymAnalyzer in Lucene in Action for an example.
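
As a rough illustration of what the per-token increment does, here is a toy model in plain Java. It is not Lucene code, and PositionIncrementSketch is a hypothetical name. It assumes each token's position is the previous position plus that token's increment, so an increment of 0 stacks a token (a synonym, say) on the same position as the one before it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (NOT Lucene code): derives absolute token positions from
// per-token position increments.
final class PositionIncrementSketch {

    static List<Integer> positions(int[] increments) {
        List<Integer> out = new ArrayList<>();
        int position = -1;          // nothing emitted yet
        for (int increment : increments) {
            position += increment;  // 0 keeps the token at the same position
            out.add(position);
        }
        return out;
    }

    public static void main(String[] args) {
        // e.g. "quick", then a synonym "fast" with increment 0, then "fox"
        System.out.println(positions(new int[] {1, 0, 1}));
    }
}
```

An increment greater than 1 would leave a hole in the positions instead, for example where a stop word was dropped.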

On Fri, Feb 15, 2008 at 8:37 AM, <spring@gmx.eu> wrote:

> > >   Document doc = new Document();
> > >   for (int i = 0; i < pages.length; i++) {
> > >     doc.add(new Field("text", pages[i],
> > >         Field.Store.NO, Field.Index.TOKENIZED));
> > >     doc.add(new Field("text", "$$",
> > >         Field.Store.NO, Field.Index.UN_TOKENIZED));
> > >   }
> >
> > UN_TOKENIZED. Nice idea!
> > I will check this out.
>
>
> Hm... when I try this, something strange happens with my offsets.
>
> When I use
> doc.add(new Field("text", pages[i] +
>     "012345678901234567890123456789012345678901234567890123456789",
>     Field.Store.NO, Field.Index.TOKENIZED));
> everything is fine. Offsets are as I expect.
>
> But when I use
> doc.add(new Field("text", pages[i],
>     Field.Store.NO, Field.Index.TOKENIZED));
> doc.add(new Field("text",
>     "012345678901234567890123456789012345678901234567890123456789",
>     Field.Store.NO, Field.Index.UN_TOKENIZED));
>
> the offsets of my terms are too high.
>
> What is the difference?
>
> Thank you.
