lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Paul Sondag" <>
Subject Re: Does Index have a Tokenizer Built into it
Date Wed, 18 Jul 2007 16:14:19 GMT
Is there a way to know how big to make the array before hand (how many terms
are in the topic total?).  I'm worried about the efficiency of this, since
I'd have to rebuild every document that is a "hit" on the fly to make a
snippet for each "hit" on the page (say 10 a page).

Now I have to wonder how storing the termPosition vectors in the index +
sorting them by position  compares to storing the location of the document +
using a tokenizer on the document.  Both in the end give me the result I

Any opinions?


On 7/18/07, Chris Hostetter <> wrote:
> : After indexing I have been able to retrieve the TermPositionVector from
> the
> : index and it has all of the data, but I cannot find a way where given a
> : position I can retrieve the term at that position. Which is how I was
> hoping
> : to create my contextual snippets.
> there is no easy way to go from a position to a term -- coincidently there
> is a very recent thread on this on java-dev...
> ...a new API may come out of it, but in the mean time you may be
> interested in taking the approach the current highlighter uses (as
> mentioned in that thread), of using the TermPositionVector to rebuild the
> orriginal tokenstream, then skipping ahead to the positions you are
> interested in.
> -Hoss
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message