lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Paul Sondag" <jsond...@uiuc.edu>
Subject Re: Retrieve nearest token based off location in original Text
Date Fri, 06 Jul 2007 17:50:38 GMT
I thought that went to the "index" of the token.  I may not understand it
completely but this is how I currently view the TokenStream

For example if my text was the following:

This is an Example

This is index of 1, is has index 2, an has index 3 Example has index 4.
What I have is the actual "character position" in the original text.  "This"
is characters 0-3, "is" is characters 5-6, "an" is characters 8-9, and
"Example" is characters 11-17.  I know that given Token 4 (Example) I can
get the startOffset and endOffset (11, and 17).  What I'm wondering is given
character offset can I get a tokenIndex.  (I.E.  given character offset 12,
it would return 3, because Example is the closest token that starts at
character 12).

--JP

On 7/6/07, Chris Hostetter <hossman_lucene@fucit.org> wrote:
>
> : I never got a response to this and thought maybe I was too wordy.
> :
> : I'm wondering if there's a way where given a position in the original
> text
> : you can retrieve the token index that is nearest to that position using
> the
> : StandardToken/StandardTokenizer classes?
>
> i may not be understanding the question, but wouldn't that just be...
>
>   TokenStream s = getTokenStreamForOrriginalText()
>   Token t;
>   for (i=0; i<thePositionYouKnow; i++) {
>     t = s.next();
>   }
>   return t;
>
> ?
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message