lucene-java-user mailing list archives

From "John Paul Sondag" <jsond...@uiuc.edu>
Subject Re: Retrieve nearest token based off location in original Text
Date Thu, 05 Jul 2007 21:27:37 GMT
Hi,

I never got a response to this and thought maybe I was too wordy.

I'm wondering whether, given a character position in the original text,
there is a way to retrieve the index of the token nearest to that position
using the StandardToken/StandardTokenizer classes.
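
One way to approach the lookup described above, sketched without any Lucene calls: run the tokenizer over the text once, record each token's start and end character offsets (for instance from Token.startOffset() and Token.endOffset()), and then binary-search those offsets. The class name NearestToken, the method nearestTokenIndex, and the two parallel arrays are hypothetical names used only for illustration, and the tie-breaking rule (prefer the earlier token) is an assumption.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: maps a character position
// to the index of the nearest token.
final class NearestToken {

    /**
     * starts[i] and ends[i] are the character offsets of token i
     * (end exclusive), in ascending order, as recorded while tokenizing.
     * Returns -1 when there are no tokens.
     */
    static int nearestTokenIndex(int[] starts, int[] ends, int pos) {
        if (starts.length == 0) {
            return -1;
        }
        int i = Arrays.binarySearch(starts, pos);
        if (i >= 0) {
            return i;                 // pos is exactly the start of token i
        }
        int next = -i - 1;            // first token that starts after pos
        int prev = next - 1;          // last token that starts before pos
        if (prev < 0) {
            return 0;                 // pos lies before the first token
        }
        if (pos < ends[prev] || next == starts.length) {
            return prev;              // inside prev, or past the last token
        }
        // pos falls in the gap between prev and next: pick the closer one.
        int distPrev = pos - (ends[prev] - 1); // distance to last char of prev
        int distNext = starts[next] - pos;     // distance to first char of next
        return distPrev <= distNext ? prev : next; // assumption: ties go to prev
    }
}
```

Recording the offsets once costs a single pass over the token stream, after which each lookup is a binary search rather than a rescan of the text.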



--JP

On 7/3/07, John Paul Sondag <jsondag2@uiuc.edu> wrote:
>
> Hi,
>
> I was wondering if it's possible to get the token offset based on the
> position in the original text.
>
> My problem is I'm working on my own "Snippet Generator" and I'm giving a
> token index (call it t) as input and need to make a snippet of the original
> text.  I want the Snippet to be some number of tokens (call it n tokens).
> But to make the Snippet easier to read I want to see if it's close to the
> end of a paragraph (if it is I'll make more of the Snippet before the token
> than usual).  So I'm scanning the original text forward some number of
> characters looking for a newline or tab.  If I find one, I'd like to get the
> token before that newline (and its offset, call it y).  Once I have the
> offset, I know I have y - t tokens after my token, and finally I know to put
> n-(y-t) tokens before my token, and I can successfully make my Snippet.
>
> Thanks in advance!
>
> --JP
>
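
The window arithmetic in the quoted message could be sketched as follows. This is only an illustration under stated assumptions: token offsets are taken to be recorded in two parallel arrays while tokenizing, the names SnippetWindow, lastTokenBefore, window, and lookahead are hypothetical, and the even split used when no newline or tab is found is an assumption, since the message does not say what the usual split is.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: picks which tokens go into a snippet.
final class SnippetWindow {

    /** Index of the last token that starts before pos, or -1 if there is none. */
    static int lastTokenBefore(int[] starts, int pos) {
        int i = Arrays.binarySearch(starts, pos);
        return (i >= 0 ? i : -i - 1) - 1;
    }

    /**
     * Returns {firstToken, lastToken} (inclusive) for a snippet around token t.
     * starts/ends hold each token's character offsets (end exclusive);
     * lookahead is how many characters to scan past token t.
     */
    static int[] window(String text, int[] starts, int[] ends,
                        int t, int n, int lookahead) {
        int from = ends[t];
        int to = Math.min(text.length(), from + lookahead);
        int y = -1; // index of the token just before the paragraph break, if any
        for (int p = from; p < to; p++) {
            char c = text.charAt(p);
            if (c == '\n' || c == '\t') {
                y = lastTokenBefore(starts, p);
                break;
            }
        }
        // Near a break: y - t tokens after t, as in the message.
        // Otherwise: n / 2 after t (assumed default split).
        int after = (y >= t) ? Math.min(y - t, n) : n / 2;
        int before = n - after; // n - (y - t) when a break was found
        int first = Math.max(0, t - before);
        int last = Math.min(starts.length - 1, t + after);
        return new int[] {first, last};
    }
}
```

Following the message's arithmetic literally, the window holds n tokens in addition to token t itself; subtract one from either side if the snippet should be exactly n tokens long.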
