lucene-java-user mailing list archives

From Beady Geraghty <>
Subject Re: standardTokenizer - how to terminate at End of Stream
Date Thu, 22 Sep 2005 00:55:53 GMT
Thank you for the response.
 I was trying to do something really simple: I want to extract the context
of terms and phrases from files that satisfy some (many) queries.
I *know* that the file test.txt is a hit (because I queried the index, and
it tells me that test.txt satisfies the query). Then I open the file and
use Lucene's StandardTokenizer to tokenize the input. I read one token at
a time to see which token, or run of consecutive tokens, matches the
terms/phrases. Then I extract the context surrounding these terms.
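
 As a rough illustration of the matching step described above, here is a
minimal sketch in plain Java. It is not Lucene code: the class name
ContextExtractor, the methods tokenize() and contexts(), and the regex-based
tokenizer are all hypothetical stand-ins. In practice the tokens would come
from the same analyzer that was used at index time, so that they line up
with the indexed terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ContextExtractor {

    // Stand-in tokenizer (an assumption, not StandardTokenizer):
    // lowercases the text and splits on anything that is not a letter or digit.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns one snippet per occurrence of `phrase` (a run of consecutive
    // tokens), with up to `window` tokens of context on each side.
    public static List<String> contexts(List<String> tokens, List<String> phrase, int window) {
        List<String> out = new ArrayList<>();
        if (phrase.isEmpty()) {
            return out;
        }
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                int from = Math.max(0, i - window);
                int to = Math.min(tokens.size(), i + phrase.size() + window);
                out.add(String.join(" ", tokens.subList(from, to)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("The quick brown fox jumps over the lazy dog.");
        // Prints: [the quick brown fox jumps over]
        System.out.println(contexts(tokens, Arrays.asList("brown", "fox"), 2));
    }
}
```

 A single term is just a phrase of length one, so the same loop covers both
cases. The snippets here are rebuilt from tokens, so the original
punctuation and casing are lost; keeping character offsets per token would
be needed to cut the context out of the original text instead.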
 I didn't try the highlighter because I don't really need to "highlight",
and I didn't look closely at whether some of the classes provided in the
package would already do what I need. (Although I would imagine many people
have already done what I am trying to do. It appears to have a fragmenter,
and I don't know if that is something I need.)
 Since I used the StandardAnalyzer when I originally created the index,
I use the StandardTokenizer to tokenize the input stream.
 Is there a better way to do what I am trying to do?
 From your comment below, it appears that I should just use next() instead
of getNextToken(); is that correct?

 On 9/21/05, Erik Hatcher <> wrote:
> Could you elaborate on what you're trying to do, please?
> Using StandardTokenizer in this low-level fashion is practically
> unheard of, so I think knowing what you're attempting to do will help
> us help you :)
> Erik
