lucene-java-user mailing list archives

From Lev Bronshtein <>
Subject Line filtering
Date Sun, 05 Sep 2010 21:09:21 GMT

Hello group, 

I am new to Lucene and ran into a bit of trouble while writing an app. I would like to selectively
index lines from a syslog on a Unix system. To this end, I first wrote a tokenizer, extending
CharTokenizer, that returns an entire line as a single token:

  // Every character except CR and LF belongs to a token, so each line becomes one token
  protected boolean isTokenChar(char c) {
    return !((c == '\n') || (c == '\r'));
  }
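To check which tokens that predicate yields, here is a simplified plain-JDK model of the CharTokenizer idea: every maximal run of characters for which isTokenChar is true becomes one token, and the separators themselves are dropped. It is only a sketch and does not reproduce every detail of Lucene's CharTokenizer; the class name LineTokens and the sample log text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-JDK model (not Lucene code): maximal runs of "token chars" become tokens.
public class LineTokens {

    // Same predicate as the tokenizer above: everything except CR and LF is part of a token.
    static boolean isTokenChar(char c) {
        return !((c == '\n') || (c == '\r'));
    }

    // Collects each maximal run of token chars as one token; empty lines produce no token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // the last line may lack a trailing newline
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Made-up syslog text with mixed CRLF / LF line endings.
        String log = "host sshd[42]: Accepted password\r\n"
                + "host cron[7]: job started\n";
        for (String line : tokenize(log)) {
            System.out.println(line);
        }
    }
}
```

Because consecutive separators produce no empty token, a CRLF pair or a blank line does not add a spurious token.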

Perhaps that was my first mistake, and I should have done things differently?

I then pass its output to a filter that keeps only the lines containing the text I am interested in:

  public final boolean incrementToken() throws IOException {
    while (input.incrementToken()) {
      Matcher lineMatcher = linePattern.matcher(termAtt.term());
      if (lineMatcher.find()) // the line contains the pattern: keep it
        return true;
    }
    // reached end of stream -- return false
    return false;
  }
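The control flow of that filter can be modelled without Lucene: keep pulling lines from upstream until one contains the pattern (Matcher.find(), which searches anywhere in the line, rather than matches(), which requires the whole line to match), or until upstream runs out. This is a plain-JDK sketch, not a TokenFilter; the class LineFilter, its methods, and the sshd pattern are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-JDK model of the line filter's incrementToken() loop.
public class LineFilter {

    // Advances upstream until a line matches; null plays the role of "return false" (end of stream).
    static String nextMatching(Iterator<String> upstream, Pattern linePattern) {
        while (upstream.hasNext()) {
            String line = upstream.next();
            Matcher lineMatcher = linePattern.matcher(line);
            if (lineMatcher.find()) {
                return line; // corresponds to "return true"
            }
        }
        return null;
    }

    // Drains the model stream and collects every line that survives the filter.
    static List<String> filter(List<String> lines, Pattern linePattern) {
        List<String> kept = new ArrayList<>();
        Iterator<String> it = lines.iterator();
        String line;
        while ((line = nextMatching(it, linePattern)) != null) {
            kept.add(line);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical pattern: keep only sshd entries.
        Pattern sshd = Pattern.compile("sshd\\[\\d+\\]");
        List<String> lines = List.of(
                "host sshd[42]: Accepted password",
                "host cron[7]: job started",
                "host sshd[43]: Connection closed");
        filter(lines, sshd).forEach(System.out::println);
    }
}
```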

However, now that I have the lines I want, I need to break each one up into tokens along
whitespace, and WhitespaceTokenizer does not take a TokenStream as a constructor parameter.
Can anyone offer a suggestion for a workaround?
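Since WhitespaceTokenizer reads from a Reader rather than a TokenStream, one possible workaround is to do the line selection before analysis and hand each matching line to the tokenizer through a StringReader. Another is a filter that splits the current line token itself and buffers the leftover pieces, emitting one piece per call. The sketch below models only that second idea in plain Java; it is not Lucene API, the class WhitespaceSplitter is hypothetical, its split on the regex "\\s+" only approximates Lucene's whitespace rule, and a real TokenFilter would additionally have to set the term text and offsets for each piece.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical plain-JDK model of a "split the current token" stage with a pending buffer.
public class WhitespaceSplitter implements Iterator<String> {
    private final Iterator<String> lines;                     // upstream: already-filtered lines
    private final Deque<String> pending = new ArrayDeque<>(); // pieces of the current line not yet emitted

    WhitespaceSplitter(Iterator<String> lines) {
        this.lines = lines;
    }

    @Override
    public boolean hasNext() {
        // Refill from upstream until some line yields at least one piece, or upstream is exhausted.
        while (pending.isEmpty() && lines.hasNext()) {
            for (String piece : lines.next().trim().split("\\s+")) {
                if (!piece.isEmpty()) { // a blank line splits to a single empty string
                    pending.add(piece);
                }
            }
        }
        return !pending.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return pending.poll(); // emit the oldest buffered piece
    }

    static List<String> splitAll(List<String> lines) {
        List<String> out = new ArrayList<>();
        new WhitespaceSplitter(lines.iterator()).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<String> kept = List.of("host sshd[42]: Accepted password", "  sshd[43]:\tclosed ");
        System.out.println(splitAll(kept));
    }
}
```

The buffer is what lets one upstream token turn into several downstream tokens without the filter needing its own Reader.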


Lev Bronshtein
