lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lev Bronshtein <lev_bronsht...@hotmail.com>
Subject Line filtering
Date Sun, 05 Sep 2010 21:09:21 GMT

Hello group, 

I am new to Lucene and ran into a bit of trouble while writing an app.  I would like to selectively
index lines from a syslog on a unix system, to this end I first wrote tokenizer that returns
an entire line as a token extending CharTokenizer

  protected boolean isTokenChar(char c) {
    return !((c == '\n') || (c == '\r'));
  }

Perhaps that is my first mistake and I should have done things differently? 

I then pass this to a filter that only selects the lines with text I am interested in

 public final boolean incrementToken() throws IOException
 {
  while (input.incrementToken())
  {
   Matcher lineMatcher = linePattern.matcher(termAtt.term());
   if (lineMatcher.find()) //(we like the payload)
     return true;
  }
  //reached EOS -- return false
  return false;
 }

However the issue is that, now that I have the line I want to break up the individual line
into tokens along white space, but the WhitespaceTokenizer does not take a TokenStream as
a constructor parameter.  Can anyone offer  suggestion for a workaround?

Regards,

Lev Bronshtein
 		 	   		  
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message