lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Lea <ian....@gmail.com>
Subject Re: Line filtering
Date Mon, 06 Sep 2010 08:38:05 GMT
You'd be better off reading and selecting the syslog lines outside of
lucene.  Then pass the lines you are interested in to lucene using
whatever analyzer you want.


--
Ian.


On Sun, Sep 5, 2010 at 10:09 PM, Lev Bronshtein
<lev_bronshtein@hotmail.com> wrote:
>
> Hello group,
>
> I am new to Lucene and ran into a bit of trouble while writing an app.  I would like
to selectively index lines from a syslog on a unix system, to this end I first wrote tokenizer
that returns an entire line as a token extending CharTokenizer
>
>   protected boolean isTokenChar(char c) {
>     return !((c == '\n') || (c == '\r'));
>   }
>
> Perhaps that is my first mistake and I should have done things differently?
>
> I then pass this to a filter that only selects the lines with text I am interested in
>
>  public final boolean incrementToken() throws IOException
>  {
>   while (input.incrementToken())
>   {
>    Matcher lineMatcher = linePattern.matcher(termAtt.term());
>    if (lineMatcher.find()) //(we like the payload)
>      return true;
>   }
>   //reached EOS -- return false
>   return false;
>  }
>
> However the issue is that, now that I have the line I want to break up the individual
line into tokens along white space, but the WhitespaceTokenizer does not take a TokenStream
as a constructor parameter.  Can anyone offer  suggestion for a workaround?
>
> Regards,
>
> Lev Bronshtein
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message