From: Ian Lea
Date: Mon, 6 Sep 2010 09:38:05 +0100
Subject: Re: Line filtering
To: java-user@lucene.apache.org

You'd be better off reading and selecting the syslog lines outside of
Lucene. Then pass the lines you are interested in to Lucene using
whatever analyzer you want.

--
Ian.

On Sun, Sep 5, 2010 at 10:09 PM, Lev Bronshtein wrote:
>
> Hello group,
>
> I am new to Lucene and ran into a bit of trouble while writing an app.
> I would like to selectively index lines from a syslog on a Unix system.
> To this end I first wrote a tokenizer, extending CharTokenizer, that
> returns an entire line as a single token:
>
>   protected boolean isTokenChar(char c) {
>     return !((c == '\n') || (c == '\r'));
>   }
>
> Perhaps that is my first mistake and I should have done things
> differently?
>
> I then pass this to a filter that only selects the lines with text I am
> interested in:
>
>   public final boolean incrementToken() throws IOException
>   {
>     while (input.incrementToken())
>     {
>       Matcher lineMatcher = linePattern.matcher(termAtt.term());
>       if (lineMatcher.find()) // (we like the payload)
>         return true;
>     }
>     // reached EOS -- return false
>     return false;
>   }
>
> However, the issue is that now that I have the line, I want to break it
> up into tokens along whitespace, but the WhitespaceTokenizer does not
> take a TokenStream as a constructor parameter. Can anyone offer a
> suggestion for a workaround?
>
> Regards,
>
> Lev Bronshtein

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
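
Ian's suggestion in this thread (select the syslog lines outside Lucene, then hand only those lines to Lucene) could be sketched roughly as below. This is a minimal illustration, not code from the thread: the class name SyslogLineSelector, the select method, and the sample log lines are all hypothetical, and the hand-off to Lucene is stood in for by a plain Consumer<String> so that the sketch runs with the JDK alone.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Pattern;

// Hypothetical helper: picks out syslog lines before any Lucene code sees them.
class SyslogLineSelector {

    // Passes every line containing a match for `wanted` to `sink`
    // and returns how many lines were passed on.
    static int select(BufferedReader in, Pattern wanted, Consumer<String> sink)
            throws IOException {
        int selected = 0;
        String line;
        // readLine() already strips the trailing \n or \r\n, so no
        // line-as-token tokenizer is needed.
        while ((line = in.readLine()) != null) {
            if (wanted.matcher(line).find()) {
                sink.accept(line);
                selected++;
            }
        }
        return selected;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample input; a real run would wrap a FileReader on the syslog.
        String sample = "Sep  5 22:01:13 host sshd[311]: Failed password for root\n"
                      + "Sep  5 22:01:14 host cron[402]: job started\n";
        List<String> kept = new ArrayList<>();
        // Assumption: in a real application the sink would put each line into a
        // Lucene Document and add it through an IndexWriter whose analyzer
        // splits on whitespace. Here the lines are only collected.
        select(new BufferedReader(new StringReader(sample)),
               Pattern.compile("sshd"), kept::add);
        System.out.println(kept);
    }
}
```

With this split, the regular-expression filter from the original question runs on whole lines before indexing, and the analyzer given to Lucene (for example a whitespace-based one) only has to tokenize, so no tokenizer needs to consume another TokenStream.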