From: "Erick Erickson"
To: java-user@lucene.apache.org
Date: Mon, 11 Dec 2006 08:21:50 -0500
Subject: Re: Using Lucene to search log files

As far as the appropriateness of Lucene goes, it's an open question, but
I think it'd be fine. If it isn't, you have an "interesting" problem.

About timestamps. This has been discussed a LOT on this list, since they're
not as straightforward as you might assume. See the thread
"Date ranges - getting the approach right" for an exposition on what it's
all about.

The thing you *must* understand is that some forms of a query will throw
a "too many clauses" exception, especially if you store your dates to,
say, millisecond resolution and use the intuitive query forms. Under the
covers, if you ask for, say, all documents between 12:00 and 13:00, Lucene
will expand this to a big query with a clause for every distinct value in
your index that satisfies the range. For instance, if there are 2,000
different time values in your index between the two times, there will be
2,000 clauses. If there are 10,000 documents, but only 10 different times
between these two values, you'll get 10 clauses. Lucene defaults to a
maximum of 1024 clauses, and if your query expands to more than this, you
get the TooManyClauses exception.

This does not apply to Filters, and there are specialty classes for
dealing with this issue. Also, you have some control over how many clauses
are generated by choosing the resolution you store in your index.
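
To make the clause arithmetic concrete, here is a rough sketch in plain
Java. It does not call Lucene at all; it only imitates the expansion
described above by counting the distinct indexed terms that fall inside a
range, one clause per term. The class name ClauseCountSketch, the two term
formats, and the 2,000 simulated log entries spaced 1.799 seconds apart
are assumptions made up for illustration. Only the 1024 limit comes from
the message itself.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.TreeSet;

public class ClauseCountSketch {
    // Default clause limit quoted in the message above.
    static final int MAX_CLAUSES = 1024;

    // Hypothetical index-time term formats; string order matches time order.
    static final DateTimeFormatter MILLIS = DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");
    static final DateTimeFormatter MINUTES = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    /** Builds the set of distinct terms for simulated log entries starting at 12:00. */
    static TreeSet<String> indexTerms(int entries, DateTimeFormatter resolution) {
        TreeSet<String> terms = new TreeSet<>();
        LocalDateTime noon = LocalDateTime.of(2006, 12, 11, 12, 0);
        for (int i = 0; i < entries; i++) {
            // Assumed spacing: one entry every 1.799 seconds.
            terms.add(noon.plusNanos(i * 1_799_000_000L).format(resolution));
        }
        return terms;
    }

    /** One clause per distinct term in [from, to], mimicking the range expansion. */
    static int clausesFor(TreeSet<String> terms, String from, String to) {
        return terms.subSet(from, true, to, true).size();
    }

    static boolean exceedsLimit(int clauses) {
        return clauses > MAX_CLAUSES;
    }

    public static void main(String[] args) {
        LocalDateTime from = LocalDateTime.of(2006, 12, 11, 12, 0);
        LocalDateTime to = LocalDateTime.of(2006, 12, 11, 13, 0);

        int fine = clausesFor(indexTerms(2000, MILLIS), from.format(MILLIS), to.format(MILLIS));
        int coarse = clausesFor(indexTerms(2000, MINUTES), from.format(MINUTES), to.format(MINUTES));

        System.out.println("millisecond resolution: " + fine + " clauses, over limit: " + exceedsLimit(fine));
        System.out.println("minute resolution: " + coarse + " clauses, over limit: " + exceedsLimit(coarse));
    }
}
```

With these made-up inputs the millisecond-resolution terms give 2,000
clauses for the hour, which is over the limit, while the same entries
stored at minute resolution collapse to 60 distinct terms.
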
In the 12:00 to 13:00 example, if you stored timestamps only to the
minute, you'd never get more than 60 clauses for an hour-long range. And
there are more graceful ways around this, so don't be discouraged.

I'm sure this is confusing (I know it certainly confused me for a long
time). My hope is that as you work with the process and run across issues,
you'll be able to say "Oh, *that* is what they were talking about". And be
of good cheer: these are not show-stoppers at all, and they have been dealt
with successfully on a wide range of projects.

Search the mail archive for date, daterange, toomanyclauses, etc. and
you'll see the discussions.

Best
Erick

On 12/11/06, abdul aleem wrote:
>
> Hi All,
>
> I'm a Lucene newbie.
>
> Requirement :
> ==============
> a) Build a log viewer tool, search log files for
>    keywords and time stamp
>
> b) files in production: approx 200 logs per day, and
>    each log file may range from 1MB - 5MB
>
> Lucene
> ========
> We wanted to utilize Lucene's search capabilities,
> especially to search the content of all 200 log files quickly
>
> a) Search criteria:
>    i) Timestamp search: Fetch contents between any
>       two timestamps
>
>    ii) Fetch log file contents for specified keyword
>
> Query
> ========
> a) Would greatly appreciate some suggestions on
>    whether Lucene will be an appropriate tool for
>    the requirement
>
> b) I have tried to use SpanQuery, however I am
>    struggling to fetch entire contents, e.g. (between two
>    timestamps)
>
> c) I had also looked at
>    LargeScaleDateRangeProcessing in the wiki; is that the
>    right approach for the requirement?
>
> Any help / suggestion would be greatly appreciated,
>
> Many thanks in advance,
> Abdul