lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <>
Subject Re: Using Lucene to search log files
Date Mon, 11 Dec 2006 13:21:50 GMT
As far as the appropriateness of Lucene, it's an open question, but I think
it'd be fine. If it isn't, you have an "interesting" problem <G>.

About timestamps. This has been discussed a LOT on the thread, since they're
not as straight-forward as you might assume. See the thread *"Date ranges -
getting the approach right" *for an exposition on what it's all about. The
thing you *must* understand is that some forms of a query will throw a "too
many clauses" exception. Especially if you store your dates to, say,
millisecond resolution and use the intuitive query forms. Under the covers
if you ask for, say, all queries between 12:00 and 13:00, Lucene will expand
this to a big query with a clause for every value in your index that
satisfies the range. For instance, if there are 2,000 different time
valuesin your index between the two times, there will be 2,000
clauses. If there
are 10,000 documents, but only 10 different times between these two values,
you'll get 10 clauses. Lucene defaults to 1024 maximum clauses, and if your
query expands to more than this, you get the TooManyClauses exception.

This does not apply to Filters, and there are specialty classes for dealing
with this issue. Also, you have some control over how many clauses by
choosing the resolution you store in your index. In the above, if you stored
only by minute, you'd never get more than 60 clauses in an hour.

And there are more graceful ways around this, so don't be discouraged.

I'm sure this is confusing (I know it certainly confused me for a long
time). My hope  is that as you work with the process and run across issues,
you'll be able to say "Oh, that is what they were talking about". And be of
good cheer, these are not show-stoppers at all, they have been dealt with
successfully on a wide range of projects.

Search the mail archive for date, daterange, toomanyclauses, etc and you'll
see the discussions.....


On 12/11/06, abdul aleem <> wrote:
> Hi All,
> Im a Lucene newbie,
> Requirement :
> ==============
> a) Build a log viewer tool, search log files for
> keywords and time stamp
> b)  files in production approx 200 logs per day and
> each log file may range from 1MB - 5MB
> Lucene
> ========
> We wanted to utilize Lucene's search capabilities
> especially search all 200 log files content quickly
> a) Search criteria:
>     i) Timestamp search: Fetch contents between any
> two timestamps
>    ii) Fetch log file contents for specified keyword
> Query
> ========
>     a ) Would greatly appreciate if some suggestions
>         whether Lucene will be appropriate tool for
> the requirement ??
>     b) I have tried to use SpanQuery however
> struggling to fetch entire conents e.g. (between two
> timestamps)
>     c) I had also looked at
> LargeScaleDateRangeProcessing in the wiki, is that a
> right approach for the requirement
>   Any help / suggestion would be greatly appreciated,
>   Many thanks in advance,
>   Abdul
> ____________________________________________________________________________________
> Do you Yahoo!?
> Everyone is raving about the all-new Yahoo! Mail beta.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message