lucene-java-user mailing list archives

From Otis Gospodnetic <>
Subject Re: Feasibility question
Date Wed, 12 Nov 2008 05:17:51 GMT
Yes, I think it is.  The only catch will be those log timestamps: how fine you really
need them to be, and, if you want them very fine, what happens when you run range queries
on them.  If you have a pile of log files lying around, it should be pretty easy to get
them indexed.  You don't even have to write a client for searching the resulting index; just
point something like Luke at it, or even Solr.
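One common way to tame very fine timestamps for range queries is to truncate them to a coarser granularity (e.g. one minute) before indexing, so the field has far fewer distinct values. A minimal sketch of that idea (the `toMinuteBucket` helper is hypothetical, not part of Lucene):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class TimestampGranularity {
    // Truncate a log timestamp to minute precision before indexing.
    // Coarser values mean far fewer distinct terms in the index,
    // which keeps range queries over timestamps cheap.
    static long toMinuteBucket(Instant ts) {
        return ts.truncatedTo(ChronoUnit.MINUTES).getEpochSecond();
    }

    public static void main(String[] args) {
        Instant a = Instant.parse("2008-11-10T18:51:20Z");
        Instant b = Instant.parse("2008-11-10T18:51:59Z");
        // Both fall into the same one-minute bucket.
        System.out.println(toMinuteBucket(a) == toMinuteBucket(b)); // true
    }
}
```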

Sematext -- -- Lucene - Solr - Nutch

From: Jeff Capone <>
Sent: Monday, November 10, 2008 6:51:20 PM
Subject: Feasibility question

Has anyone deployed Lucene to index log files?  I have seen some articles 
about how RackSpace used Lucene and Hadoop for log processing, but I have 
not seen any details on the implementation.  

To get my required analytics, I think I would need to treat each line of
the Apache log files as a document, and I thought I would treat each field as
a keyword to minimize processing.
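The line-per-document approach starts with parsing each Apache log line into its fields; each field would then become an untokenized keyword field on the Lucene Document. A sketch of the parsing step, assuming the Apache "common" log format (the `LogLineParser` class and field names are illustrative, not from any library):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // Apache "common" log format:
    // host ident authuser [date] "request" status bytes
    private static final Pattern COMMON = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)");

    // Each entry in the returned map would be indexed as an
    // untokenized keyword field on a per-line Lucene Document.
    static Map<String, String> parse(String line) {
        Matcher m = COMMON.matcher(line);
        if (!m.matches()) return Map.of();
        return Map.of(
            "host", m.group(1),
            "timestamp", m.group(4),
            "request", m.group(5),
            "status", m.group(6),
            "bytes", m.group(7));
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /index.html HTTP/1.0\" 200 2326";
        System.out.println(parse(line).get("status")); // 200
    }
}
```

Keeping fields as keywords (no analysis/tokenization) avoids per-term processing at index time, at the cost of exact-match-only queries on those fields.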

Assuming you have clusters operating on independent datasets (so I guess it 
would scale linearly) and you want to process terabytes of logs per day, 
is such a solution even feasible?

Thank you,

Jeff Capone
