lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Lucene as syslog storage
Date Sun, 18 Jun 2006 19:09:00 GMT
There's somebody on the mailing list who's talking about indexing a Billion
(with a "B") documents. I don't know how far they've gotten, but at least
*somebody* has contemplated a huge archive <G>... If memory serves, s/he had
already indexed a significant number of documents. You might try searching
the archive for "billion"; it was within the last couple of weeks.

Be aware that Lucene, by default, only indexes the first 10,000 terms of
each field in a document, so if your starting point is a large, existing log
indexed as a single document, you'll have to raise this limit (there's a
call for it, but I sure don't remember it off the top of my head).
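
To make the effect of that limit concrete, here is a toy sketch in plain
Java. It is not Lucene code: the class and method names (TruncationDemo,
indexedTerms) are made up for illustration, and the whitespace tokenizer
is only a stand-in for a real analyzer. I believe the actual setting in
Lucene releases of this era is IndexWriter.setMaxFieldLength(int), but
check the Javadoc for your version before relying on that.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; this does not call into Lucene.
final class TruncationDemo {

    /**
     * Returns the distinct terms found among the first maxTerms
     * whitespace-separated tokens of text. Anything after that point
     * is ignored, which is how a per-field term limit behaves.
     */
    static Set<String> indexedTerms(String text, int maxTerms) {
        String trimmed = text.trim();
        if (trimmed.isEmpty() || maxTerms <= 0) {
            return Collections.emptySet();
        }
        String[] tokens = trimmed.split("\\s+");
        int limit = Math.min(tokens.length, maxTerms);
        return new HashSet<>(Arrays.asList(tokens).subList(0, limit));
    }
}
```

With a limit of 3, the text "boot ok ok disk failure" yields only the
terms "boot" and "ok"; a search for "failure" would find nothing. That is
the argument for indexing each log line as its own document rather than
the whole log file as one.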

I've personally indexed over 1,000,000 documents and Lucene doesn't even
breathe hard.

It'd probably be worth creating a small program that generates information
and indexes it, so you can play around and see if you get what you need. The
data won't be "real", but at least you'll have a better sense of how it
plays in your environment.
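
A minimal sketch of the generating half of such a program, in plain Java
with no Lucene dependency. Everything in it is invented for illustration:
the class name SyntheticSyslog, the host and program names, the messages,
and the rough "date host program[pid]: message" line layout. The indexing
half is left out on purpose, since it depends on which Lucene version you
have; the idea would be to add each generated line as one document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Hypothetical test-data generator; all values below are made up.
final class SyntheticSyslog {
    private static final String[] HOSTS = {"web01", "web02", "db01"};
    private static final String[] PROGRAMS = {"sshd", "cron", "kernel"};
    private static final String[] MESSAGES = {
        "session opened for user root",
        "connection closed",
        "job started",
        "disk usage above threshold"
    };

    /** Generates count fake syslog lines; the same seed gives the same lines. */
    static List<String> generate(int count, long seed) {
        Random rnd = new Random(seed);
        List<String> lines = new ArrayList<>(Math.max(count, 0));
        for (int i = 0; i < count; i++) {
            // Locale.ROOT keeps the digits ASCII regardless of the JVM's default locale.
            lines.add(String.format(Locale.ROOT,
                "Jun 18 %02d:%02d:%02d %s %s[%d]: %s",
                rnd.nextInt(24), rnd.nextInt(60), rnd.nextInt(60),
                HOSTS[rnd.nextInt(HOSTS.length)],
                PROGRAMS[rnd.nextInt(PROGRAMS.length)],
                1000 + rnd.nextInt(9000),
                MESSAGES[rnd.nextInt(MESSAGES.length)]));
        }
        return lines;
    }
}
```

Calling SyntheticSyslog.generate(1000000, 42L) gives a repeatable
million-line data set, so you can time indexing and searching runs and
compare them after changing settings.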

Best
Erick
