lucene-java-user mailing list archives

From "Armbrust, Daniel C." <Armbrust.Dan...@mayo.edu>
Subject RE: Lucene Benchmarks and Information
Date Mon, 23 Dec 2002 20:59:02 GMT
-----Original Message-----
From: Leo Galambos [mailto:galambos@com-os2.ms.mff.cuni.cz] 
Sent: Saturday, December 21, 2002 9:36 AM
To: Lucene Users List
Subject: Re: Lucene Benchmarks and Information
[snip]

>IMHO it is a bug and the
>point why Lucene does not scale well on huge collections of documents. I
>am talking about my previous tests when I used live index and concurrent
>query+insert+delete (I wanted to simulate real application).

[snip]

What is your definition of huge?  I have yet to have a problem, and I am running one of the
biggest indexes that I have seen posted to the mailing list.  I've been very impressed with
the way that Lucene scales.  Apparently I was not on the mailing list when you posted these
tests.  (I'm still fairly new.)


>BTW, your mail is also an answer to previous topic "how often could one
>call optimize()". The method would be called before the index goes to
>production state. And it also means that tests are irrelevant until they
>are made with lower mergeFactor.

[snip]

Maybe "irrelevant" to you, but I didn't intend my exercise to be a benchmark of how fast
I could make Lucene index, as there are a lot of things that I could have done to make it
faster.  (And I ended up learning several more via the experiment and the follow-up discussion
here.)  Maybe "Benchmarks" is a bad word to have in the subject.  They were done so that:

A.  I know that there is no limitation (that will affect me) in Lucene (hardcoded, bug,
or design-wise) as to how many documents can be put into an index.  That's why I built this
~43 million document index.  Just to see if I could.

B.  I know the impact on search times of adding more documents.

C.  I know I can search this size of an index without running into problems.


I would imagine any benchmark that says "I can index x documents this fast" is fairly irrelevant
to anyone else using different hardware, as it varies too much based on disk speed, platform,
CPU, doc size, doc format (in my real apps I'm doing XML transformations), how dedicated the
machine is, JVM, etc.

The results were posted to the list so that the question 

"I just found Lucene.  It looks nice, but can it handle 30 (or more) million documents?"

can be answered matter-of-factly for others in the future.  Additionally, it serves as a *very*
rough guide to the amount of hardware you would need to construct your index of X documents
in Y amount of time.

Dan

 


