lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dan OConnor <docon...@acquiremedia.com>
Subject Re: Storing a Lucene Index on a SAN Storage: good idea?
Date Sat, 26 Sep 2009 13:47:48 GMT


We have two instances of a search system containing 40 million documents - identical jvm versions,
lucene jars, and our code. One is running on local disks. The other is on a SAN. The instance
on local disks consistently far outperforms the SAN instance.

And I'd second Uwe's sentiments. An index of only 200,000 documents will be relatively small
and are a great candidate for local disks. 

Regards,
Dan

 

----- Original Message -----
From: Uwe Schindler <uwe@thetaphi.de>
To: java-user@lucene.apache.org <java-user@lucene.apache.org>
Sent: Sat Sep 26 08:41:13 2009
Subject: RE: Storing a Lucene Index on a SAN Storage: good idea?

If it's only 200,000 documents, I think the index will not be very big (a
few Gigabytes), especially, if the documents indexed are 150 kiB raw size
documents like PDFs or DOCs that have much binary overhead not indexed. Most
times the number of distinct terms is low. Index size is most times a lot
smaller than the original documents. If you also store the original doc
contents in the index, it get's bigger, but the time to access the same like
if they are normal files on the SAN.

This config would also run on my laptop :-) I think, the index will
completely be in memory after some time and so there is no difference
between SAN/local HDDs.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Matthias Hess [mailto:mat.hess@bluewin.ch]
> Sent: Saturday, September 26, 2009 9:36 AM
> To: java-user@lucene.apache.org
> Subject: Storing a Lucene Index on a SAN Storage: good idea?
> 
> Hello
> 
> We are currently implementing our first Lucene project. We are building
> an application which will index public Records on the internet, about
> 200'000 documents, each document is about 150 k in size. Our customer
> would like to store the Lucene index on a SAN disk.
> We recommended the use of high speed local disks, but our customer would
> prefer SAN for their better managability.
> 
> Does anybody have good or bad experiences with SAN disks?
> Are there parameters like #read operations, data read rate or whatever,
> which must be met to have a performance which rivals a good "local disk"?
> 
> Thanks for sharing your ideas and opinions!
> 
> Kind Regards
> Matthias Hess
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Mime
View raw message