lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Harwood <>
Subject Re: Storing a Lucene Index on a SAN Storage: good idea?
Date Sat, 26 Sep 2009 09:21:47 GMT
I have a client with 700 million doc index running on a SAN. The performance is v good but
this obviously depends on your choice of  SAN config. In this environment I have multiple
search servers happily hitting the same physical lucene index on the SAN. The servers negotiate
with each other via zookeeper to elect a server with write responsibilities.

One advantage of this approach over local disks is that index replication is relatively easy
(replication is effectively being handled by SAN's internal striping mechanisms). Index replica
synching as seen in Solr relies on careful shuffling of (hopefully) small segments of lucene
indexes between servers across the network with mechanisms for dealing with server/network
failures. Alternative local index synching approaches rely on replica servers independently
building their own indexes from source data.
Both of these replication strategies are arguably more fallible than a hardware-based striping
mechanism in a sophisticated SAN.  
However the disadvantage with a SAN is the potential for the SAN to become a bottleneck or
single point of failure (hardware or app-level index corruption). I'm sure high-end SAN vendors
can talk up their ability to hot deploy extra disks to boost performance if needed and sophisticated
failover mechanisms but these get expensive. 

If your client has invested significantly in a SAN and is pushing you to use it my experience
is that this can work but benchmarking vs local disk is key.  


On 26 Sep 2009, at 08:36, Matthias Hess <> wrote:


We are currently implementing our first Lucene project. We are building an application which
will index public Records on the internet, about 200'000 documents, each document is about
150 k in size. Our customer would like to store the Lucene index on a SAN disk.
We recommended the use of high speed local disks, but our customer would prefer SAN for their
better managability.

Does anybody have good or bad experiences with SAN disks?
Are there parameters like #read operations, data read rate or whatever, which must be met
to have a performance which rivals a good "local disk"?

Thanks for sharing your ideas and opinions!

Kind Regards
Matthias Hess

To unsubscribe, e-mail:
For additional commands, e-mail:


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message