lucene-solr-user mailing list archives

From "Davis, Daniel (NIH/NLM) [C]" <daniel.da...@nih.gov>
Subject RE: deploy solr on cloud providers
Date Tue, 05 Jul 2016 17:49:59 GMT
Lorenzo, this probably comes late, but my systems guys just don't want to give me real disk.
Although RAID-5 or LVM on top of JBOD may be better than Amazon EBS, Amazon EBS is still
much closer to real disk in terms of IOPS and latency than NFS ;) I even ran a mini test
(not an official benchmark), and found the response time for random reads to be better on EBS than on NFS.

If you are a young/smallish company, this may be all in the cloud, but if you are in a large
organization like mine, you may also need to allow for other architectures, such as a "virtual"
Netapp in the cloud that communicates with a physical Netapp on-premises, and the throughput/latency
of that. The most important thing is to actually measure the numbers you are getting, both
for search and for raw I/O, or to get your systems/storage guys to measure those numbers.
If you get your systems/storage guys to measure just storage, you will want to care primarily
about three things for indexing:

	Sequential Write Throughput
	Random Read Throughput
	Random Read Response Time/Latency
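
As a rough illustration of measuring those three numbers, here is a minimal Python sketch. It is not the mini test mentioned above; the function names, the 4 KiB block size, and the 64 MiB default file size are all assumptions made for this example. Note that reads issued right after the write will likely be served from the OS page cache, so the results can be far too optimistic. A real measurement would use a file larger than RAM or a dedicated tool such as fio.

```python
import os
import random
import tempfile
import time

BLOCK_SIZE = 4096  # assumed random-read size; tune to your workload


def sequential_write_throughput(path, total_bytes, chunk_size=1024 * 1024):
    """Write total_bytes sequentially, fsync, and return bytes per second."""
    chunk = os.urandom(chunk_size)
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # include the time to reach stable storage
    elapsed = time.perf_counter() - start
    return written / max(elapsed, 1e-9)


def random_read_stats(path, reads=1000, block_size=BLOCK_SIZE, seed=0):
    """Read blocks at random offsets; return (bytes/second, sorted latencies in seconds)."""
    size = os.path.getsize(path)
    rng = random.Random(seed)
    latencies = []
    total = 0
    with open(path, "rb", buffering=0) as f:  # unbuffered at the Python level only
        for _ in range(reads):
            offset = rng.randrange(0, max(1, size - block_size + 1))
            t0 = time.perf_counter()
            f.seek(offset)
            data = f.read(block_size)
            latencies.append(time.perf_counter() - t0)
            total += len(data)
    latencies.sort()
    return total / max(sum(latencies), 1e-9), latencies


def measure_storage(directory, total_bytes=64 * 1024 * 1024, reads=1000):
    """Run both tests on a temporary file created inside `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        write_bps = sequential_write_throughput(path, total_bytes)
        read_bps, latencies = random_read_stats(path, reads)
    finally:
        os.remove(path)
    p99_index = min(len(latencies) - 1, int(len(latencies) * 0.99))
    return {
        "seq_write_MBps": write_bps / 1e6,
        "rand_read_MBps": read_bps / 1e6,
        "rand_read_median_ms": latencies[len(latencies) // 2] * 1e3,
        "rand_read_p99_ms": latencies[p99_index] * 1e3,
    }
```

Calling measure_storage() with a directory on the volume under test (the Solr data directory's mount point, say) returns the four figures as a dict. Running it against EBS, NFS, and local disk in turn gives a like-for-like comparison, subject to the caching caveat above.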

Hope this helps,

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH



-----Original Message-----
From: Lorenzo Fundaró [mailto:lorenzo.fundaro@dawandamail.com] 
Sent: Tuesday, July 05, 2016 3:20 AM
To: solr-user@lucene.apache.org
Subject: Re: deploy solr on cloud providers

Hi Shawn. Actually, what I'm trying to find out is whether this is the best approach for deploying
Solr in the cloud. I believe SolrCloud solves a lot of problems in terms of high availability,
but when it comes to storage there seems to be a limitation. It can be worked around, of course,
but it's a bit cumbersome, and I was wondering if there is a better option for this or if I'm
missing something with the way I'm doing it. I wonder if there is some proven experience
on how to solve the storage problem when deploying in the cloud. Any advice or pointer to
some enlightening documentation will be appreciated. Thanks.
On Jul 4, 2016 18:27, "Shawn Heisey" <apache@elyograg.org> wrote:

> On 7/4/2016 10:18 AM, Lorenzo Fundaró wrote:
> > when deploying Solr (in SolrCloud mode) in the cloud one has to take
> > care of storage, and as far as I understand it can be a problem
> > because the storage should go wherever the node is created. Suppose we
> > have, for example, a node on EC2 with its own persistent disk, and this
> > node happens to be the leader and at some point crashes before it could
> > replicate the data it has in the transaction log.
> > What do we do in that case? Ideally the new node must use the
> > leftover data that the dead node left, but this is a bit cumbersome
> > in my opinion. What are the best practices for this?
>
> I can't make any sense of this.  What is the *exact* problem you need 
> to solve?  The details can be very important.
>
> We might be dealing with this:
>
> http://people.apache.org/~hossman/#xyproblem
>
> Thanks,
> Shawn
>
>