lucene-solr-user mailing list archives

From "Peter Velikin" <pe...@velobit.com>
Subject RE: How to accelerate your Solr-Lucene appication by 4x
Date Fri, 20 Jan 2012 17:13:17 GMT
Hi Erick,

 

This is correct. An additional benefit of configuring the SSD as cache rather
than primary storage is that you don't have to change anything in your existing
indexes (the cache will just give a performance boost).

 

In addition to configuring the system to utilize SSDs as the location where
pages go when swapped out of RAM, VeloBit does a few more performance
optimization tricks, which I will explain below.

 

 

(I am wary of the sensitivity to commercial messages: the following will
explain some of the differentiators of VeloBit. So, if you have an aversion
to vendors promoting themselves, please do not read further)

 

VeloBit 

-          configures SSDs to appear as cache expansion 

-          compresses data at the block level so you can hold much more data
in cache (the cache appears much bigger than the physical size of the SSD and
RAM; performance improves because you have to read data from the slow HDD
less often)

-          decides which data goes into the cache based on the
popularity of the contents of each block (increases cache hit rates and
system performance)

-          optimizes how data is placed and managed on the SSD, which takes
care of the write, erase, and garbage collection limitations inherent to
flash-based SSDs (increases the performance of SSDs and extends the life of the
SSD; this enables you to use a commodity SSD from Best Buy for enterprise
workloads instead of having to buy really expensive high-end SSDs)

-          automates the whole process making everything plug & play (no
need to deal with storage and server architecture issues)
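
To make the caching ideas in the list above concrete, here is a minimal Python sketch of a two-tier block cache. It is not VeloBit's implementation, which this thread does not describe. The class name `TieredBlockCache`, the `admit_after` threshold, the tier sizes, and the use of `zlib` for block compression are all assumptions chosen for illustration. A dict stands in for the slow HDD, and a second in-memory dict stands in for the SSD tier.

```python
import zlib
from collections import Counter, OrderedDict


class TieredBlockCache:
    """Toy two-tier cache: an LRU RAM tier backed by a compressed 'SSD' tier.

    Hypothetical illustration only; all names and sizes are assumptions.
    """

    def __init__(self, backing_store, ram_blocks=2, ssd_blocks=8, admit_after=2):
        self.backing_store = backing_store  # dict simulating the slow HDD
        self.ram = OrderedDict()            # block_id -> raw bytes, in LRU order
        self.ssd = OrderedDict()            # block_id -> compressed bytes
        self.ram_blocks = ram_blocks
        self.ssd_blocks = ssd_blocks
        self.admit_after = admit_after      # reads required before a block is cached
        self.reads = Counter()              # popularity counter per block
        self.stats = Counter()              # where each read was served from

    def read(self, block_id):
        self.reads[block_id] += 1
        if block_id in self.ram:
            self.ram.move_to_end(block_id)  # mark as most recently used
            self.stats["ram_hit"] += 1
            return self.ram[block_id]
        if block_id in self.ssd:
            self.stats["ssd_hit"] += 1
            data = zlib.decompress(self.ssd.pop(block_id))
            self._put_ram(block_id, data)   # promote back into RAM
            return data
        self.stats["hdd_read"] += 1
        data = self.backing_store[block_id]
        # Popularity-based admission: only cache blocks read often enough.
        if self.reads[block_id] >= self.admit_after:
            self._put_ram(block_id, data)
        return data

    def _put_ram(self, block_id, data):
        self.ram[block_id] = data
        while len(self.ram) > self.ram_blocks:
            old_id, old_data = self.ram.popitem(last=False)  # evict LRU block
            self._put_ssd(old_id, old_data)

    def _put_ssd(self, block_id, data):
        self.ssd[block_id] = zlib.compress(data)  # compressed blocks take less space
        while len(self.ssd) > self.ssd_blocks:
            self.ssd.popitem(last=False)


# Usage: 20 fake 4 KB blocks on the "HDD", then a skewed access pattern.
hdd = {i: bytes([i]) * 4096 for i in range(20)}
cache = TieredBlockCache(hdd)
for block_id in [1, 1, 1, 2, 2, 3, 3, 4, 4, 1]:
    cache.read(block_id)
print(dict(cache.stats))
```

In this toy run the last read of block 1 is served from the compressed tier instead of the simulated HDD, which is the effect the list describes. A real product would also have to deal with persistence, write handling, and flash wear, none of which is modeled here.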

 

(end of commercial)

 

Best regards,

 

Peter

 

 

 

 

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Friday, January 20, 2012 11:45 AM
To: solr-user@lucene.apache.org; peter@velobit.com
Subject: Re: How to accelerate your Solr-Lucene appication by 4x

 

Peter:

 

I admit I've just scanned the thread, but it sounds like what you're really
doing under the covers is configuring your system to use the SSDs as the place
where pages go when they are swapped out of RAM. Is this correct?

 

Which would certainly speed things up substantially if swapping was
happening...

 

Best

Erick

 

On Fri, Jan 20, 2012 at 7:23 AM, Peter Velikin <peter@velobit.com> wrote:

> Ted, Otis,

> 

> 

> 

> Thanks for the info. I'll take a stab at answering your question.

> 

> 

> 

> RAM:

> 

> Both of you are correct that if you were able to keep your index in RAM,
that would give you the fastest results. This works if you have a small
enough index. At ZoomInfo, the index was 600 GB (they have multiple types of
indexed data), so there was no way to keep it in RAM. Due to the size of the
index, they have elected to "shard" the data across two sets of systems for
manageability and performance reasons. So, while in theory performance would
be fastest if you keep the entire index in RAM, this is not possible or at
least not practical if you have a large index.

> 

> 

> 

> All SSD:

> 

> SSDs are a lot faster, so if you swap your HDDs for SSDs, performance will
go up. But that's really expensive and is also disruptive. In Zoom's case,
they have a cluster of Dell 2970 servers with 8 cores, each with 6x 146GB,
15k rpm SAS drives. Going all SSD would be expensive for them and would also
require a disruption to running servers.

> 

> 

> 

> SSD as a cache only:

> 

> Since they wanted to avoid the cost and disruption of upgrading the
servers, Zoom added one OCZ Vertex 3 to each of the servers (at a cost of
$230 per SSD) and ran it as an expansion of RAM (cache was a combination of
RAM and SSD). All was configured on the running servers without any
disruption to the running application. The result was an immediate 4x
improvement in performance (responses per second went up from 12/sec to
48/sec, bandwidth went up from 500 KB/sec to 2.2 MB/sec). The VeloBit
software acts as a driver that automatically configures and manages the
RAM+SSD-combo cache; the value of SSD caching software is that it makes the
whole process plug&play.
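
As a quick check of the figures quoted in the paragraph above, the following short Python sketch computes both ratios. It assumes 1 MB = 1000 KB when converting 2.2 MB/sec; the variable names are made up for the example.

```python
# Figures quoted in the message above.
responses_before, responses_after = 12, 48            # responses per second
bandwidth_before_kb, bandwidth_after_kb = 500, 2200   # KB/sec, assuming 1 MB = 1000 KB

response_speedup = responses_after / responses_before
bandwidth_speedup = bandwidth_after_kb / bandwidth_before_kb

print(f"responses: {response_speedup:.1f}x, bandwidth: {bandwidth_speedup:.1f}x")
# prints "responses: 4.0x, bandwidth: 4.4x"
```

So the "4x" in the subject line matches the response rate exactly, and the bandwidth figure comes out slightly higher under the stated unit assumption.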

> 

> 

> 

> So the argument is that adding 1 SSD to each server and using it as a
cache (more precisely as cache expansion to the cache already in RAM) will
give you the best price/performance benefit of all options you have.

> 

> 

> 

> Does this clarify things? Was I able to answer your question?

> 

> 

> 

> Best regards,

> 

> 

> 

> Peter

> 

> 

> 

> 

> 

> 

> 

> -----Original Message-----

> From: Ted Dunning [mailto:ted.dunning@gmail.com]

> Sent: Friday, January 20, 2012 2:42 AM

> To: solr-user@lucene.apache.org

> Subject: Re: How to accelerate your Solr-Lucene appication by 4x

> 

> 

> 

> Actually, for search applications there is a reasonable amount of evidence
that holding the index in RAM is more cost effective than SSDs
because the throughput is enough faster to make up for the price
differential.  There are several papers out of UMass that describe this
trade-off, although they are out-of-date enough to talk about 8GB memory as
being big.  One interesting aspect of the work is the way that they keep an
index highly compressed yet still fast to search.

> 

> 

> 

> As a point of reference, most of Google's searches are served out of
memory in pretty much just this way.  Using SSDs would just slow them down.

> 

> 

> 

> On Fri, Jan 20, 2012 at 5:16 AM, Fuad Efendi <fuad@efendi.ca> wrote:

> 

> 

> 

>> I agree that SSD boosts performance... in some rare, not-real-life
>> scenarios:
>> - super frequent commits
>> That's it, nothing more, except the fact that Lucene compile time
>> including tests takes up to two minutes on a MacBook with SSD, or
>> forty to fifty minutes on Windows with HDD.
>> Of course, with a non-empty Maven repository in both scenarios, to be fair.
>>
>> Another scenario: imagine the Google File System powered by SSD instead
>> of the cheapest HDD... HAHAHA!!!
>>
>> Can we expect a response time of 0.1 milliseconds instead of 30-50?
>>
>> And a final question... Will SSD improve the performance of fuzzy search?
>> Range queries? Etc.
>>
>> I just want to say that SSD is faster than HDD, but it doesn't mean
>> anything...
>>
>> -Fuad
>>
>> Sent from my iPad

> 

>> 

> 

>> On 2012-01-19, at 9:40 AM, "Peter Velikin" <peter@velobit.com> wrote:

> 

>> 

> 

>> > All,
>> >
>> > Point taken: my message should have been written more succinctly and
>> > just stuck to the facts. Sorry for the sales pitch!
>> >
>> > However, I believe that adding SSD as a means to accelerate the
>> > performance of your Solr cluster is an important topic to discuss on
>> > this forum. There are many options for you to consider. I believe
>> > VeloBit would be the best option for many, but you have choices, some
>> > of them completely free. If interested, send me a note and I'll be
>> > happy to tell you about the different options (free or paid) you can
>> > consider.
>> >
>> > Solr clusters are I/O bound. I am arguing that before you buy
>> > additional servers, replace your existing servers with new ones, or
>> > swap your hard disks, you should try adding SSD as a cache. If the
>> > promise is that adding 1 SSD could save you the cost of 3 additional
>> > servers, you should try it.
>> >
>> > Has anyone else tried adding SSDs as a cache to boost the
>> > performance of Solr clusters? Can you share your results?
>> >
>> > Best regards,
>> >
>> > Peter Velikin
>> > VP Online Marketing, VeloBit, Inc.
>> > peter@velobit.com
>> > tel. 978-263-4800
>> > mob. 617-306-7165
>> >
>> > VeloBit provides plug & play SSD caching software that dramatically
>> > accelerates applications at a remarkably low cost. The software
>> > installs seamlessly in less than 10 minutes and automatically tunes
>> > for fastest application speed. Visit www.velobit.com for details.
