lucene-solr-user mailing list archives

From Toke Eskildsen <t...@kb.dk>
Subject Re: SOLR Data Locality
Date Fri, 17 Mar 2017 18:40:01 GMT
Imad Qureshi <Imadgreat@yahoo.com.INVALID> wrote:
> I understand that but unfortunately that's not an option right now.
> We already have 16 TB of index in HDFS.
> 
> So let me rephrase this question. How important is data locality for
> SOLR. Is performance impacted if SOLR data is on a remote node?

The short answer is yes. The long answer is at https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Anecdotally, we ran some experiments before building our multi-TB search setup, comparing
local SSDs with remote (Isilon) SSDs. The tests used simple searches plus some faceting. I was
a bit surprised that the remote setup was only 3x slower. I would expect the difference to be
even smaller if the underlying storage is slow (spinning disks), as the disks themselves would
then account for a larger share of the total latency. There is an old blog post about it at
https://sbdevel.wordpress.com/2013/12/06/danish-webscale/
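To illustrate why slower storage should narrow the local/remote gap, here is a minimal toy model. It assumes (as a simplification, not something measured) that every index read costs the device's own access time plus, for remote storage, a fixed network round-trip. It ignores the OS page cache, read-ahead and concurrency, all of which matter a lot for real Lucene/Solr workloads. The latency figures are made up for illustration and are not from the experiment described above.

```python
# Toy additive latency model (an assumption, not measured data):
# local read  = device latency
# remote read = device latency + fixed network round-trip


def remote_slowdown(device_ms: float, network_ms: float) -> float:
    """Return how many times slower a remote read is than a local one."""
    local = device_ms
    remote = device_ms + network_ms
    return remote / local


# Hypothetical per-read latencies, chosen only for illustration.
NETWORK_MS = 0.2    # assumed extra network round-trip per read
SSD_MS = 0.1        # assumed SSD random-read latency
SPINNING_MS = 8.0   # assumed spinning-disk seek + rotational latency

if __name__ == "__main__":
    # The same fixed network cost is a large fraction of a fast SSD read
    # but a small fraction of a slow spinning-disk read.
    print(f"SSD slowdown:      {remote_slowdown(SSD_MS, NETWORK_MS):.2f}x")
    print(f"Spinning slowdown: {remote_slowdown(SPINNING_MS, NETWORK_MS):.2f}x")
```

Under this model the ratio is 1 + network/device, so the relative penalty shrinks as device latency grows, which matches the intuition above. It says nothing about absolute speed: the spinning-disk setup is still far slower overall.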


I don't understand what you expect to gain from adding replicas if the data are remote. Why
can't the replica Solrs run on the nodes that hold the data? Do you have very CPU-intensive searches?

- Toke Eskildsen
