incubator-blur-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: Blur on SSDs...
Date Wed, 27 May 2015 07:25:55 GMT
>
> My guess is
> that SSDs are only going to help when the blocks for the shard are local
> and short circuit reads are enabled.


Yes, it's a good fit for that use case alone…

> I would not recommend disabling the block cache.  However you could likely
> lower the size of the cache and reduce the overall memory footprint of
> Blur.


Fine. Can we also scale down the machine's RAM itself? [E.g., instead of 128GB
of RAM, we could opt for 64GB or 32GB]

> One interesting thought would be to
> try using the HDFS cache feature that is present in the most recent
> versions of HDFS.  I haven't tried it yet but it would be interesting to
> try.
>

I did try reading the HDFS cache code. I think it was written for the
MapReduce use case, where blocks are loaded into memory [basically "mmap"
followed by "mlock" on the data nodes] just before computation begins and
unloaded once it is done.

On short-circuit reads, I found that the HDFS client offers two options
for block reads:
1. Domain socket
2. Mmap

I think mmap is superior and should perform about the same as Lucene's
MMapDirectory…
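
To make the mmap comparison concrete, here is a minimal sketch in Python, purely for illustration. It does not touch the HDFS client or Lucene; the function names and the demo file are hypothetical. It only shows the difference between reading a byte range through a memory mapping and reading it with an ordinary seek + read:

```python
import mmap
import os
import tempfile


def read_with_mmap(path, offset, length):
    """Return `length` bytes at `offset` by mapping the file into memory."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing copies the requested range out of the mapped pages.
            return m[offset:offset + length]


def read_with_read(path, offset, length):
    """Return the same bytes using an explicit seek followed by read."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    # Hypothetical stand-in for a locally stored block file.
    fd, demo_path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789abcdef")
        os.close(fd)
        same = read_with_mmap(demo_path, 4, 6) == read_with_read(demo_path, 4, 6)
        print("both paths return the same bytes:", same)
    finally:
        os.remove(demo_path)
```

Both paths return identical data. The difference I am interested in is that the mapped path lets the OS page cache serve repeated reads without a system call per read, which is my understanding of why MMapDirectory works well on local disks.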

--
Ravi

On Tue, May 26, 2015 at 8:00 PM, Aaron McCurry <amccurry@gmail.com> wrote:

> On Fri, May 22, 2015 at 3:33 AM, Ravikumar Govindarajan <
> ravikumar.govindarajan@gmail.com> wrote:
>
> > Recently I am trying to consider deploying SSDs on search machines
> >
> > Each machine runs data-nodes + shard-server and local reads of hadoop are
> > leveraged….
> >
> > SSDs are a great-fit for general lucene/solr kind of setups. But for blur,
> > I need some help…
> >
> > 1. Is it a good idea to consider SSDs, especially when block-cache is
> > present?
> >
>
> Possibly, I don't have any hard number for this type of setup.  My guess is
> that SSDs are only going to help when the blocks for the shard are local
> and short circuit reads are enabled.
>
>
> > 2. Are there any grids running blur on SSDs and how they compare to normal
> > HDDs?
> >
>
> I haven't run any at scale yet.
>
>
> > 3. Can we disable block-cache on SSDs, especially when local-reads are
> > enabled?
> >
>
> I would not recommend disabling the block cache.  However you could likely
> lower the size of the cache and reduce the overall memory footprint of
> Blur.
>
>
> > 4. Using SSDs, blur/lucene will surely be CPU bound. But I don't know what
> > over-heads hadoop local-reads brings to the table…
> >
>
> If you are using short circuit reads I have seen performance of local
> accesses nearing that of native IO.  However if Blur is making remote HDFS
> calls every call is like a cache miss.  One interesting thought would be to
> try using the HDFS cache feature that is present in the most recent
> versions of HDFS.  I haven't tried it yet but it would be interesting to
> try.
>
>
> >
> > Any help is much appreciated because I cannot find any info from web on
> > this topic
> >
> > --
> > Ravi
> >
>
