incubator-blur-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: General guidance on blur-shard server
Date Mon, 14 Sep 2015 10:42:44 GMT
We are finally done with testing short-circuit reads and the SSD_One
policy. Summarizing a few crucial points we observed during query runs:

1. A single read issued by the hadoop-client takes on average 0.15-0.25
    ms for a 32KB read. Sometimes this can be on the higher side, around
    0.6-0.65 ms per read… Actual SSD latencies from iostat were around
    0.1 ms, with spikes of 0.6 ms (a measurement sketch follows this list)

2. The overhead of the hadoop wrapper code involved in SSD reads is
    minimal and effectively negligible. However, we tested with a single
    thread; when multiple threads are involved during queries, hadoop
    could be a spoiler

3. It still makes sense to retain the block-cache. Assuming a bad query
    makes about 1000 trips to hadoop, time consumed ~= 0.15 * 1000 =
    150 ms. The block-cache could play a crucial role here. It could also
    help with multi-threaded accesses (a toy sketch of the idea appears
    after the summary below)

4. Segment writes/merges are actually slower than on HDD, maybe because
    of the sequential reads…
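
For reference, here is a minimal sketch of how a single 32KB read through
the hadoop-client could be timed. The path is illustrative and this is not
our actual test harness; it only shows the shape of the measurement:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadLatencyProbe {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Illustrative path; point it at any index file on the SSD volume
        Path path = new Path("/blur/tables/sample/shard-0/segment.dat");
        byte[] buf = new byte[32 * 1024];            // 32KB, as in point 1
        try (FSDataInputStream in = fs.open(path)) {
          long start = System.nanoTime();
          in.readFully(0, buf);                      // one positioned 32KB read
          long micros = (System.nanoTime() - start) / 1000;
          System.out.println("single 32KB read took " + micros + " us");
        }
      }
    }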

Overall, we found good gains, especially for queries using short-circuit
reads combined with the block-cache.
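
To make point 3 concrete, here is a toy illustration of the block-cache
idea (this is not Blur's actual BlockCache implementation, just a sketch of
caching fixed-size blocks keyed by file and block index so that repeated
trips to hadoop are avoided and concurrent readers are served from memory):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Toy block cache; a real one also needs a size bound and eviction (LRU etc.)
    public class ToyBlockCache {
      public static final int BLOCK_SIZE = 32 * 1024;
      private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

      public interface BlockLoader {
        byte[] load(String file, long blockIndex) throws Exception;
      }

      public byte[] get(String file, long blockIndex, BlockLoader loader) throws Exception {
        String key = file + "@" + blockIndex;
        byte[] block = cache.get(key);
        if (block == null) {
          block = loader.load(file, blockIndex); // one ~0.15 ms trip to hadoop
          cache.put(key, block);                 // later readers hit memory instead
        }
        return block;
      }
    }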

--
Ravi



On Wed, Aug 12, 2015 at 6:34 PM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> Our very basic testing with the SSD_One policy works as expected. Now we
> are moving on to test the efficiency of SSD reads via hadoop…
>
> I see numerous params that need to be set up for hadoop short-circuit
> reads, as documented here…
>
>
> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_system-admin-guide/content/ch_short-circuit-reads-hdfs.html
>
> For production workloads, are there any standard configs for blur?
>
> Especially the following params:
>
> 1. dfs.client.read.shortcircuit.streams.cache.size
>
> 2. dfs.client.read.shortcircuit.streams.cache.expiry.ms
>
> 3. dfs.client.read.shortcircuit.buffer.size
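>
> For illustration, one way these could be set programmatically on the
> client Configuration; the values shown are just the defaults as I
> understand them, and dfs.client.read.shortcircuit / dfs.domain.socket.path
> also have to be set as per the doc above:
>
>     import org.apache.hadoop.conf.Configuration;
>
>     public class ShortCircuitConf {
>       public static Configuration create() {
>         Configuration conf = new Configuration();
>         // prerequisites for short-circuit reads
>         conf.setBoolean("dfs.client.read.shortcircuit", true);
>         conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");
>         // the three params in question, with illustrative (default-ish) values
>         conf.setInt("dfs.client.read.shortcircuit.streams.cache.size", 256);
>         conf.setLong("dfs.client.read.shortcircuit.streams.cache.expiry.ms", 5 * 60 * 1000);
>         conf.setInt("dfs.client.read.shortcircuit.buffer.size", 1024 * 1024);
>         return conf;
>       }
>     }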
>
>
>
> On Tue, Aug 11, 2015 at 6:13 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>
>> That is awesome!  Let me know your results when you get a chance.
>>
>> Aaron
>>
>> On Mon, Aug 10, 2015 at 9:21 AM, Ravikumar Govindarajan <
>> ravikumar.govindarajan@gmail.com> wrote:
>>
>> > Hadoop 2.7.1 is out and now handles mixed storage… A single
>> > data-node/shard-server can run HDDs & SSDs together…
>> >
>> > More about this here…
>> >
>> >
>> >
>> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
>> >
>> > The policy I looked for was "SSD_One". The first copy of index-data,
>> > placed on the local machine, will be stored on SSD. The second & third
>> > copies, stored on other machines, will be on HDDs…
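>> >
>> > For illustration, setting the policy on a table directory might look
>> > roughly like this (the path is made up; note the ArchivalStorage page
>> > lists the policy name as One_SSD / ONE_SSD):
>> >
>> >     import org.apache.hadoop.conf.Configuration;
>> >     import org.apache.hadoop.fs.FileSystem;
>> >     import org.apache.hadoop.fs.Path;
>> >     import org.apache.hadoop.hdfs.DistributedFileSystem;
>> >
>> >     public class SetSsdPolicy {
>> >       public static void main(String[] args) throws Exception {
>> >         FileSystem fs = FileSystem.get(new Configuration());
>> >         Path tablePath = new Path("/blur/tables/mytable");  // illustrative path
>> >         // CLI equivalent: hdfs storagepolicies -setStoragePolicy
>> >         //                 -path /blur/tables/mytable -policy ONE_SSD
>> >         ((DistributedFileSystem) fs).setStoragePolicy(tablePath, "ONE_SSD");
>> >       }
>> >     }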
>> >
>> > This eliminates the need for the mixed setup using RACK1 & RACK2 that I
>> > previously thought of. Hadoop 2.7.1 lets me achieve this in a single
>> > cluster of machines running data-nodes + shard-servers
>> >
>> > Every machine stores the primary copy on SSDs. Writes, searches and
>> > merges all take advantage of it, while replication can be relegated to
>> > slower but bigger-capacity HDDs. These HDDs also serve as an online
>> > backup of the less fault-tolerant SSDs
>> >
>> > We have ported our in-house blur extension to hadoop-2.7.1. Will update
>> > on test results shortly
>> >
>> > --
>> > Ravi
>> >
>> > On Mon, Jun 22, 2015 at 6:18 PM, Aaron McCurry <amccurry@gmail.com>
>> > wrote:
>> >
>> > > On Thu, Jun 18, 2015 at 8:55 AM, Ravikumar Govindarajan <
>> > > ravikumar.govindarajan@gmail.com> wrote:
>> > >
>> > > > Apologies for resurrecting this thread…
>> > > >
>> > > > One problem of lucene is OS buffer-cache pollution during segment
>> > > > merges, as documented here
>> > > >
>> > > > http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html
>> > > >
>> > > > This problem could occur in Blur, when short-circuit reads are
>> > > > enabled...
>> > > >
>> > >
>> > > True, but Blur deals with this issue by not allowing (by default) the
>> > > merges to affect the Block Cache.
>> > >
>> > >
>> > > >
>> > > > My take on this…
>> > > >
>> > > > It may be possible to overcome the problem by simply redirecting
>> > > > merge-read requests to a node other than the local node, instead of
>> > > > fancy stuff like O_DIRECT, FADVISE etc...
>> > > >
>> > >
>> > > I have always thought of having the merge occur in a MapReduce (or
>> > > YARN) job instead of locally.
>> > >
>> > >
>> > > >
>> > > > In a mixed setup, this means merge requests need to be diverted to
>> > > > low-end Rack2 machines {running only data-nodes} while short-circuit
>> > > > read requests will continue to be served from high-end Rack1 machines
>> > > > {running both shard-server and data-nodes}
>> > > >
>> > > > Hadoop 2.x provides a cool read-API, "seekToNewSource". The API
>> > > > documentation says "Seek to given position on a node other than the
>> > > > current node"
>> > >
>> > >
>> > > > From the blur code, it's enough if we open a new FSDataInputStream
>> > > > for merge-reads and issue a seekToNewSource call. Once merges are
>> > > > done, it can be closed & discarded…
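>> > > >
>> > > > As a rough sketch of what I mean (made-up names, not actual blur
>> > > > merge code):
>> > > >
>> > > >     import java.io.IOException;
>> > > >     import org.apache.hadoop.fs.FSDataInputStream;
>> > > >     import org.apache.hadoop.fs.FileSystem;
>> > > >     import org.apache.hadoop.fs.Path;
>> > > >
>> > > >     public class MergeReadSketch {
>> > > >       // Open a separate stream just for merge reads and nudge it off
>> > > >       // the local replica.
>> > > >       static FSDataInputStream openForMerge(FileSystem fs, Path segmentFile)
>> > > >           throws IOException {
>> > > >         FSDataInputStream in = fs.open(segmentFile);
>> > > >         // "Seek to given position on a node other than the current node"
>> > > >         boolean moved = in.seekToNewSource(0L);
>> > > >         if (!moved) {
>> > > >           // only one replica was reachable; reads will stay local
>> > > >         }
>> > > >         return in;  // use for merge reads, then close & discard after merge
>> > > >       }
>> > > >     }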
>> > > >
>> > > > Please let me know your viewpoints on this…
>> > > >
>> > >
>> > > We could do this, but I find that reading the TIM files over the wire
>> > > during a merge causes a HUGE slowdown in merge performance.  The
>> > > fastest way to merge is to copy the TIM files involved in the merge
>> > > locally, run the merge, and then delete them after the fact.
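>> > >
>> > > Roughly, copying a segment file local for the merge could look like
>> > > this (paths are illustrative; real code would also handle cleanup on
>> > > failure):
>> > >
>> > >     import java.io.File;
>> > >     import org.apache.hadoop.conf.Configuration;
>> > >     import org.apache.hadoop.fs.FileSystem;
>> > >     import org.apache.hadoop.fs.Path;
>> > >
>> > >     public class LocalMergeCopy {
>> > >       public static void main(String[] args) throws Exception {
>> > >         FileSystem fs = FileSystem.get(new Configuration());
>> > >         Path remoteTim = new Path("/blur/tables/t1/shard-0/_42.tim"); // hypothetical
>> > >         Path localTim = new Path("/tmp/merge-scratch/_42.tim");
>> > >         fs.copyToLocalFile(false /* keep source */, remoteTim, localTim);
>> > >         // ... run the merge against the local copy ...
>> > >         new File("/tmp/merge-scratch/_42.tim").delete();  // delete after the fact
>> > >       }
>> > >     }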
>> > >
>> > > Aaron
>> > >
>> > >
>> > > >
>> > > > --
>> > > > Ravi
>> > > >
>> > > > On Mon, Mar 9, 2015 at 5:45 PM, Ravikumar Govindarajan <
>> > > > ravikumar.govindarajan@gmail.com> wrote:
>> > > >
>> > > > >
>> > > > > On Sat, Mar 7, 2015 at 11:00 AM, Aaron McCurry <amccurry@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > >>
>> > > > >> I thought the normal hdfs replica rules were one local, one
>> > > > >> remote rack, one same rack.
>> > > > >>
>> > > > >
>> > > > > Yes. One copy is local & the other two copies are on the same
>> > > > > remote rack.
>> > > > >
>> > > > >> How did you land on your current configuration?
>> > > > >
>> > > > >
>> > > > > When I was evaluating disk-budget, we were looking at 6 expensive
>> > > > > drives per machine. It led me to think about what those 6 drives
>> > > > > would do & how we can reduce the cost. Then I stumbled on this
>> > > > > two-rack setup, and now we need only 2 such drives...
>> > > > >
>> > > > > Apart from the reduced disk-budget & write-overhead on the cluster,
>> > > > > it also helps with greater availability, as a rack failure would be
>> > > > > recoverable...
>> > > > >
>> > > > > --
>> > > > > Ravi
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
