hadoop-hdfs-dev mailing list archives

From Bikas Saha <bi...@hortonworks.com>
Subject RE: VOTE: HDFS-347 merge
Date Tue, 26 Feb 2013 21:47:09 GMT

In my opinion, this feature of short circuit reads (HDFS-347 or HDFS-2246)
is not a desirable feature for HDFS. We should be working towards removing
this feature instead of enhancing it and making it popular.

Maybe short-circuit reads were something that HBase needed for performance
at a point in time when HDFS performance was slow. But after all the
improvements that have been made, is it still unacceptably slow to read
from HDFS? Is there more good engineering that we can do to close that
gap? Close it for all HDFS users and not just the ones who use
short-circuit reads?

Which brings me to the question - Who is the target audience for this
feature? From what I see, anyone who potentially wants to use it ==
everyone. Now if everyone starts using short circuit reads what happens to
the performance problem that we are trying to solve? Will performance
still be better then? This is especially important in the context of YARN
where we don't control the apps that run on the shared grid.

What problem are we trying to solve here? If we want better HDFS
performance and QOS for services then we want to give HDFS as much control
over the disk as possible rather than take it away. Short circuit reads
leave a gaping hole towards that end, and making short circuit reads better
and easier to use makes that hole larger.

I am sorry for replying late and also because my response might be missing
historical perspectives that I am not aware of.


-----Original Message-----
From: rarecactus@gmail.com [mailto:rarecactus@gmail.com] On Behalf Of
Colin McCabe
Sent: Sunday, February 17, 2013 1:49 PM
To: hdfs-dev@hadoop.apache.org
Subject: VOTE: HDFS-347 merge

Hi all,

I would like to merge the HDFS-347 branch back to trunk.  It's been under
intensive review and testing for several months.  The branch adds a lot of
new unit tests, and passes Jenkins as of 2/15 [1].

We have tested HDFS-347 with both random and sequential workloads. The
short-circuit case is substantially faster [2], and overall performance
looks very good.  This is especially encouraging given that the initial
goal of this work was to make security compatible with short-circuit local
reads, rather than to optimize the short-circuit code path.  We've also
stress-tested HDFS-347 on a number of clusters.

This initial VOTE is to merge only into trunk.  Just as we have done with
our other recent merges, we will consider merging into branch-2 after the
code has been in trunk for a few weeks.

Please cast your vote by EOD Sunday 2/24.

Colin McCabe


