hadoop-hdfs-dev mailing list archives

From Suresh Srinivas <sur...@hortonworks.com>
Subject Re: [VOTE] Merge HDFS-6581 to trunk - Writing to replicas in memory.
Date Wed, 24 Sep 2014 18:12:59 GMT
Features are developed in a separate branch for two reasons: 1) during
feature development the branch may not be functional; 2) the high-level
approach and design are not yet clear, and development can continue while
that is being sorted out. For this feature, (2) is clearly not an issue. We
have had enough discussion about the approach. I also think this branch is
ready for merge without rendering trunk non-functional. That leaves us with
one objection being raised: the content is not complete. This did not
prevent the recently merged crypto file system work from being merged to
trunk. We discussed it and decided that it could be merged to trunk and that
we would finish the remaining must-have work in trunk before merging to
branch-2. That is a reasonable approach for this feature as well. In fact, a
lot of bug fixes and improvements are still happening on the crypto work
even after the branch-2 merge (which is a good thing in my opinion; it shows
that testing is continuing and the feature is still being improved).

As regards the eviction policy, we have had a discussion in the jira. I
disagree that there exists a "usable" or "better" strategy that works well
for all workloads. I also disagree that a policy in addition to the
existing LRU is needed, let alone needed for the trunk merge. New eviction
policies need to be developed as people experiment with the feature. Hence
making the eviction policy pluggable is a great way to approach it.
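
To make the "pluggable eviction policy" idea concrete, here is a minimal sketch in Python. It is not HDFS code: the names EvictionPolicy, LruPolicy and MemoryTier are hypothetical and exist only for illustration, and capacity is counted in replicas rather than bytes to keep it short. It only shows how an LRU policy could sit behind an interface so that another policy could be swapped in later.

```python
from collections import OrderedDict
from typing import List, Optional, Protocol, Set


class EvictionPolicy(Protocol):
    """Hypothetical plug-in interface: tracks replicas and picks a victim."""

    def on_access(self, replica_id: str) -> None: ...
    def on_remove(self, replica_id: str) -> None: ...
    def select_victim(self) -> Optional[str]: ...


class LruPolicy:
    """Evicts the replica that was touched least recently."""

    def __init__(self) -> None:
        self._order: "OrderedDict[str, None]" = OrderedDict()

    def on_access(self, replica_id: str) -> None:
        # Re-inserting moves the replica to the most-recently-used end.
        self._order.pop(replica_id, None)
        self._order[replica_id] = None

    def on_remove(self, replica_id: str) -> None:
        self._order.pop(replica_id, None)

    def select_victim(self) -> Optional[str]:
        # The first key is the least recently used replica, if any.
        return next(iter(self._order), None)


class MemoryTier:
    """Toy in-memory tier with a fixed replica capacity and a pluggable policy."""

    def __init__(self, capacity: int, policy: EvictionPolicy) -> None:
        self.capacity = capacity
        self.policy = policy
        self.resident: Set[str] = set()
        self.evicted: List[str] = []

    def touch(self, replica_id: str) -> None:
        # Make room first if a new replica would exceed the capacity.
        if replica_id not in self.resident and len(self.resident) >= self.capacity:
            victim = self.policy.select_victim()
            if victim is not None:
                self.resident.discard(victim)
                self.policy.on_remove(victim)
                self.evicted.append(victim)
        self.resident.add(replica_id)
        self.policy.on_access(replica_id)
```

With a capacity of two, touching replicas "a", "b", "a", "c" in that order evicts "b", because "a" was accessed more recently. A different policy would only need to provide the same three methods.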

There is some testing information posted on the jira, such as performance
results. But I think we need more details on how this feature is being
tested. Arpit, can you please post test details? As for me, given how much
work (both discussion and coding) has gone into it, and given that it is a
very important feature that allows experimentation with a memory tier, I
would like to see this available in release 2.6.

On Tue, Sep 23, 2014 at 6:09 PM, Colin McCabe <cmccabe@alumni.cmu.edu>
wrote:

> This seems like a really aggressive timeframe for a merge.  We still
> haven't implemented:
>
> * Checksum skipping on read and write from lazy persisted replicas.
> * Allowing mmaped reads from the lazy persisted data.
> * Any eviction strategy other than LRU.
> * Integration with cache pool limits (how do HDFS-4949 and lazy
> persist replicas share memory)?
> * Eviction from RAM disk via truncation (HDFS-6918)
> * Metrics
> * System testing to find out how useful this is, and what the best
> eviction strategy is.
>
> I see why we might want to defer checksum skipping, metrics, allowing
> mmap, eviction via truncation, and so forth until later.  But I feel
> like we need to figure out how this will integrate with the memory
> used by HDFS-4949 before we merge.  I also would like to see another
> eviction strategy other than LRU, which is a very poor eviction
> strategy for scanning workloads.  I mentioned this a few times on the
> JIRA.
>
> I'd also like to get some idea of how much testing this has received
> in a multi-node cluster.  What makes us confident that this is the
> right time to merge, rather than in a week or two?
>
> best,
> Colin
>
>
> On Tue, Sep 23, 2014 at 4:55 PM, Arpit Agarwal <aagarwal@hortonworks.com>
> wrote:
> > I have posted write benchmark results to the Jira.
> >
> > On Tue, Sep 23, 2014 at 3:41 PM, Arpit Agarwal <aagarwal@hortonworks.com>
> > wrote:
> >
> >> Hi Andrew, I said "it is not going to be a substantial fraction of memory
> >> bandwidth". That is certainly not the same as saying it won't be good or
> >> there won't be any improvement.
> >>
> >> Any time you have transfers over RPC or the network stack you will not get
> >> close to the memory bandwidth even for intra-host transfers.
> >>
> >> I'll add some micro-benchmark results to the Jira shortly.
> >>
> >> Thanks,
> >> Arpit
> >>
> >> On Tue, Sep 23, 2014 at 2:33 PM, Andrew Wang <andrew.wang@cloudera.com>
> >> wrote:
> >>
> >>> Hi Arpit,
> >>>
> >>> Here is the comment. It was certainly not my intention to misquote anyone.
> >>>
> >>> https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14138223&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14138223
> >>>
> >>> Quote:
> >>>
> >>> It would be nice to see that we could get a substantial fraction of
> >>> memory bandwidth when writing to a single replica in-memory.
> >>>
> >>> The comparison will be interesting but I can tell you without measurement
> >>> it is not going to be a substantial fraction of memory bandwidth. We are
> >>> still going through DataTransferProtocol with all the copies and overhead
> >>> that involves.
> >>>
> >>> When the goal is in-memory writes and we are unable to achieve a
> >>> substantial fraction of memory bandwidth, to me that is "not good
> >>> performance."
> >>>
> >>> I also looked through the subtasks, and AFAICT the only one related to
> >>> improving this is deferring checksum computation. The benchmarking we did
> >>> on HDFS-4949 showed that this only really helps when you're down to single
> >>> copy or zero copies with SCR/ZCR. DTP reads didn't see much of an
> >>> improvement, so I'd guess the same would be true for DTP writes.
> >>>
> >>> I think my above three questions are still open, as well as my question
> >>> about why we're merging now, as opposed to when the performance of the
> >>> branch is proven out.
> >>>
> >>> Thanks,
> >>> Andrew
> >>>
> >>> On Tue, Sep 23, 2014 at 2:10 PM, Arpit Agarwal <aagarwal@hortonworks.com>
> >>> wrote:
> >>>
> >>> > Andrew, don't misquote me. Can you link the comment where I said
> >>> > performance wasn't going to be good?
> >>> >
> >>> > I will add some preliminary write results to the Jira later today.
> >>> >
> >>> > > What's the plan to improve write performance?
> >>> > I described this in response to your and Colin's comments on the Jira.
> >>> >
> >>> > For the benefit of folks not following the Jira, the immediate task we'd
> >>> > like to get done post-merge is moving checksum computation off the write
> >>> > path. Also see open subtasks of HDFS-6581 for other planned perf
> >>> > improvements.
> >>> >
> >>> > Thanks,
> >>> > Arpit
> >>> >
> >>> >
> >>> > On Tue, Sep 23, 2014 at 1:07 PM, Andrew Wang <andrew.wang@cloudera.com>
> >>> > wrote:
> >>> >
> >>> > > Hi Arpit,
> >>> > >
> >>> > > On HDFS-6581, I asked for write benchmarks on Sep 19th, and you responded
> >>> > > that the performance wasn't going to be good. However, I thought the
> >>> > > primary goal of this JIRA was to improve write performance, and write
> >>> > > performance is listed as the first feature requirement in the design doc.
> >>> > >
> >>> > > So, this leads me to a few questions, which I also asked last week on the
> >>> > > JIRA (I believe still unanswered):
> >>> > >
> >>> > > - What's the plan to improve write performance?
> >>> > > - What kind of performance can we expect after the plan is completed?
> >>> > > - Can this expected performance be validated with a prototype?
> >>> > >
> >>> > > Even with these questions answered, I don't understand the need to merge
> >>> > > this before the write optimization work is completed. Write perf is listed
> >>> > > as a feature requirement, so the branch can reasonably be called not
> >>> > > feature complete until it's shown to be faster.
> >>> > >
> >>> > > Thanks,
> >>> > > Andrew
> >>> > >
> >>> > > On Tue, Sep 23, 2014 at 11:47 AM, Jitendra Pandey <jitendra@hortonworks.com>
> >>> > > wrote:
> >>> > >
> >>> > > > +1. I have reviewed most of the code in the branch, and I think it's ready
> >>> > > > to be merged to trunk.
> >>> > > >
> >>> > > >
> >>> > > > On Mon, Sep 22, 2014 at 5:24 PM, Arpit Agarwal <aagarwal@hortonworks.com>
> >>> > > > wrote:
> >>> > > >
> >>> > > > > HDFS Devs,
> >>> > > > >
> >>> > > > > We propose merging the HDFS-6581 development branch to trunk.
> >>> > > > >
> >>> > > > > The work adds support to write to HDFS blocks in memory. The target use
> >>> > > > > case covers applications writing relatively small, intermediate data sets
> >>> > > > > with low latency. We introduce a new CreateFlag for the existing CreateFile
> >>> > > > > API. HDFS will subsequently attempt to place replicas of file blocks in
> >>> > > > > local memory with disk writes occurring off the hot path. The current
> >>> > > > > design is a simplification of original ideas from Sanjay Radia on
> >>> > > > > HDFS-5851.
> >>> > > > >
> >>> > > > > Key goals of the feature were minimal API changes to reduce application
> >>> > > > > burden and best effort data durability. The feature is optional and
> >>> > > > > requires appropriate DN configuration from administrators.
> >>> > > > >
> >>> > > > > Design doc:
> >>> > > > >
> >>> > > > > https://issues.apache.org/jira/secure/attachment/12661926/HDFSWriteableReplicasInMemory.pdf
> >>> > > > >
> >>> > > > > Test plan:
> >>> > > > >
> >>> > > > > https://issues.apache.org/jira/secure/attachment/12669452/Test-Plan-for-HDFS-6581-Memory-Storage.pdf
> >>> > > > >
> >>> > > > > There are 28 resolved sub-tasks under HDFS-6581, 3 open tasks for
> >>> > > > > tests+Jenkins issues and 7 open subtasks tracking planned improvements.
> >>> > > > > The latest merge patch is 3300 lines of changed code of which 1300 lines is
> >>> > > > > new and updated tests. Merging the branch to trunk will allow HDFS
> >>> > > > > applications to start evaluating the feature. We will continue work on
> >>> > > > > documentation, performance tuning and metrics in parallel with the vote and
> >>> > > > > post-merge.
> >>> > > > >
> >>> > > > > Contributors to design and code include Xiaoyu Yao, Sanjay Radia, Jitendra
> >>> > > > > Pandey, Tassapol Athiapinya, Gopal V, Bikas Saha, Vikram Dixit, Suresh
> >>> > > > > Srinivas and Chris Nauroth.
> >>> > > > >
> >>> > > > > Thanks to Haohui Mai, Colin Patrick McCabe, Andrew Wang, Todd Lipcon, Eric
> >>> > > > > Baldeschwieler and Vinayakumar B for providing useful feedback on
> >>> > > > > HDFS-6581, HDFS-5851 and sub-tasks.
> >>> > > > >
> >>> > > > > The vote runs for the usual 7 days and will expire at 12am PDT on Sep 30.
> >>> > > > > Here is my +1 for the merge.
> >>> > > > >
> >>> > > > > Regards,
> >>> > > > > Arpit
> >>> > > > >
> >>> > > > > --
> >>> > > > > CONFIDENTIALITY NOTICE
> >>> > > > > NOTICE: This message is intended for the use of the individual or entity to
> >>> > > > > which it is addressed and may contain information that is confidential,
> >>> > > > > privileged and exempt from disclosure under applicable law. If the reader
> >>> > > > > of this message is not the intended recipient, you are hereby notified that
> >>> > > > > any printing, copying, dissemination, distribution, disclosure or
> >>> > > > > forwarding of this communication is strictly prohibited. If you have
> >>> > > > > received this communication in error, please contact the sender immediately
> >>> > > > > and delete it from your system. Thank You.
> >>> > > > >
> >>> > > >
> >>> > > >
> >>> > > >
> >>> > > > --
> >>> > > > <http://hortonworks.com/download/>
> >>> > > >
> >>> > > >
> >>> > >
> >>> >
> >>> >
> >>>
> >>
> >>
> >
>



-- 
http://hortonworks.com/download/

