hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12620) Advanced Hadoop Architecture (AHA) - Common
Date Tue, 08 Dec 2015 13:39:11 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046860#comment-15046860 ]

Steve Loughran commented on HADOOP-12620:

I'm trying to make sense of this: I think there's too much detail on marketing, and too much
listing of things further up the stack.

JIRAs are for technical issues. We can discuss the engineering aspects, without worrying about
the business plan merits. 

Similarly, we don't need citations and references to things like BigTable and HTTP/1.1. You
can assume the audience knows how to telnet to port 80 and type in a GET request by hand,
has had time during test runs to read the many Google papers, and may even have spent time with
the authors of some of them, perhaps as former colleagues. And with things like HBase
being based off BigTable, you really don't need to go there.

From what I do understand:

h3. You are proposing HDFS adds {{write(offset, data)}} (or more specifically {{seek(offset);

Everyone recognises the merits of this; it's the key feature of a POSIX FS which HDFS lacks.
It's certainly something we've discussed in the past in a wistful "wouldn't it be nice if..."
kind of way. Though that can go the other way: "wouldn't it be nice if all we offered clients
was a blobstore API", as that has other benefits.
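For readers who want the operation spelled out, here is a minimal sketch of seek+write on an ordinary local POSIX-style file, using only the Python standard library. It is not HDFS code and says nothing about how HDFS would implement it; the file name and contents are made up for illustration.

```python
import os
import tempfile

# Illustration only: an in-place overwrite on a local file, the operation
# the proposal wants HDFS to offer. Path and contents are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")

    # Create a 10-byte file.
    with open(path, "wb") as f:
        f.write(b"0123456789")

    # Reopen for update, seek to offset 4 and overwrite two bytes in place.
    with open(path, "r+b") as f:
        f.seek(4)
        f.write(b"XY")

    # The file keeps its length; only the bytes at offsets 4-5 changed.
    with open(path, "rb") as f:
        result = f.read()

print(result)
```

The local filesystem does this with a single copy of the data; the hard part for HDFS is doing the same across replicated blocks.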

h3. You are proposing Multi Version Concurrency Control as the update mechanism.

MVCC is a way of delivering a view of data to clients which is consistent over a sequence
of operations. 

Actually, we don't need to worry about that. The consistency model of HDFS already says, "there
is no guarantee when or whether changes to the contents of a file (or its metadata) becomes
visible to current readers" ([filesystem specification|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md]).

That means that today, the effects of append, rename and delete may become visible to callers
with existing open streams, they may not, or they may become visible at some point as the caller
reads or seeks through the FS.

POSIX does sort of say "changes should be visible", but even in NFS, the cache model delayed
changes (see _The design and implementation of the Sun Network Filesystem_). Inconsistency
significantly aids implementation and performance of a distributed FS.

So: we don't need to worry about providing a consistent view of data to clients. This is good,
because if there was, say, a 30GB file and one client went {{write(offset=1GB, data=3GB)}},
HDFS would suddenly have to snapshot 3GB of data to serve up to callers. And then if another
client did exactly the same operation, there'd be another snapshot, etc, etc... and before
long you get to implement a DoS attack against the storage capacity of HDFS.
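To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the simplest possible scheme, where every overwrite keeps a full copy of the range it replaces and no copy is released while readers might still hold it; the function name and the writer count are hypothetical, and a real MVCC design could be smarter than this.

```python
GB = 1024 ** 3

def snapshot_overhead(overwrite_bytes, overwrites):
    """Bytes of old data retained if each overwrite snapshots its whole range.

    Assumption (not from any HDFS design): one full copy per overwrite,
    with nothing garbage-collected in the meantime.
    """
    return overwrite_bytes * overwrites

file_size = 30 * GB
# Ten clients each overwrite a 3GB range of the same 30GB file.
overhead = snapshot_overhead(3 * GB, 10)

print(overhead // GB, "GB retained for a", file_size // GB, "GB file")
```

Under those assumptions, ten such overwrites retain as much old data as the whole file, and the cost grows linearly with every further writer.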

What does need to be addressed is:

# How to implement offset overwrites without threatening the integrity of data stored in HDFS.
That is, the existing write chain needs to be extended to support replicated overwrite operations.
The append code shows the beginning of what needs to be done there, though the fact that
it was adding entirely new blocks is what made this possible.

# How to implement post-EOF writes. That is, for a 30GB file, how to handle {{write(offset=50GB, data='a')}}
by writing a small number of bytes, rather than having to save 20GB of zeros.
Effectively that means HDFS has to implement sparse files. That's both in generating them,
and having clients work with them efficiently, for both reading and further updates. We also
have to consider whether writing to a sparse file fills up quotas based on the actual or theoretical
size. Theoretical would be the easiest, and would avoid quirks like quotas being exceeded if you
go back to offset=35GB and write 10GB of data.
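The quota question in the second item can be made concrete with a toy model. This is only a sketch: it tracks which fixed-size blocks are allocated and stores no data, the 128 MiB block size and the class and method names are assumptions made up for illustration, and none of it reflects how HDFS actually accounts for blocks or quotas.

```python
GB = 1024 ** 3
BLOCK = 128 * 1024 ** 2  # assumed fixed block size, for illustration only

class SparseFileModel:
    """Hypothetical model: records allocated block indices, not file contents."""

    def __init__(self):
        self.allocated = set()  # indices of blocks that hold real bytes
        self.length = 0         # logical ("theoretical") size in bytes

    def write(self, offset, nbytes):
        """Mark every block touched by [offset, offset + nbytes) as allocated."""
        if nbytes <= 0:
            return
        first = offset // BLOCK
        last = (offset + nbytes - 1) // BLOCK
        self.allocated.update(range(first, last + 1))
        self.length = max(self.length, offset + nbytes)

    def actual_usage(self):
        # Counted at whole-block granularity in this model.
        return len(self.allocated) * BLOCK

    def theoretical_usage(self):
        return self.length

f = SparseFileModel()
f.write(0, 30 * GB)         # the existing 30GB file
f.write(50 * GB, 1)         # one byte written at offset=50GB
after_hole = (f.actual_usage(), f.theoretical_usage())

f.write(35 * GB, 10 * GB)   # later: go back and fill 10GB of the hole
after_fill = (f.actual_usage(), f.theoretical_usage())

print("after hole:", after_hole)
print("after fill:", after_fill)
```

In this model the theoretical size jumps to just over 50GB on the first post-EOF write and then stays put, while the actual usage grows by another 10GB when the hole is partly filled. A quota charged on actual usage could therefore accept the write at 50GB and reject the later write at 35GB, which is the quirk that charging the theoretical size avoids.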

The impact on the layers above would be tangible, but the foundational feature, seek+write,
is what everything depends on. Without that, there's no point worrying about dependencies
in other projects, or even filing the JIRAs.

Please then, come up with your proposal for this. A PDF attached to this JIRA would be a start.
It should cover the details of how this can be implemented within the Hadoop distributed FS
as it stands today: with the core write chain, plus the new complexities of encrypted storage,
erasure coding, multi-tier storage with tier-specific quotas. That's more than just theoretical
details, you're going to have to look at the code and make suggestions. Ideally, an initial
proof of concept on your own fork of the codebase, code + basic tests, with all the existing
regression tests verifying nothing appears to have broken. [How to Contribute|https://wiki.apache.org/hadoop/HowToContribute]
covers the process here.

Is that a lot to ask? Yes, but HDFS is the most critical part of the Hadoop stack; data integrity
is the one thing the team cares about more than anything else. Something at the YARN layer
could impact availability or performance, but it shouldn't lose or corrupt data. Things
at the HDFS layer can, and every time something has gone in, there have been surprises downstream.
Certainly it's why the HDFS team doesn't trust me to make changes in their code.

To close then: having a tangible proposal of how to implement this on the existing HDFS codebase
would be the best way to start this work —text and initial PoC.

> Advanced Hadoop Architecture (AHA) - Common
> -------------------------------------------
>                 Key: HADOOP-12620
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12620
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Dinesh S. Atreya
>            Assignee: Dinesh S. Atreya
> Advance Hadoop Architecture (AHA) / Advance Hadoop Adaptabilities (AHA):
> See https://issues.apache.org/jira/browse/HADOOP-12620?focusedCommentId=15046300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15046300
> for more details.

This message was sent by Atlassian JIRA
