hadoop-common-user mailing list archives

From Dieter Plaetinck <dieter.plaeti...@intec.ugent.be>
Subject Re: HDFS Explained as Comics
Date Thu, 01 Dec 2011 09:41:36 GMT
Very clear. The comic format indeed works quite well.
I had never considered comics a serious ("professional") way to explain something efficiently,
but this shows people should think twice before writing their next piece of documentation.

One question, though: if a DN has a corrupted block, why does the NN only remove the bad DN
from the block's list, and not also the block from the DN's list?
(Also, does it really store the data in two separate tables? That looks to me like two
different views of the same data.)
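
(To make the "two views" point concrete, here is a minimal sketch in Java; it is purely
illustrative and not the actual NameNode code: a single block-to-DataNodes map can serve
as the source of truth, with the per-DataNode block list derived from it.)

    import java.util.*;

    // Illustrative sketch only -- not the real NameNode data structures.
    public class BlockMapSketch {
        // Single source of truth: block ID -> DataNodes holding a replica.
        private final Map<Long, Set<String>> blockToNodes = new HashMap<>();

        // A DataNode reports a replica of a block.
        public void addReplica(long blockId, String node) {
            blockToNodes.computeIfAbsent(blockId, b -> new HashSet<>()).add(node);
        }

        // A replica is found corrupt: drop that DataNode from the block's list.
        public void removeReplica(long blockId, String node) {
            Set<String> nodes = blockToNodes.get(blockId);
            if (nodes != null) {
                nodes.remove(node);
            }
        }

        // The "DataNode -> blocks" table is just another view over the same map.
        public Set<Long> blocksOn(String node) {
            Set<Long> result = new HashSet<>();
            for (Map.Entry<Long, Set<String>> e : blockToNodes.entrySet()) {
                if (e.getValue().contains(node)) {
                    result.add(e.getKey());
                }
            }
            return result;
        }
    }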

Dieter

On Thu, 1 Dec 2011 08:53:31 +0100
"Alexander C.H. Lorenz" <wget.null@googlemail.com> wrote:

> Hi all,
> 
> very cool comic!
> 
> Thanks,
>  Alex
> 
> On Wed, Nov 30, 2011 at 11:58 PM, Abhishek Pratap Singh
> <manu.infy@gmail.com> wrote:
> 
> > Hi,
> >
> > This is indeed a good way to explain; most of the improvements have
> > already been discussed. Waiting for the sequel of this comic.
> >
> > Regards,
> > Abhishek
> >
> > On Wed, Nov 30, 2011 at 1:55 PM, maneesh varshney
> > <mvarshney@gmail.com> wrote:
> >
> > > Hi Matthew
> > >
> > > I agree with both you and Prashant. The strip needs to be modified to
> > > explain that these can be default values that can be optionally
> > > overridden (which I will fix in the next iteration).
> > >
> > > However, from the 'understanding concepts of HDFS' point of view, I
> > > still think that block size and replication factors are the real
> > > strengths of HDFS, and the learners must be exposed to them so that
> > > they get to see how HDFS is significantly different from conventional
> > > file systems.
> > >
> > > On a personal note: thanks for the first part of your message :)
> > >
> > > -Maneesh
> > >
> > >
> > > On Wed, Nov 30, 2011 at 1:36 PM, GOEKE, MATTHEW (AG/1000)
> > > <matthew.goeke@monsanto.com> wrote:
> > >
> > > > Maneesh,
> > > >
> > > > Firstly, I love the comic :)
> > > >
> > > > Secondly, I am inclined to agree with Prashant on this latest point.
> > > > While one code path could take us through the user defining command
> > > > line overrides (e.g. hadoop fs -D blah -put foo bar), I think it
> > > > might confuse a person new to Hadoop. The most common flow would be
> > > > using admin-determined values from hdfs-site, and the only thing that
> > > > would need to change is that the conversation happens between client /
> > > > server and not user / client.
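
(A rough illustration of that distinction, with a hypothetical class name and made-up
values: a plain Configuration picks up the admin-determined defaults from hdfs-site.xml
on the classpath, and a client may override them programmatically, which is roughly what
the -D generic option does for the shell.)

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    // Hypothetical example class; the property values below are made up.
    public class ClientOverrideSketch {
        public static void main(String[] args) throws IOException {
            // Picks up cluster-level defaults from core-site.xml / hdfs-site.xml
            // on the classpath; most clients stop here and set nothing else.
            Configuration conf = new Configuration();

            // Optional per-client override, roughly what "hadoop fs -D ..." does.
            conf.setLong("dfs.block.size", 128L * 1024 * 1024);
            conf.setInt("dfs.replication", 2);

            FileSystem fs = FileSystem.get(conf);
            System.out.println("Using filesystem: " + fs.getUri());
        }
    }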
> > > >
> > > > Matt
> > > >
> > > > -----Original Message-----
> > > > From: Prashant Kommireddi [mailto:prash1784@gmail.com]
> > > > Sent: Wednesday, November 30, 2011 3:28 PM
> > > > To: common-user@hadoop.apache.org
> > > > Subject: Re: HDFS Explained as Comics
> > > >
> > > > Sure, it's just a case of how readers interpret it:
> > > >
> > > >   1. The client is required to specify block size and replication
> > > >      factor each time.
> > > >   2. The client does not need to worry about it, since an admin has
> > > >      set the properties in the default configuration files.
> > > >
> > > > A client would not be allowed to override the default configs if they
> > > > are set final (well, there are ways to go around it as well, as you
> > > > suggest, by using create(....) :)
> > > >
> > > > The information is great and helpful. Just want to make sure a
> > > > beginner who wants to write a "WordCount" in MapReduce does not worry
> > > > about specifying block size and replication factor in his code.
> > > >
> > > > Thanks,
> > > > Prashant
> > > >
> > > > On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney
> > > > <mvarshney@gmail.com> wrote:
> > > >
> > > > > Hi Prashant
> > > > >
> > > > > Others may correct me if I am wrong here..
> > > > >
> > > > > The client (org.apache.hadoop.hdfs.DFSClient) has knowledge of the
> > > > > block size and replication factor. In the source code, I see the
> > > > > following in the DFSClient constructor:
> > > > >
> > > > >    defaultBlockSize = conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);
> > > > >
> > > > >    defaultReplication = (short) conf.getInt("dfs.replication", 3);
> > > > >
> > > > > My understanding is that the client considers the following chain
> > > > > for the values:
> > > > > 1. Manual values (the long-form constructor, when a user provides
> > > > >    these values)
> > > > > 2. Configuration file values (these are cluster-level defaults:
> > > > >    dfs.block.size and dfs.replication)
> > > > > 3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)
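
(A rough sketch of that lookup order; the class and method names here are made up, and
only the config key and the hardcoded fallback come from the snippet above.)

    import org.apache.hadoop.conf.Configuration;

    // Illustrative only -- not the actual DFSClient code.
    public class BlockSizeResolution {
        // Hardcoded fallback (64 MB, for the sake of the example).
        static final long DEFAULT_BLOCK_SIZE = 64L * 1024 * 1024;

        // 1. an explicit value from the caller wins,
        // 2. otherwise the cluster configuration (dfs.block.size),
        // 3. otherwise the hardcoded default.
        static long resolveBlockSize(Long explicit, Configuration conf) {
            if (explicit != null) {
                return explicit;
            }
            return conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);
        }
    }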
> > > > >
> > > > > Moreover, in org.apache.hadoop.hdfs.protocol.ClientProtocol the API
> > > > > to create a file is
> > > > >    void create(...., short replication, long blocksize);
> > > > >
> > > > > I presume this means that the client already has knowledge of these
> > > > > values and passes them to the NameNode when creating a new file.
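
(At the public API level, FileSystem.create() has an overload that takes replication and
block size explicitly; the path and values in this sketch are made up.)

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ExplicitCreateSketch {
        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());

            // create(path, overwrite, bufferSize, replication, blockSize):
            // the per-file replication factor and block size chosen here are
            // handed to the NameNode when the file is created.
            FSDataOutputStream out = fs.create(
                    new Path("/tmp/example.txt"),  // made-up path
                    true,                          // overwrite
                    4096,                          // io buffer size
                    (short) 3,                     // replication factor
                    64L * 1024 * 1024);            // block size (64 MB)
            out.writeUTF("hello hdfs");
            out.close();
        }
    }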
> > > > >
> > > > > Hope that helps.
> > > > >
> > > > > thanks
> > > > > -Maneesh
> > > > >
> > > > > On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi
> > > > > <prash1784@gmail.com> wrote:
> > > > >
> > > > > > Thanks Maneesh.
> > > > > >
> > > > > > Quick question: does a client really need to know the block size
> > > > > > and replication factor? A lot of times the client has no control
> > > > > > over these (they are set at the cluster level).
> > > > > >
> > > > > > -Prashant Kommireddi
> > > > > >
> > > > > > On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges
> > > > > > <dejan.menges@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Maneesh,
> > > > > > >
> > > > > > > Thanks a lot for this! Just distributed it over the team and
> > > > > > > the comments are great :)
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Dejan
> > > > > > >
> > > > > > > On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney
> > > > > > > <mvarshney@gmail.com> wrote:
> > > > > > >
> > > > > > > > For your reading pleasure!
> > > > > > > >
> > > > > > > > PDF 3.3MB uploaded at (the mailing list has a cap of 1MB on
> > > > > > > > attachments):
> > > > > > > > https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
> > > > > > > >
> > > > > > > >
> > > > > > > > Appreciate if you can spare some time to peruse this little
> > > > > > > > experiment of mine to use comics as a medium to explain
> > > > > > > > computer science topics. This particular issue explains the
> > > > > > > > protocols and internals of HDFS.
> > > > > > > >
> > > > > > > > I am eager to hear your opinions on the usefulness of this
> > > > > > > > visual medium to teach complex protocols and algorithms.
> > > > > > > >
> > > > > > > > [My personal motivations: I have always found text
> > > > > > > > descriptions to be too verbose, as a lot of effort is spent
> > > > > > > > putting the concepts in proper time-space context (which can
> > > > > > > > be easily avoided in a visual medium); sequence diagrams are
> > > > > > > > unwieldy for non-trivial protocols, and they do not explain
> > > > > > > > concepts; and finally, animations/videos happen "too fast" and
> > > > > > > > do not offer a self-paced learning experience.]
> > > > > > > >
> > > > > > > > All forms of criticism, comments (and encouragement) are
> > > > > > > > welcome :)
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Maneesh
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
> 
> 
> 

