hadoop-common-dev mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: HDFS architecture based on GFS?
Date Sun, 15 Feb 2009 23:22:12 GMT
A quick question here. How does a typical Hadoop job work at the system
level? What are the various interactions, and how does the data flow?

Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Sun, Feb 15, 2009 at 3:20 PM, Amandeep Khurana <amansk@gmail.com> wrote:

> Thanks Matei. If the basic architecture is similar to the Google stuff, I
> can safely just work on the project using the information from the papers.
>
> I am aware of the 4487 jira and the current status of the permissions
> mechanism. I had a look at them earlier.
>
> Cheers
> Amandeep
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> On Sun, Feb 15, 2009 at 2:40 PM, Matei Zaharia <matei@cloudera.com> wrote:
>
>> Forgot to add, this JIRA details the latest security features that are
>> being worked on in Hadoop trunk:
>> https://issues.apache.org/jira/browse/HADOOP-4487.
>> This document describes the current status and limitations of the
>> permissions mechanism:
>> http://hadoop.apache.org/core/docs/current/hdfs_permissions_guide.html.
>>
>> On Sun, Feb 15, 2009 at 2:35 PM, Matei Zaharia <matei@cloudera.com>
>> wrote:
>>
>> > I think it's safe to assume that Hadoop works like MapReduce/GFS at the
>> > level described in those papers. In particular, in HDFS, there is a
>> > master node containing metadata and a number of slave nodes (datanodes)
>> > containing blocks, as in GFS. Clients start by talking to the master to
>> > list directories, etc. When they want to read a region of some file,
>> > they tell the master the filename and offset, and they receive a list of
>> > block locations (datanodes). They then contact the individual datanodes
>> > to read the blocks. When clients write a file, they first obtain a new
>> > block ID and list of nodes to write it to from the master, then contact
>> > the datanodes to write it (actually, the datanodes pipeline the write as
>> > in GFS) and report when the write is complete. HDFS actually has some
>> > security mechanisms built in, authenticating users based on their Unix
>> > ID and providing Unix-like file permissions. I don't know much about how
>> > these are implemented, but they would be a good place to start looking.
>> >
>> > On Sun, Feb 15, 2009 at 1:36 PM, Amandeep Khurana <amansk@gmail.com>
>> > wrote:
>> >
>> >> Thanks Matei
>> >>
>> >> I had gone through the architecture document online. I am currently
>> >> working on a project on security in Hadoop. I do know how the data
>> >> moves around in GFS but wasn't sure how much of that HDFS follows and
>> >> how different it is from GFS. Can you throw some light on that?
>> >>
>> >> Security would also involve the MapReduce jobs following the same
>> >> protocols. That's why I asked how the Hadoop framework integrates with
>> >> HDFS, and how different it is from MapReduce and GFS. The GFS and
>> >> MapReduce papers give good information on how those systems are
>> >> designed, but I have not been able to find anything that concrete for
>> >> Hadoop.
>> >>
>> >> Amandeep
>> >>
>> >>
>> >> Amandeep Khurana
>> >> Computer Science Graduate Student
>> >> University of California, Santa Cruz
>> >>
>> >>
>> >> On Sun, Feb 15, 2009 at 12:07 PM, Matei Zaharia <matei@cloudera.com>
>> >> wrote:
>> >>
>> >> > Hi Amandeep,
>> >> > Hadoop is definitely inspired by MapReduce/GFS and aims to provide
>> >> > those capabilities as an open-source project. HDFS is similar to GFS
>> >> > (large blocks, replication, etc); some notable things missing are
>> >> > read-write support in the middle of a file (unlikely to be provided
>> >> > because few Hadoop applications require it) and multiple appenders
>> >> > (the record append operation). You can read about HDFS architecture at
>> >> > http://hadoop.apache.org/core/docs/current/hdfs_design.html. The
>> >> > MapReduce part of Hadoop interacts with HDFS in the same way that
>> >> > Google's MapReduce interacts with GFS (shipping computation to the
>> >> > data), although Hadoop MapReduce also supports running over other
>> >> > distributed filesystems.
>> >> >
>> >> > Matei
>> >> >
>> >> > On Sun, Feb 15, 2009 at 11:57 AM, Amandeep Khurana <amansk@gmail.com>
>> >> > wrote:
>> >> >
>> >> > > Hi
>> >> > >
>> >> > > Is the HDFS architecture completely based on the Google File
>> >> > > System? If it isn't, what are the differences between the two?
>> >> > >
>> >> > > Secondly, is the coupling between Hadoop and HDFS the same as it is
>> >> > > between Google's version of MapReduce and GFS?
>> >> > >
>> >> > > Amandeep
>> >> > >
>> >> > >
>> >> > > Amandeep Khurana
>> >> > > Computer Science Graduate Student
>> >> > > University of California, Santa Cruz
>> >> > >
>> >> >
>> >>
>> >
>> >
>>
>
>
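
The read and write paths Matei describes in the quoted message (ask the
master for block locations, then talk to the datanodes directly, with the
datanodes pipelining writes among themselves) can be sketched roughly as
below. This is only a toy, in-memory model: every class and function name
(Master, DataNode, write_file, read_at), the tiny block size, and the
round-robin replica placement are assumptions made for illustration, not
the Hadoop API or HDFS's actual placement policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 4  # bytes; deliberately tiny (real HDFS blocks are large)


@dataclass
class DataNode:
    """Hypothetical datanode: stores block contents keyed by block ID."""
    name: str
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def write_block(self, block_id: int, data: bytes,
                    pipeline: List["DataNode"]) -> None:
        # Store the block locally, then forward it to the next replica,
        # so the client only ever sends the data to the first node.
        self.blocks[block_id] = data
        if pipeline:
            pipeline[0].write_block(block_id, data, pipeline[1:])

    def read_block(self, block_id: int) -> bytes:
        return self.blocks[block_id]


class Master:
    """Hypothetical master: holds metadata only, never file contents."""

    def __init__(self, datanodes: List[DataNode], replication: int = 2):
        self.datanodes = datanodes
        self.replication = replication
        # filename -> ordered list of (block ID, replica datanodes)
        self.files: Dict[str, List[Tuple[int, List[DataNode]]]] = {}
        self.next_block_id = 0

    def allocate_block(self, filename: str) -> Tuple[int, List[DataNode]]:
        block_id = self.next_block_id
        self.next_block_id += 1
        # Round-robin placement: an assumption, chosen for simplicity.
        n = len(self.datanodes)
        targets = [self.datanodes[(block_id + i) % n]
                   for i in range(self.replication)]
        self.files.setdefault(filename, []).append((block_id, targets))
        return block_id, targets

    def get_block_locations(
            self, filename: str, offset: int
    ) -> Optional[Tuple[int, List[DataNode]]]:
        # Map a byte offset to the block that contains it.
        blocks = self.files[filename]
        index = offset // BLOCK_SIZE
        return blocks[index] if index < len(blocks) else None


def write_file(master: Master, filename: str, data: bytes) -> None:
    # Client write path: get a block ID and target nodes from the master,
    # then hand the data to the first datanode in the pipeline.
    for i in range(0, len(data), BLOCK_SIZE):
        block_id, targets = master.allocate_block(filename)
        targets[0].write_block(block_id, data[i:i + BLOCK_SIZE], targets[1:])


def read_at(master: Master, filename: str, offset: int, length: int) -> bytes:
    # Client read path: ask the master where the block lives, then read
    # the bytes from one of the datanodes holding a replica.
    out = b""
    while length > 0:
        located = master.get_block_locations(filename, offset)
        if located is None:
            break  # offset is past the last block
        block_id, replicas = located
        start = offset % BLOCK_SIZE
        chunk = replicas[0].read_block(block_id)[start:start + length]
        if not chunk:
            break  # offset is past the end of the final, short block
        out += chunk
        offset += len(chunk)
        length -= len(chunk)
    return out
```

For example, with three DataNode objects and replication 2, calling
write_file(master, "/a", b"hello world") followed by
read_at(master, "/a", 3, 5) touches two blocks and returns b"lo wo". Note
that the master never sees file contents in this model, which is the
property that matters when reasoning about where security checks would
have to live.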
