hadoop-hdfs-user mailing list archives

From Jay Vyas <jayunit...@gmail.com>
Subject Re: understanding souce code structure
Date Mon, 27 May 2013 18:22:42 GMT
Hi! A few weeks ago I had the same question. I took a first pass at documenting this
by walking through the classes, starting with key/value pairs, in the blog post below.

http://jayunit100.blogspot.com/2013/04/the-kv-pair-salmon-run-in-mapreduce-hdfs.html

Note it's not perfect yet, but I think it should provide some insight. The linchpin
of it all is the DFSOutputStream and DataStreamer classes. Anyway, feel free
to borrow the contents and roll your own, comment on it and leave some feedback, or let
me know if anything is missing.
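
To give a rough feel for why those two classes matter: as far as I understand it,
DFSOutputStream chops the bytes you write into packets and puts them on a queue, and
DataStreamer is a background thread that drains that queue and pushes the packets to the
datanode pipeline. The toy sketch below only models that producer/consumer hand-off.
Nothing in it is a Hadoop API: WritePathSketch, PACKET_SIZE, enqueue and stream are
made-up names, the packet size is tiny on purpose, and the "pipeline" is just a list.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model only: these names are hypothetical, not taken from the Hadoop source.
class WritePathSketch {
    static final int PACKET_SIZE = 4;      // assumption: tiny packets for illustration
    static final byte[] END = new byte[0]; // sentinel telling the streamer to stop

    // Producer side: split the caller's bytes into packets and queue them.
    static void enqueue(byte[] data, BlockingQueue<byte[]> dataQueue)
            throws InterruptedException {
        for (int off = 0; off < data.length; off += PACKET_SIZE) {
            int end = Math.min(off + PACKET_SIZE, data.length);
            dataQueue.put(Arrays.copyOfRange(data, off, end));
        }
        dataQueue.put(END);
    }

    // Runs a streamer thread that drains the queue and "sends" each packet.
    // Returns the packets in the order the stand-in pipeline received them.
    static List<byte[]> stream(byte[] data) {
        BlockingQueue<byte[]> dataQueue = new LinkedBlockingQueue<>();
        List<byte[]> sent = new ArrayList<>(); // stand-in for the datanode pipeline
        Thread streamer = new Thread(() -> {
            try {
                while (true) {
                    byte[] packet = dataQueue.take();
                    if (packet == END) {
                        break;
                    }
                    sent.add(packet);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        streamer.start();
        try {
            enqueue(data, dataQueue);
            streamer.join(); // join makes the streamer's writes to 'sent' visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while streaming", e);
        }
        return sent;
    }

    public static void main(String[] args) {
        List<byte[]> sent = stream("hello hdfs".getBytes(StandardCharsets.UTF_8));
        System.out.println(sent.size() + " packets sent"); // prints "3 packets sent"
    }
}
```

The real classes do a lot more than this (acks, checksums, pipeline recovery), so treat
the sketch as a reading aid for the source, not as a description of it.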

It would definitely be awesome to have a rock-solid view of the full write path.

On May 27, 2013, at 2:10 PM, Mahmood Naderan <nt_mahmood@yahoo.com> wrote:

> Hello
> 
> I am trying to understand the Hadoop source code, especially HDFS. I want to know
> where exactly I should look in the source code to see how HDFS distributes the data, and
> how the MapReduce engine reads the data.
> 
> 
> Any hint regarding the location of those in the source code is appreciated.
>  
> Regards,
> Mahmood
