hadoop-common-user mailing list archives

From George <keepp...@gmail.com>
Subject Re: Need help understanding Hadoop Architecture
Date Mon, 24 Oct 2011 20:18:59 GMT
To all

I have been following this board for the past few weeks, and the information
has been great, so I appreciate the amount of sharing that has been going
on.

I am in the "newbie" category here, so there is something I need some
guidance on. I think I have a basic understanding of HDFS and how data is
loaded into HDFS.

What I haven't figured out yet is how you organize the "data". I
know how you do it with a relational database, but I have read that Yahoo
has installations with more than 60 million files.

At the end of the day, you need SOME idea of what you are accessing, don't
you? Anything that talks about the organization of data in HDFS and the
approach to querying against it would be very helpful.

Thanks in advance!
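
One common convention (an assumption here, not something stated in this thread) is to encode the organization in the HDFS directory tree itself, for example one directory per dataset and per day, so that a job selects its input by path before it reads any data. A minimal Python sketch of that idea, using made-up paths and a hypothetical helper `select_inputs`:

```python
from fnmatch import fnmatchcase

# Hypothetical HDFS-style paths: /<dataset>/<year>/<month>/<day>/<part file>.
# The layout is an illustrative assumption, not a Hadoop requirement.
PATHS = [
    "/logs/2011/10/23/part-00000",
    "/logs/2011/10/24/part-00000",
    "/logs/2011/10/24/part-00001",
    "/clicks/2011/10/24/part-00000",
]

def select_inputs(paths, pattern):
    """Return the paths matching a glob pattern, i.e. the files a job would read."""
    return [p for p in paths if fnmatchcase(p, pattern)]

# Pick only the log files for one day; everything else is never opened.
october_24_logs = select_inputs(PATHS, "/logs/2011/10/24/*")
```

With a layout like this, "knowing what you are accessing" comes down to knowing the directory naming scheme rather than a schema stored in the file system.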




On Mon, Oct 24, 2011 at 12:26 PM, Anupam Seth <anupams@yahoo-inc.com> wrote:

> Hi Mike,
>
> This might help address your question:
>
> http://storageconference.org/2010/Papers/MSST/Shvachko.pdf
>
> Regards,
> Anupam
>
> -----Original Message-----
> From: panamamike [mailto:panamamike@hotmail.com]
> Sent: Sunday, October 23, 2011 9:59 AM
> To: core-user@hadoop.apache.org
> Subject: Need help understanding Hadoop Architecture
>
>
> I'm new to Hadoop. I've read a few articles and presentations which are
> directed at explaining what Hadoop is and how it works. Currently my
> understanding is that Hadoop is an MPP system which uses a large
> block size to find data quickly. In theory, I understand how a large block
> size, an MPP architecture, and what I understand to be a massive
> indexing scheme via MapReduce can be used to find data.
>
> What I don't understand is how, after you identify the appropriate 64MB
> block, you find the data you're specifically after. Does this mean
> the CPU has to search the entire 64MB block for the data of interest? If
> so, how does Hadoop know what data from that block to retrieve?
>
> I'm assuming the block is probably composed of one or more files. If not,
> I'm assuming the user isn't looking for the entire 64MB block but rather a
> portion of it.
>
> Any pointers to documentation, books, or articles on the subject would be
> much appreciated.
>
> Regards,
>
> Mike
> --
> View this message in context:
> http://old.nabble.com/Need-help-understanding-Hadoop-Architecture-tp32705405p32705405.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>
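
On the question of finding data inside a block: HDFS keeps no index over file contents. A block is a fixed-size piece of a single file, and a map task reads its whole input split record by record, keeping only the records that match. A rough Python sketch of that scan, with a tiny block size and a hypothetical `scan_block` helper standing in for a map task (the sample records are made up, and the record-boundary handling of real input formats is left out):

```python
# Made-up records of one file: "<date> <user> <action>".
RECORDS = [
    "2011-10-23 alice login",
    "2011-10-23 bob logout",
    "2011-10-24 alice logout",
    "2011-10-24 carol login",
    "2011-10-24 bob login",
]

RECORDS_PER_BLOCK = 2  # stand-in for the 64MB block size; tiny for illustration

def split_into_blocks(records, size):
    """Cut one file's records into consecutive fixed-size blocks."""
    return [records[i:i + size] for i in range(0, len(records), size)]

def scan_block(block, wanted_user):
    """Stand-in for a map task: read every record in the block, keep matches."""
    return [r for r in block if r.split()[1] == wanted_user]

blocks = split_into_blocks(RECORDS, RECORDS_PER_BLOCK)
# Every block is scanned in full; in Hadoop the scans run in parallel on different nodes.
matches = [r for block in blocks for r in scan_block(block, "alice")]
```

The point of the sketch is that nothing tells the scan where "alice" lives inside a block; the speed comes from many blocks being read at once, not from a lookup structure.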
