hadoop-hdfs-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Question about HDFS Architecture
Date Tue, 25 Aug 2009 05:21:24 GMT
On Mon, Aug 24, 2009 at 9:57 PM, Harold Lim <rold_50@yahoo.com> wrote:

> Hi Todd,
>
> Yes. My question is about multiple re-opens. For example, I have an
> application that reads/fetches a file depending on what a user chooses. So,
> in this case, there is no location caching?
>

Correct. But the getBlockLocations call is very fast - it only hits the
namenode, and the namenode has the data in RAM.
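
To make the caching behaviour discussed in this thread concrete, here is a minimal toy model in plain Java. It is not Hadoop code: ToyNameNode and ToyInputStream are hypothetical stand-ins for the namenode and for DFSInputStream, the host names are made up, and the batch size of 10 is taken from Konstantin's description quoted below. It only counts how many lookups reach the "namenode".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the namenode: block locations are held in memory. */
final class ToyNameNode {
    private final Map<String, List<String>> blocksByPath = new HashMap<>();
    int calls = 0; // how many getBlockLocations requests were served

    void addFile(String path, int blockCount) {
        List<String> locations = new ArrayList<>();
        for (int i = 0; i < blockCount; i++) {
            locations.add("datanode-" + (i % 3)); // made-up host names
        }
        blocksByPath.put(path, locations);
    }

    /** Returns locations for up to `count` blocks, starting at block index `first`. */
    List<String> getBlockLocations(String path, int first, int count) {
        calls++;
        List<String> all = blocksByPath.get(path);
        int end = Math.min(all.size(), first + count);
        return new ArrayList<>(all.subList(Math.min(first, end), end));
    }
}

/** Hypothetical stand-in for one open stream: the cache lives and dies with it. */
final class ToyInputStream {
    static final int PREFETCH_BLOCKS = 10; // assumption taken from the thread

    private final ToyNameNode nameNode;
    private final String path;
    private final Map<Integer, String> cache = new HashMap<>();

    ToyInputStream(ToyNameNode nameNode, String path) {
        this.nameNode = nameNode;
        this.path = path;
        fetch(0); // "open" asks for the first batch of block locations
    }

    private void fetch(int first) {
        List<String> batch = nameNode.getBlockLocations(path, first, PREFETCH_BLOCKS);
        for (int i = 0; i < batch.size(); i++) {
            cache.put(first + i, batch.get(i));
        }
    }

    /** Serves from the per-stream cache, fetching another batch on a miss. */
    String locationOf(int blockIndex) {
        if (!cache.containsKey(blockIndex)) {
            fetch(blockIndex);
        }
        return cache.get(blockIndex);
    }
}

class BlockLocationCacheDemo {
    public static void main(String[] args) {
        ToyNameNode nameNode = new ToyNameNode();
        nameNode.addFile("/data/example", 25);

        ToyInputStream first = new ToyInputStream(nameNode, "/data/example");
        for (int i = 0; i < 25; i++) {
            first.locationOf(i);
        }
        System.out.println("calls after reading one stream: " + nameNode.calls);

        // Re-opening starts with an empty cache, so the namenode is asked again.
        new ToyInputStream(nameNode, "/data/example");
        System.out.println("calls after re-open: " + nameNode.calls);
    }
}
```

In this model, reading all blocks through one stream costs a handful of lookups, and every re-open costs at least one more, which matches the point above: nothing is remembered between opens, but each lookup is an in-memory read on the namenode side.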

-Todd


> Thanks,
> Harold
>
> --- On Tue, 8/25/09, Todd Lipcon <todd@cloudera.com> wrote:
>
> > From: Todd Lipcon <todd@cloudera.com>
> > Subject: Re: Question about HDFS Architecture
> > To: hdfs-user@hadoop.apache.org
> > Date: Tuesday, August 25, 2009, 12:43 AM
> > On Mon, Aug 24, 2009 at 6:40 PM, Konstantin Shvachko <shv@yahoo-inc.com> wrote:
> >
> > > Harold,
> > >
> > > Both answers by Aaron were incorrect.
> > >
> > > > Does the client cache this information, or does it always talk to
> > > > the namenode first?
> > >
> > > Yes, the client caches replica locations received from the name-node.
> > > On open() it receives locations of the first 10 blocks of the file.
> > > In most cases these are all file blocks. If not then the client will
> > > get another portion of blocks when needed, and will also cache them.
> >
> > This is only within a single DFSInputStream. The block location cache
> > does not persist across re-opens of the same file. As I read the
> > original question, it was about longer-term caching, not just keeping
> > state during a single DFSInputStream.
> >
> > -Todd
