hadoop-mapreduce-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: how to read replicated blocks with hdfs api?
Date Fri, 07 Aug 2009 00:13:38 GMT
On Thu, Aug 6, 2009 at 1:20 PM, Harold Valdivia Garcia <harold.valdivia@upr.edu> wrote:

> Hi... I was reading the HDFS code, and I can't find a way to read the
> replicated blocks of a block-file.
>
> DFS.getFileBlockLocations returns all blocks of a file
> File = block-a, block-b, ..... block-n.
>
> Each of these blocks has its replicas. If, for instance, the
> replication factor is 3, how can I retrieve block-a1, block-a2, block-a3 in
> parallel from my user code?
>
> I read DFSClient and DFSClient.DFSInputStream to understand how Hadoop
> retrieves data from blocks, but the code is hard to follow.
> Is there no easy way to do this?


Correct - this is not a supported operation. People have discussed doing it,
but no one has put in the work to get it done. I think I may have
accidentally volunteered to do it at one point, but it hasn't been a
priority quite yet - it's an odd mapreduce job that can process data faster
than the datanode can serve it.
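
What the public API does give you is the location of each replica:
getFileBlockLocations returns one BlockLocation per block, and
BlockLocation.getHosts() lists the datanodes holding that block's replicas.
It does not let you choose which replica a read is served from. Below is a
minimal, Hadoop-free sketch of that enumeration. The Block class and
describeReplicas method are hypothetical stand-ins for illustration only
(they are not Hadoop classes), and the host names and sizes are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReplicaListing {

    /**
     * Hypothetical stand-in for Hadoop's BlockLocation: one block of a file
     * plus the hosts that hold a replica of it. Not a Hadoop class.
     */
    public static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        public Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /** Returns one description line per (block, replica) pair, in block order. */
    public static List<String> describeReplicas(List<Block> blocks) {
        List<String> lines = new ArrayList<>();
        for (Block b : blocks) {
            for (int i = 0; i < b.hosts.size(); i++) {
                lines.add("offset=" + b.offset + " length=" + b.length
                        + " replica=" + (i + 1) + " host=" + b.hosts.get(i));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up layout: a two-block file with replication factor 3.
        List<Block> blocks = Arrays.asList(
                new Block(0L, 64L, "dn1", "dn2", "dn3"),
                new Block(64L, 36L, "dn2", "dn3", "dn4"));
        for (String line : describeReplicas(blocks)) {
            System.out.println(line);
        }
    }
}
```

Against a real cluster the list would come from the BlockLocation[] array
rather than hand-built Block objects. Knowing the hosts still does not let
user code direct a read to a particular replica, which is the unsupported
part.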

-Todd



>
> --
> ******************************************
> Harold Dwight Valdivia Garcia
> Graduate Student
> M.S Computer Engineering
> University of Puerto Rico, Mayaguez Campus
> ******************************************
>
