hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: DFS and the RecordReader
Date Thu, 06 Dec 2012 22:15:19 GMT
Hi,

Not sure what you're referring to. RecordReaders, or for that matter
any DFS InputStream, do not copy data to local disk before reading it.
Non-data-local reads are streamed over the network, just as regular
data-local reads are streamed from a local disk.

There is no such logic as the one you seek.
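
To illustrate the point, here is a toy sketch in plain Java (not Hadoop's
actual code; the class and method names are made up for illustration). The
reader only ever sees an InputStream and consumes it record by record, so it
neither knows nor cares whether the bytes come from a local disk or from a
remote DataNode, and nothing is copied to local disk first. In a real job the
stream would come from FileSystem.open(Path), which returns an
FSDataInputStream; an in-memory stream stands in for it here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StreamingReaderSketch {

    /**
     * Hypothetical stand-in for a line-oriented record reader.
     * It reads records straight off the stream it is handed; there is
     * no "does the file exist locally?" check and no copy-to-disk step.
     */
    public static List<String> readRecords(InputStream in) {
        List<String> records = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                records.add(line); // one record per line
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // Assumption: these in-memory bytes stand in for a stream served
        // over the network by a remote DataNode.
        InputStream in =
            new ByteArrayInputStream("a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readRecords(in)); // prints [a, b, c]
    }
}
```

Swapping the in-memory stream for a local file stream or a network-backed
one would not change readRecords at all, which is why there is no
copy-to-local decision for the reader to make.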

On Fri, Dec 7, 2012 at 3:07 AM, Jay Vyas <jayunit100@gmail.com> wrote:
> Hi guys:
>
> Where and how does a Hadoop RecordReader decide whether or not it needs to
> copy a file to local disk?
>
> Clearly, since the InputSplit (which has metadata about file inputs) is the
> input to the RecordReader, the RecordReader would have to implement some
> kind of smart decision making ... I'm looking for something like
>
> // Pseudocode
> if (!file.existsLocally())
>     copyFileToDisk(file.getPath());
>
> return new InputStream(file);
>
> I've looked here:
>
> http://grepcode.com/file/repo1.maven.org/maven2/org.jvnet.hudson.hadoop/hadoop-core/0.19.1-hudson-2/org/apache/hadoop/hdfs/DFSClient.java#DFSClient.create%28java.lang.String%2Corg.apache.hadoop.fs.permission.FsPermission%2Cboolean%2Cshort%2Clong%2Corg.apache.hadoop.util.Progressable%2Cint%29
>
> but don't see anything.
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com



-- 
Harsh J
