hadoop-common-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Setting input paths
Date Wed, 06 Apr 2011 15:41:54 GMT
I believe that opening a directory as a file results in a file-not-found error.  You probably
need to set the input path to a glob that points to the actual files, something like

/user/root/logs/2011/*/*/* for all entries in 2011, or /user/root/logs/2011/01/*/* if you
want to restrict it to just January.  By default, if you pass in a directory as input, the input
format assumes that the directory contains only files, no subdirectories, and that you
really want to use each of those files as input.
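
As a rough sketch of the glob approach, the helper below builds the two patterns mentioned above for a logs/{YEAR}/{MONTH}/{DAY} layout. The class name InputGlobs and its methods are hypothetical, made up for illustration; the only Hadoop call involved is FileInputFormat.addInputPath, shown in a comment so the snippet compiles without Hadoop on the classpath.

```java
// Hypothetical helper (not part of Hadoop): builds input globs for a
// logs/{YEAR}/{MONTH}/{DAY} directory layout.
class InputGlobs {

    // Matches every file under every month and day of the given year.
    static String forYear(String root, int year) {
        return String.format("%s/%04d/*/*/*", root, year);
    }

    // Matches every file under every day of the given month.
    static String forMonth(String root, int year, int month) {
        return String.format("%s/%04d/%02d/*/*", root, year, month);
    }

    public static void main(String[] args) {
        String root = "/user/root/logs";
        System.out.println(forYear(root, 2011));
        System.out.println(forMonth(root, 2011, 1));

        // In the job driver, the glob would be passed as the input path,
        // assuming the new (org.apache.hadoop.mapreduce) API:
        //   FileInputFormat.addInputPath(job, new Path(forYear(root, 2011)));
    }
}
```

The trailing /* matters: the glob has to expand to the files inside each day directory, not to the day directories themselves.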

--Bobby Evans

On 4/6/11 9:53 AM, "Mark" <static.void.dev@gmail.com> wrote:

How can I tell my job to include all the subdirectories and their
content of a certain path?

My directory structure is as follows: logs/{YEAR}/{MONTH}/{DAY}, and I
tried setting my input path to 'logs/' using
FileInputFormat.addInputPath; however, I keep receiving the following error:

java.io.FileNotFoundException: File does not exist: /user/root/logs/2011/01
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1586)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1577)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:428)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
        at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:450)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.Child.main(Child.java:234)


Do my directories/directory contents need to be in any particular format?

Thanks

