hadoop-general mailing list archives

From Jason Venner <jason.had...@gmail.com>
Subject Re: Reading files from local file system
Date Wed, 14 Oct 2009 06:04:45 GMT
If you want to open a local file in Hadoop, there are three simple ways:

1: use file:///path
2: get a LocalFileSystem object from the FileSystem
/**
 * Get the local file system.
 * @param conf the configuration to configure the file system with
 * @return a LocalFileSystem
 */
public static LocalFileSystem getLocal(Configuration conf)
  throws IOException {
  return (LocalFileSystem)get(LocalFileSystem.NAME, conf);
}

3: use the plain java.io File* classes; a short sketch of each approach follows.
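
For illustration, a minimal sketch of all three approaches, assuming the
0.20-era API (the class name LocalReadSketch and the path /tmp/example.txt
are made up for the example):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

public class LocalReadSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();

    // 1: a file:/// URI selects the local file system no matter what
    // fs.default.name says
    FileSystem byUri = FileSystem.get(URI.create("file:///"), conf);
    FSDataInputStream in1 = byUri.open(new Path("file:///tmp/example.txt"));
    in1.close();

    // 2: ask FileSystem for the LocalFileSystem explicitly
    LocalFileSystem local = FileSystem.getLocal(conf);
    FSDataInputStream in2 = local.open(new Path("/tmp/example.txt"));
    in2.close();

    // 3: plain java.io, bypassing Hadoop's FileSystem layer entirely
    BufferedReader reader =
        new BufferedReader(new FileReader(new File("/tmp/example.txt")));
    System.out.println(reader.readLine());
    reader.close();
  }
}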

On Tue, Oct 13, 2009 at 9:05 AM, Chandan Tamrakar <
chandan.tamrakar@nepasoft.com> wrote:

> Do I need to change any configuration besides changing the default file
> system to the local file system?
> I am trying to pass, for example, input.txt to the map job.
>
> input.txt will contain file locations such as the following:
>
> file://path/abc1.doc
> file://path/abc2.doc
> ..
> ...
>
> The map program will read each line from input.txt and process it; a
> sketch of such a mapper follows below.
>
> Do I need to change any configuration? This is similar to how Nutch crawls.
>
> Any feedback would be appreciated.
>
> thanks
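>
> For illustration, a minimal sketch of a mapper along these lines, assuming
> the 0.20-era mapred API (the class name FileUrlMapper and the use of file
> length as the "processing" are made up for the example):
>
> import java.io.IOException;
>
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.MapReduceBase;
> import org.apache.hadoop.mapred.Mapper;
> import org.apache.hadoop.mapred.OutputCollector;
> import org.apache.hadoop.mapred.Reporter;
>
> public class FileUrlMapper extends MapReduceBase
>     implements Mapper<LongWritable, Text, Text, LongWritable> {
>
>   private JobConf conf;
>
>   public void configure(JobConf job) {
>     this.conf = job;
>   }
>
>   public void map(LongWritable key, Text value,
>       OutputCollector<Text, LongWritable> output, Reporter reporter)
>       throws IOException {
>     // Each record's value is one line of input.txt, i.e. one file URL
>     Path path = new Path(value.toString().trim());
>     // Resolve the URL's scheme to a FileSystem (file:// gives the
>     // local file system)
>     FileSystem fs = path.getFileSystem(conf);
>     // Stand-in for real processing: emit the file's length
>     output.collect(value, new LongWritable(fs.getFileStatus(path).getLen()));
>   }
> }
>
> Note that this only works if every listed file is readable on the node
> where the map task actually runs.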
>
>
>
> On Tue, Oct 13, 2009 at 6:49 AM, Jeff Zhang <zjffdu@gmail.com> wrote:
>
> > Maybe you could debug your MapReduce job in Eclipse, since you are
> > running it in local mode.
> >
> >
> >
> > On Tue, Oct 13, 2009 at 5:56 AM, Chandan Tamrakar <
> > chandan.tamrakar@nepasoft.com> wrote:
> >
> > >
> > >
> > > We are trying to read files from the local file system, but when
> > > running the MapReduce job it is not able to read files from the input
> > > location (the input location is also a local file system location).
> > >
> > > To do this we changed the configuration in hadoop-site.xml as shown
> > > below:
> > >
> > > /etc/conf/hadoop/hadoop-site.xml
> > >
> > > <property>
> > >    <name>fs.default.name</name>
> > >    <value>file:///</value>
> > >  </property>
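> > >
> > > For a fully local run the job tracker usually also needs to be
> > > "local"; a minimal sketch of setting both from the driver instead of
> > > hadoop-site.xml (MyJob is a hypothetical driver class):
> > >
> > > JobConf job = new JobConf(MyJob.class);
> > > job.set("fs.default.name", "file:///");
> > > job.set("mapred.job.tracker", "local");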
> > >
> > >
> > >  [admin@localhost ~]$ hadoop jar Test.jar /home/admin/input/test.txt
> > > output1
> > >
> > > Suppose test.txt is a plain text file that contains
> > > Test1
> > > Test2
> > > Test3
> > >
> > >
> > > While running a simple MapReduce job we get the following
> > > FileNotFoundException; we are using TextInputFormat in our job
> > > configuration.
> > >
> > >
> > > 09/10/13 17:26:35 WARN mapred.JobClient: Use GenericOptionsParser for
> > > parsing the arguments. Applications should implement Tool for the same.
> > > 09/10/13 17:26:35 INFO mapred.FileInputFormat: Total input paths to process : 1
> > > 09/10/13 17:26:35 INFO mapred.FileInputFormat: Total input paths to process : 1
> > > 09/10/13 17:26:37 INFO mapred.JobClient: Running job: job_200910131447_0033
> > > 09/10/13 17:26:38 INFO mapred.JobClient:  map 0% reduce 0%
> > > 09/10/13 17:27:00 INFO mapred.JobClient: Task Id :
> > > attempt_200910131447_0033_m_000000_0, Status : FAILED
> > > java.io.FileNotFoundException: File file:/home/admin/Desktop/input/test.txt does not exist.
> > >         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
> > >         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:259)
> > >         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:117)
> > >         at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:275)
> > >         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:364)
> > >         at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:206)
> > >         at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:50)
> > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> > >         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)
> > >
> > > However, running the following code as a separate main method does work:
> > >
> > > public static void main(String[] args) throws IOException {
> > >
> > >     Configuration conf = new Configuration();
> > >     FileSystem fs = FileSystem.get(conf);
> > >
> > >     // Write a small file through whatever file system is configured
> > >     FSDataOutputStream out = fs.create(new Path("abc.txt"));
> > >     out.writeUTF("abc");
> > >     out.close();
> > > }
> > >
> > > The above code works fine when run as a jar in Hadoop; it successfully
> > > creates the file /home/admin/abc.txt when run as the admin user.
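> > >
> > > For comparison, a minimal standalone sketch of the read path the job
> > > fails on (the class name ReadCheck is made up; the path is copied from
> > > the stack trace above):
> > >
> > > import java.io.BufferedReader;
> > > import java.io.IOException;
> > > import java.io.InputStreamReader;
> > >
> > > import org.apache.hadoop.conf.Configuration;
> > > import org.apache.hadoop.fs.FileSystem;
> > > import org.apache.hadoop.fs.Path;
> > >
> > > public class ReadCheck {
> > >   public static void main(String[] args) throws IOException {
> > >     Configuration conf = new Configuration();
> > >     FileSystem fs = FileSystem.get(conf);
> > >     // The same path the failed task reported
> > >     Path path = new Path("file:///home/admin/Desktop/input/test.txt");
> > >     BufferedReader reader =
> > >         new BufferedReader(new InputStreamReader(fs.open(path)));
> > >     System.out.println(reader.readLine());
> > >     reader.close();
> > >   }
> > > }
> > >
> > > If this succeeds from the shell while the map task still fails, the
> > > file is visible to the client but not to wherever the task runs.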
> > >
> > >
> >
>
>
>
> --
> Chandan Tamrakar
>



-- 
Pro Hadoop, a book to guide you from beginner to Hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
