hadoop-mapreduce-user mailing list archives

From "GOEKE, MATTHEW (AG/1000)" <matthew.go...@monsanto.com>
Subject RE: Distributed cache not working
Date Mon, 18 Jul 2011 22:17:01 GMT
Somehow I forgot to add the stack trace. Anything in <> is a placeholder I have
substituted in for privacy reasons :)

2011-07-18 15:59:38,008 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.FileNotFoundException: File does not exist: /sdb1/mapred/taskTracker/<username>/distcache/4832436332923632778_998360924_1057744747/<node IP>/tmp/hadoop-tmpdir/mapred/staging/<username>/.staging/job_201107151351_0015/files/joinData.txt
                at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1602)
                at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1593)
                at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:428)
                at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
                at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
                at <package>.SubsetOfIndivComparisonsAcrossMultipleMarkersSequenceFileMapper.setup(SubsetOfIndivComparisonsAcrossMultipleMarkersSequenceFileMapper.java:32)
                at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
                at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
                at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:396)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
                at org.apache.hadoop.mapred.Child.main(Child.java:262)

From: GOEKE, MATTHEW [AG/1000]
Sent: Monday, July 18, 2011 4:54 PM
To: mapreduce-user@hadoop.apache.org
Cc: GOEKE, MATTHEW [AG/1000]
Subject: Distributed cache not working

All,

I cannot tell whether this is an issue with my code / usage or whether I am actually running
into a framework issue. I just ran another job that uses the exact same method and it works
perfectly, which makes me think I am missing something minor.

My main class implements Tool and I am using -files on the command line to pass it a local
URI. I use the code below to access the file in the setup method of the mapper:

...
    try{
      Path[] cacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
      if (cacheFiles != null && cacheFiles.length > 0){
        System.out.println("Reading data from DistributedCache: " + cacheFiles[0].toString() + "\n");

        FileSystem fs = FileSystem.get(context.getConfiguration());
        FSDataInputStream scoresInputStream = fs.open(cacheFiles[0]);
...

The only difference I can think of is the file size (6MB vs 90MB), but with the network
backend we have that should amount to less than a 2 second difference in when the file
becomes available. Is it possible that the file has not propagated by the time the mapper
tries to access it, and are there hooks to keep a task from starting until the files have
been fully distributed?

Matt
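
One possible cause can be read off the stack trace, though it is not confirmed anywhere in this thread: the path in the exception sits under a taskTracker/.../distcache directory, which looks like the node's local disk, while the frames through DistributedFileSystem.open and DFSClient show that the FileSystem returned by FileSystem.get(conf) is HDFS. A path returned by DistributedCache.getLocalCacheFiles would then need to be opened through the local file system, for example with FileSystem.getLocal(context.getConfiguration()) or with plain java.io. As far as I know, cache files are localized before the task JVM starts, so a propagation race seems the less likely explanation. The sketch below uses only the JDK (Java 8 or later assumed); LocalCacheReader and both of its methods are hypothetical names, not part of Hadoop or of the original job.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a Hadoop class.
final class LocalCacheReader {

    // Reads every line of a file on the node's local disk with java.nio,
    // bypassing the job's default (HDFS) file system entirely.
    // In the mapper's setup(), localPath would be cacheFiles[0].toString().
    // Note: Path here is java.nio.file.Path, not org.apache.hadoop.fs.Path.
    static List<String> readLocalLines(String localPath) {
        Path path = Paths.get(localPath);
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read local cache file " + localPath, e);
        }
        return lines;
    }

    // Demo-only helper: writes contents to a temporary local file and
    // returns its path, standing in for a localized cache file.
    static String writeTempFile(String contents) {
        try {
            Path tmp = Files.createTempFile("joinData", ".txt");
            Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
            tmp.toFile().deleteOnExit();
            return tmp.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String localPath = writeTempFile("a\t1\nb\t2\n");
        System.out.println(readLocalLines(localPath));
    }
}
```

If the same code path worked for the smaller file in another job, it may be worth checking whether that job's default file system was the local one, since FileSystem.get(conf) would then have resolved the local path without complaint.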