hadoop-common-user mailing list archives

From s29752-hadoopu...@yahoo.com
Subject Re: distcp fails: Input source not found
Date Thu, 03 Apr 2008 18:33:55 GMT
distcp supports multiple sources (like Unix cp), and if a specified source is a directory,
it copies the entire directory.  So, you could either do
  distcp src1 src2 ... src100 dst
or
  first copy all srcs to srcdir, and then
  distcp srcdir dstdir
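For example (just a sketch; the bucket name and file names below are made up, and ID:SecretKey is a placeholder), with S3 sources that would look like
  bin/hadoop distcp s3://ID:SecretKey@mybucket/srcdir /input
or, with several sources listed in one run,
  bin/hadoop distcp s3://ID:SecretKey@mybucket/a.xml s3://ID:SecretKey@mybucket/b.xml /input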

I have no experience with S3 or EC2, so I'm not sure whether this will work there.

Nicholas


----- Original Message ----
From: Prasan Ary <voicesnthedark@yahoo.com>
To: core-user@hadoop.apache.org
Sent: Thursday, April 3, 2008 10:06:35 AM
Subject: Re: distcp fails: Input source not found

I found it was a slight oversight on my part. I was copying the files into S3 using the Firefox
EC2 UI, and then trying to access those files on S3 using Hadoop.  The S3 filesystem provided
by Hadoop stores data in its own block format, so it doesn't see files uploaded as ordinary S3
objects. When I used Hadoop to upload the files into S3 instead of the Firefox EC2 UI, things worked.
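  For reference, the upload that worked went through Hadoop's own S3 filesystem, with something along the lines of (bucket, keys and local path below are just placeholders):
  bin/hadoop fs -put /usr/InputFileFormat.xml s3://ID:SecretKey@randbucket123/InputFileFormat.xml
  so that the data ends up in the block format the s3:// filesystem expects.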
   
  But then I had a hard time copying a whole folder from S3 onto the EC2 cluster. The following
article suggests that "distcp" can be used to copy a folder from an S3 bucket onto HDFS on EC2:
  http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873
   
  However, when I try it on 0.15.3, it doesn't allow a folder copy. I have 100+ files in my
S3 bucket, and I had to run "distcp" on each one of them to get them onto HDFS on EC2. Not
a nice experience!
  Can anyone suggest a more elegant way to transfer hundreds of files from S3 to HDFS on
EC2 without having to iterate through each file?
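  To be concrete about what I am doing today (the file names below are made up), it is basically a shell loop that launches one distcp job per object:
  for f in part-000.xml part-001.xml part-002.xml; do
    bin/hadoop distcp s3://ID:SecretKey@randbucket123/$f /input/$f
  done
  It works, but it starts a separate distcp job for every single file.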
   
   
  
s29752-hadoopuser@yahoo.com wrote:
  It might be a bug. Could you try the following?
bin/hadoop fs -ls s3://ID:SecretKey@randbucket123/InputFileFormat.xml

Nicholas


----- Original Message ----
From: Prasan Ary 
To: core-user@hadoop.apache.org
Sent: Wednesday, April 2, 2008 7:41:50 AM
Subject: Re: distcp fails: Input source not found

Anybody? Any thoughts on why this might be happening?

Here is what is happening, taken directly from the EC2 screen. The ID and
secret key are the only things I have changed.

I'm running Hadoop 0.15.3 from the public AMI. I launched a two-machine
cluster using the EC2 scripts in src/contrib/ec2/bin . . .

The file I try to copy is 9 KB (I noticed previous discussion about
empty files and files that are > 10 MB).

>>>>> First I make sure that we can copy the file from S3
[root@domU-12-31-39-00-5A-35 hadoop-0.15.3]# bin/hadoop fs
-copyToLocal s3://ID:SecretKey@randbucket123/InputFileFormat.xml
/usr/InputFileFormat.xml

>>>>> Now I see that the file is copied to the EC2 master (where I'm
logged in)
[root@domU-12-31-39-00-5A-35 hadoop-0.15.3]# dir /usr/Input*
/usr/InputFileFormat.xml

>>>>> Next I make sure I can access the HDFS and that the input
directory is there
[root@domU-12-31-39-00-5A-35 hadoop-0.15.3]# bin/hadoop fs -ls /
Found 2 items
/input   2008-04-01 15:45
/mnt   2008-04-01 15:42
[root@domU-12-31-39-00-5A-35 hadoop-0.15.3]# bin/hadoop fs -ls
/input/
Found 0 items

>>>>> I make sure Hadoop is running just fine by running an example
[root@domU-12-31-39-00-5A-35 hadoop-0.15.3]# bin/hadoop jar
hadoop-0.15.3-examples.jar pi 10 1000
Number of Maps = 10 Samples per Map = 1000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
08/04/01 17:38:14 INFO mapred.FileInputFormat: Total input paths to
process : 10
08/04/01 17:38:14 INFO mapred.JobClient: Running job:
job_200804011542_0001
08/04/01 17:38:15 INFO mapred.JobClient: map 0% reduce 0%
08/04/01 17:38:22 INFO mapred.JobClient: map 20% reduce 0%
08/04/01 17:38:24 INFO mapred.JobClient: map 30% reduce 0%
08/04/01 17:38:25 INFO mapred.JobClient: map 40% reduce 0%
08/04/01 17:38:27 INFO mapred.JobClient: map 50% reduce 0%
08/04/01 17:38:28 INFO mapred.JobClient: map 60% reduce 0%
08/04/01 17:38:31 INFO mapred.JobClient: map 80% reduce 0%
08/04/01 17:38:33 INFO mapred.JobClient: map 90% reduce 0%
08/04/01 17:38:34 INFO mapred.JobClient: map 100% reduce 0%
08/04/01 17:38:43 INFO mapred.JobClient: map 100% reduce 20%
08/04/01 17:38:44 INFO mapred.JobClient: map 100% reduce 100%
08/04/01 17:38:45 INFO mapred.JobClient: Job complete:
job_200804011542_0001
08/04/01 17:38:45 INFO mapred.JobClient: Counters: 9
08/04/01 17:38:45 INFO mapred.JobClient: Job Counters 
08/04/01 17:38:45 INFO mapred.JobClient: Launched map tasks=10
08/04/01 17:38:45 INFO mapred.JobClient: Launched reduce tasks=1
08/04/01 17:38:45 INFO mapred.JobClient: Data-local map tasks=10
08/04/01 17:38:45 INFO mapred.JobClient: Map-Reduce Framework
08/04/01 17:38:45 INFO mapred.JobClient: Map input records=10
08/04/01 17:38:45 INFO mapred.JobClient: Map output records=20
08/04/01 17:38:45 INFO mapred.JobClient: Map input bytes=240
08/04/01 17:38:45 INFO mapred.JobClient: Map output bytes=320
08/04/01 17:38:45 INFO mapred.JobClient: Reduce input groups=2
08/04/01 17:38:45 INFO mapred.JobClient: Reduce input records=20
Job Finished in 31.028 seconds
Estimated value of PI is 3.1556

>>>>> Finally, I try to copy the file over
[root@domU-12-31-39-00-5A-35 hadoop-0.15.3]# bin/hadoop distcp
s3://ID:SecretKey@randbucket123/InputFileFormat.xml
/input/InputFileFormat.xml
With failures, global counters are inaccurate; consider running with
-i
Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input
source s3://ID:SecretKey@randbucket123/InputFileFormat.xml does not
exist.
at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:470)
at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:550)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:563)


