hadoop-common-user mailing list archives

From Derek Young <dyo...@kayak.com>
Subject using distcp for http source files
Date Wed, 21 Jan 2009 21:23:56 GMT
I plan to use Hadoop for some log processing, and I'm working on a
method to load the files (probably nightly) into HDFS.  My plan is to
run a web server on each machine that has logs and have it serve the
log directories.  I would then give distcp a list of http URLs for the
log files and have it copy the files in.
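
As a rough illustration of the nightly URL list described above, here is a minimal sketch. The host list is a placeholder, and the port and the /logs/log.YYYYMMDD layout are assumptions based on the single test URL shown later in this message; the function name log_urls is hypothetical.

```python
from datetime import date

# Assumed values: "core" and port 7274 come from the test URL in this
# message; a real setup would list every machine that serves logs.
LOG_HOSTS = ["core"]
LOG_PORT = 7274


def log_urls(day, hosts=LOG_HOSTS, port=LOG_PORT):
    """Return one http URL per host for the log file of the given day."""
    stamp = day.strftime("%Y%m%d")  # e.g. 20090121
    return [f"http://{host}:{port}/logs/log.{stamp}" for host in hosts]


if __name__ == "__main__":
    # Print today's list, one URL per line, ready to be saved to a file.
    for url in log_urls(date.today()):
        print(url)
```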

Reading http://issues.apache.org/jira/browse/HADOOP-341, it sounds like
this should be supported, but http URLs are not working for me.  Are
http source URLs still supported?

I tried a simple test with an http source URL (using Hadoop 0.19):

hadoop distcp -f http://core:7274/logs/log.20090121 /user/dyoung/mylogs

This fails:

With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: No FileSystem for scheme: http
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.hadoop.tools.DistCp.fetchFileList(DistCp.java:578)
    at org.apache.hadoop.tools.DistCp.access$300(DistCp.java:74)
    at org.apache.hadoop.tools.DistCp$Arguments.valueOf(DistCp.java:775)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:844)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:871)
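
The trace shows the failure inside DistCp.fetchFileList, where the -f argument is resolved through FileSystem.get and no FileSystem is registered for the http scheme. Note also that -f is documented as taking the URI of a file that lists the sources, so the command above treats the log file itself as the list. One possible workaround, sketched here under assumptions rather than confirmed for 0.19, is to skip distcp and stream each URL into HDFS through "hadoop fs -put - <dst>", which is assumed to read from stdin. The helper names hdfs_target, put_command and copy_url_to_hdfs are hypothetical.

```python
import posixpath
import shutil
import subprocess
from urllib.parse import urlparse
from urllib.request import urlopen


def hdfs_target(url, dest_dir):
    """Build the HDFS destination path from the last component of the URL path."""
    name = posixpath.basename(urlparse(url).path)
    if not name:
        raise ValueError("URL has no file name: " + url)
    return posixpath.join(dest_dir, name)


def put_command(target):
    """Command line that writes stdin to the given HDFS path.

    Assumes the 'hadoop' launcher is on PATH and that 'fs -put -' reads stdin.
    """
    return ["hadoop", "fs", "-put", "-", target]


def copy_url_to_hdfs(url, dest_dir):
    """Stream one http URL into HDFS without staging it on local disk."""
    target = hdfs_target(url, dest_dir)
    with urlopen(url) as response:
        proc = subprocess.Popen(put_command(target), stdin=subprocess.PIPE)
        try:
            # Copy the HTTP body to the stdin of the hadoop process in chunks.
            shutil.copyfileobj(response, proc.stdin)
        finally:
            proc.stdin.close()
            status = proc.wait()
    if status != 0:
        raise RuntimeError("hadoop fs -put exited with status %d" % status)
    return target
```

With these helpers, copy_url_to_hdfs("http://core:7274/logs/log.20090121", "/user/dyoung/mylogs") would attempt the same copy as the failing command. Unlike distcp, this runs on a single machine, so it gives up the parallel copy.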
