hadoop-common-user mailing list archives

From Erik Forsberg <forsb...@opera.com>
Subject Copy files https -> HDFS
Date Tue, 07 Jul 2009 11:02:02 GMT

I have a list of files on an https server that requires authentication
(either username/password or a client certificate), and I want to copy
them into HDFS for later Map/Reduce processing. The files are rather
large, so I'd like to copy them in parallel.

I would guess this has been done before? Is there example code
anywhere? I can imagine creating a mapper-only job with the list of file
URLs as input, but how do I easily write to HDFS from a mapper?

