hadoop-user mailing list archives

From Adam Faris <afa...@linkedin.com>
Subject Re: RES: I want to call HDFS REST api to upload a file using httplib.
Date Wed, 10 Apr 2013 22:39:32 GMT
Creating a file on HDFS is a multi-step process. Skipping over a lot of details, it comes
down to two steps:
   1) ask the namenode for a location to write the blocks;
   2) connect to the datanode and write your data.
The output from your curl statement is the response from the namenode, which returns a 307
and a Location header. Your client (curl) is supposed to pick up that new location and
connect to the datanode to write the data. If you add -L to your curl request, you'll see
this happening.
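
The two steps above can be sketched in Python roughly as follows. This is a minimal
sketch, not a tested client: it uses Python 3's http.client (the successor to httplib),
the function names split_location and webhdfs_create are made up for illustration, and the
user.name parameter is simply carried over from the code quoted below. It assumes the
namenode answers with a 307 and a Location header, as described above. Note that
request() with a bytes body sets Content-Length itself.

```python
import http.client
from urllib.parse import urlsplit


def split_location(location):
    """Split a redirect URL into (host, port, path including query string)."""
    parts = urlsplit(location)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, parts.port, path


def webhdfs_create(namenode_host, namenode_port, hdfs_path, data, user):
    """Hypothetical helper: upload `data` (bytes) to `hdfs_path` in two steps."""
    # Step 1: ask the namenode where to write. No file data is sent here.
    nn = http.client.HTTPConnection(namenode_host, namenode_port)
    nn.request("PUT", "/webhdfs/v1%s?op=CREATE&user.name=%s" % (hdfs_path, user))
    res = nn.getresponse()
    res.read()
    location = res.getheader("Location")
    nn.close()
    if res.status != 307 or location is None:
        raise RuntimeError("expected a 307 with a Location header, got %d %s"
                           % (res.status, res.reason))

    # Step 2: send the data to the URL the namenode returned,
    # rather than to a URL constructed by hand.
    host, port, path = split_location(location)
    dn = http.client.HTTPConnection(host, port)
    dn.request("PUT", path, body=data)
    res = dn.getresponse()
    res.read()
    dn.close()
    return res.status, res.reason


# Example call (needs a running cluster; host, port and paths are assumptions):
# with open("/home/levi/4", "rb") as f:
#     print(webhdfs_create("localhost", 50070, "/levi/4", f.read(), "levi"))
```

The point of the sketch is that the second PUT goes to whatever the Location header says,
which is what curl does for you when you pass -L.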

Just as an FYI, using httplib with webhdfs is a solved problem. There are already client
libraries on GitHub in your pick of languages.  :)

https://github.com/search?q=webhdfs&type=Repositories&s=updated    

-- Adam

On Apr 9, 2013, at 8:32 AM, Daryn Sharp <daryn@yahoo-inc.com> wrote:

> Try adding -L to your curl and see if that works.
> 
> Daryn
> 
> On Apr 8, 2013, at 11:05 PM, 小学园PHP wrote:
> 
>> Thanks very much.
>> But the returned URL is wrong. localhost is the real URL; I tested it successfully
>> with curl using "localhost".
>> Can anybody help me translate this curl command to Python httplib?
>> curl -i -X PUT -T <LOCAL_FILE> "http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE"
>> I tested it with Python httplib and received the right response, but the file uploaded
>> to HDFS is empty; no data was sent!
>> Is "conn.send(data)" the problem?
>> 
>> ------------------ Original ------------------
>> From:  "MARCOS MEDRADO RUBINELLI"<marcosm@buscapecompany.com>;
>> Date:  Mon, Apr 8, 2013 04:22 PM
>> To:  "user@hadoop.apache.org"<user@hadoop.apache.org>;
>> Subject:  RES: I want to call HDFS REST api to upload a file using httplib.
>> 
>> On your first call, Hadoop will return a URL pointing to a datanode in the Location
>> header of the 307 response. On your second call, you have to use that URL instead of
>> constructing your own. You can see the specific documentation here:
>> http://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE
>> 
>> Regards,
>> Marcos
>> 
>> I want to call HDFS REST api to upload a file using httplib.
>> 
>> My program created the file, but no content is in it.
>> 
>> =====================================================
>> 
>> Here is my code:
>> 
>> import httplib
>> 
>> conn = httplib.HTTPConnection("localhost:50070")
>> conn.request("PUT", "/webhdfs/v1/levi/4?op=CREATE")
>> res = conn.getresponse()
>> print res.status, res.reason
>> conn.close()
>> 
>> conn = httplib.HTTPConnection("localhost:50075")
>> conn.connect()
>> conn.putrequest("PUT", "/webhdfs/v1/levi/4?op=CREATE&user.name=levi")
>> conn.endheaders()
>> a_file = open("/home/levi/4", "rb")
>> a_file.seek(0)
>> data = a_file.read()
>> conn.send(data)
>> res = conn.getresponse()
>> print res.status, res.reason
>> conn.close()
>> ==================================================
>> 
>> Here is the return:
>> 
>> 307 TEMPORARY_REDIRECT
>> 201 Created
>> 
>> =========================================================
>> 
>> OK, the file was created, but no content was sent.
>> 
>> When I comment out conn.send(data), the result is the same: still no content.
>> 
>> Maybe the file read or the send is wrong, not sure.
>> 
>> Do you know why this happens?
>> 
> 

