hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: HTTP addressable files from HDFS?
Date Sat, 14 Mar 2009 04:45:54 GMT
wget http://namenode:port/data/filename
will retrieve the file.

The namenode will redirect the HTTP request to a datanode that has at least
some of the blocks in local storage to serve the actual request.
The key piece, of course, is the /data prefix on the file name.
The port is the one the web GUI is listening on, NOT the HDFS (RPC) port;
commonly it is 50070.
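A minimal sketch of building that URL (the hostname, port, and helper name here are illustrative, not part of Hadoop's API; the /data prefix and redirect behavior are as described above):

```python
from urllib.parse import quote

def hdfs_http_url(namenode, webui_port, hdfs_path):
    """Build the web-UI URL for fetching an HDFS file.

    The /data prefix is the key piece; the namenode then redirects
    the request to a datanode holding (some of) the file's blocks.
    """
    # quote() leaves '/' intact but escapes spaces etc. in the path
    return "http://%s:%d/data%s" % (namenode, webui_port, quote(hdfs_path))

# Hypothetical cluster values, for illustration only:
print(hdfs_http_url("namenode.example.com", 50070, "/somewhere/image.jpg"))
```

Fetching the file is then just wget (or any HTTP client that follows redirects) against that URL.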

On Fri, Mar 13, 2009 at 7:54 PM, David Michael <david.michael@gmail.com> wrote:

> Hello
>
> I realize that using HTTP, you can have a file in HDFS streamed - that is,
> the servlet responds to the following request with Content-Disposition:
> attachment, and a download is forced (at least from a browser's perspective),
> like so:
>
> http://localhost:50075/streamFile?filename=/somewhere/image.jpg
>
> Is there another way to get at this file more directly from HTTP 'out of
> the box'?
>
> I'm imagining something like:
>
> http://localhost:50075/somewhere/image.jpg
>
> Is this sort of exposure of the HDFS namespace something I need to write
> into a server myself?
>
> Thanks in advance
> David
>
> On Mar 13, 2009, at 10:12 PM, S D wrote:
>
>  I've used wget with Hadoop Streaming without any problems. Based on the
>> error code you're getting, I suggest you make sure that you have the
>> proper
>> write permissions for the directory in which Hadoop will process (e.g.,
>> download, convert, ...) on each of the task tracker machines. The location
>> where data is processed on each machine is controlled by the "hadoop.tmp.dir"
>> variable. The default value set in $HADOOP_HOME/conf/hadoop-default.xml is
>> "/tmp/hadoop-${user.name}". Make sure that the user running hadoop has
>> permission to write to whatever directory you're using.
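The check suggested above can be sketched as follows (the ${user.name} expansion and the permission test are illustrative stand-ins, not Hadoop's own code):

```python
import getpass
import os

def default_hadoop_tmp_dir(user=None):
    """Expand the hadoop-default.xml value /tmp/hadoop-${user.name}."""
    user = user or getpass.getuser()
    return "/tmp/hadoop-%s" % user

def writable(path):
    """True if the directory (or its nearest existing parent) is writable."""
    while path and not os.path.exists(path):
        path = os.path.dirname(path)
    return os.access(path or "/", os.W_OK)

tmp_dir = default_hadoop_tmp_dir()
print(tmp_dir, "writable:", writable(tmp_dir))
```

If this prints False for the user running the task trackers, the streaming subprocess cannot write its working files there.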
>>
>> John
>>
>> On Thu, Mar 12, 2009 at 10:02 PM, Nick Cen <cenyongh@gmail.com> wrote:
>>
>>  Hi All,
>>>
>>> I am trying to use Hadoop Streaming with "wget" to simulate a
>>> distributed downloader.
>>> The command line i use is
>>>
>>> ./bin/hadoop jar -D mapred.reduce.tasks=0
>>> contrib/streaming/hadoop-0.19.0-streaming.jar -input urli -output urlo
>>> -mapper /usr/bin/wget -outputformat
>>> org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
>>>
>>> But it thrown an exception
>>>
>>> java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
>>> failed with code 1
>>>      at
>>>
>>> org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:295)
>>>      at
>>>
>>> org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:519)
>>>      at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
>>>      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
>>>      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>>>      at org.apache.hadoop.mapred.Child.main(Child.java:155)
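The exception above is Streaming's generic wrapper around any mapper process that exits nonzero; it hides the real cause. A rough sketch of that behavior (the RuntimeError message mirrors PipeMapRed's; the helper name and the failing command are illustrative, and the wget-specific note is an assumption):

```python
import subprocess

def run_streaming_mapper(argv, stdin_text=""):
    """Run a mapper command roughly the way Streaming does: feed records
    on stdin, collect stdout, and fail the task on a nonzero exit."""
    proc = subprocess.run(argv, input=stdin_text,
                          capture_output=True, text=True)
    if proc.returncode != 0:
        # Streaming surfaces every nonzero exit with this one message.
        # (Likely cause here: wget takes URLs as arguments, not on stdin,
        # so invoked bare by Streaming it exits nonzero.)
        raise RuntimeError(
            "PipeMapRed.waitOutputThreads(): subprocess failed with code %d"
            % proc.returncode)
    return proc.stdout

# "false" is a stand-in for any mapper that exits 1:
try:
    run_streaming_mapper(["false"])
except RuntimeError as e:
    print(e)
```

Checking the task's stderr logs on the task tracker shows the mapper's own error output, which the wrapped exception does not.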
>>>
>>> Can somebody point me to why this happened? Thanks.
>>>
>>>
>>>
>>> --
>>> http://daily.appspot.com/food/
>>>
>>>
>


-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
