hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Hadoop Pipes Error
Date Thu, 31 Mar 2011 09:54:16 GMT
On 31/03/11 07:53, Adarsh Sharma wrote:
> Thanks Amareshwari,
>
> here is the posting :
> The nopipe example needs more documentation. It assumes that it is run
> with the InputFormat from
> src/test/org/apache/hadoop/mapred/pipes/WordCountInputFormat.java, which
> has a very specific input split format. By running with a
> TextInputFormat, it will send binary bytes as the input split and won't
> work right. The nopipe example should probably be recoded to use libhdfs
> too, but that is more complicated to get running as a unit test. Also
> note that since the C++ example is using local file reads, it will only
> work on a cluster if you have nfs or something working across the cluster.
>
> Please correct me if I'm wrong.
>
> I need to run it with TextInputFormat.
>
> If possible, please explain the above post more clearly.


Here goes.

1.
 > The nopipe example needs more documentation. It assumes that it is run
 > with the InputFormat from
 > src/test/org/apache/hadoop/mapred/pipes/WordCountInputFormat.java, which
 > has a very specific input split format. By running with a
 > TextInputFormat, it will send binary bytes as the input split and won't
 > work right.

The nopipe example expects its input to be the splits generated by
src/test/org/apache/hadoop/mapred/pipes/WordCountInputFormat.java, not
those produced by TextInputFormat.

WordCount itself is covered here:
http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Example%3A+WordCount+v1.0

I would recommend following that tutorial, or either of the books
"Hadoop: The Definitive Guide" or "Hadoop in Action". Both authors earn
their money by explaining how to use Hadoop, which is why both books
explain it well.
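
To make the split-format point concrete, here is a minimal sketch. It is not the actual nopipe source. It assumes the C++ side treats the raw split bytes as a plain file name, and the helper name looksLikePlainPath is hypothetical. A split that carries extra binary fields fails this kind of check, which is one way to see why a different InputFormat "won't work right".

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, for illustration only.
// Returns true if the split payload consists solely of printable
// characters, i.e. it could plausibly be used directly as a file name.
// A payload containing binary (non-printable) bytes returns false.
bool looksLikePlainPath(const std::string& split) {
  if (split.empty()) {
    return false;
  }
  for (unsigned char c : split) {
    if (!std::isprint(c)) {
      return false;  // binary byte found: not a plain path
    }
  }
  return true;
}
```

A payload such as "/tmp/input/part-0" passes, while the same path surrounded by length or offset bytes does not. The exact bytes Hadoop sends for each InputFormat are not shown here.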

2.
 > The nopipe example should
 > probably be recoded to use libhdfs too, but that is more complicated
 > to get running as a unit test.

Ignore that; it's irrelevant to your problem, as Owen is discussing
automated testing.

3.

 > Also note that since the C++ example is
 > using local file reads, it will only work on a cluster if you have nfs
 > or something working across the cluster.

Unless your cluster has a shared filesystem at the OS level, it won't
work. Either have a shared filesystem like NFS, or run it on a single
machine.
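
A quick way to check this on each node is a small probe like the sketch below. It is an illustration, not part of the Pipes examples, and the function name readableLocally is hypothetical. It tries to open a path with ordinary local file I/O, which is the kind of read the C++ example relies on.

```cpp
#include <fstream>
#include <string>

// Hypothetical probe, for illustration only.
// Returns true only if `path` can be opened for reading through the
// local (OS-level) filesystem of the machine this code runs on.
// HDFS paths are not visible this way unless something like NFS
// exposes the same files on every node.
bool readableLocally(const std::string& path) {
  std::ifstream in(path.c_str());
  return in.is_open();
}
```

If this returns false on any task node for the input path you pass in, a task scheduled on that node will not be able to read its input.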

-Steve




