hadoop-common-user mailing list archives

From Keith Wiley <kbwi...@u.washington.edu>
Subject C++ pipes on full (nonpseudo) cluster
Date Tue, 30 Mar 2010 16:10:41 GMT
I'm confused as to how to run a C++ pipes program on a full HDFS system.  I have everything
working in pseudo-distributed mode so that's a good start...but I can't figure out the full
cluster mode.

As I see it, there are two basic approaches: upload the executable directly to HDFS or specify
it when you run pipes and have it distributed to the cluster at the time the job is run.

In the former case, which mirrors the documentation for the pseudo-distributed example, I
am totally perplexed because HDFS doesn't support executable permissions on any files.  In
other words, the word count example for the pseudo-distributed case absolutely will not carry
over to the fully distributed case since that example consists of first transferring the file
to the cluster.  When I do that and run pipes I get a permissions error on the file because
it isn't executable (and chmod refuses to enable 'x' on HDFS).
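For concreteness, the recipe I'm trying to carry over looks roughly like this (paths are placeholders; the -D properties are the ones from the pipes wordcount documentation):

% hadoop fs -put LOCALPATH/EXE HDFSPATH/bin/EXE
% hadoop fs -chmod +x HDFSPATH/bin/EXE      # this is the step that fails -- 'x' never takes effect
% hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input HDFSPATH/input -output HDFSPATH/output -program HDFSPATH/bin/EXE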

So that leaves the latter case.  I specify the executable to pipes using the -program option,
but then it never gets found.  I get file not found errors for the executable.

I've tried the following and a few variants to no avail:

% hadoop pipes -files LOCALPATH/EXE EXE -input HDFSPATH/input -output HDFSPATH/output -program

% hadoop pipes -input HDFSPATH/input -output HDFSPATH/output -program LOCALPATH/EXE
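Presumably the generic -files option (a comma-separated list, which as I understand it must precede the pipes-specific options) should ship the binary through the distributed cache, so something like the following ought to be closer, but I haven't gotten it to work either:

% hadoop pipes -files LOCALPATH/EXE -input HDFSPATH/input -output HDFSPATH/output -program EXE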

Does anyone know how to get this working?


Keith Wiley               kwiley@keithwiley.com               www.keithwiley.com

"Luminous beings are we, not this crude matter."
  -- Yoda
