From: Harsh J <harsh@cloudera.com>
Date: Sun, 17 Mar 2013 15:20:34 +0530
Subject: Re: executing files on hdfs via hadoop not possible? is JNI/JNA a reasonable solution?
To: user@hadoop.apache.org

You're confusing two things here. HDFS is a data storage filesystem;
MapReduce (generally speaking) does not depend on HDFS at all. A reducer
runs as a regular JVM on a provided node and can run any program you'd
like by downloading it onto its configured local filesystem and executing
it there.

If your goal is merely to run a regular program over data that is sitting
in HDFS, that can be achieved. If your library is in C, simply wrap it in
a streaming program and use libhdfs' HDFS API (C/C++) to read data from
HDFS files into your functions. Would this not suffice?
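For illustration, here is a rough, untested sketch of what such a
streaming-side reader could look like. The "default"/0 arguments to
hdfsConnect pick up the namenode from the cluster configuration, and the
input path is only a made-up placeholder:

    /* read_hdfs.c - minimal libhdfs reader sketch; adapt to your cluster. */
    #include <stdio.h>
    #include <fcntl.h>   /* O_RDONLY */
    #include "hdfs.h"

    int main(int argc, char **argv) {
        /* Placeholder path; pass the real one as the first argument. */
        const char *path = (argc > 1) ? argv[1] : "/user/julian/input.img";

        /* Connect to the filesystem named in the cluster config. */
        hdfsFS fs = hdfsConnect("default", 0);
        if (!fs) { fprintf(stderr, "hdfsConnect failed\n"); return 1; }

        hdfsFile in = hdfsOpenFile(fs, path, O_RDONLY, 0, 0, 0);
        if (!in) {
            fprintf(stderr, "hdfsOpenFile(%s) failed\n", path);
            hdfsDisconnect(fs);
            return 1;
        }

        /* Stream the file in chunks; your image-conversion functions
         * would consume these bytes instead of just counting them. */
        char buf[65536];
        long total = 0;
        tSize n;
        while ((n = hdfsRead(fs, in, buf, sizeof(buf))) > 0)
            total += n;
        fprintf(stderr, "read %ld bytes from %s\n", total, path);

        hdfsCloseFile(fs, in);
        hdfsDisconnect(fs);
        return 0;
    }

You'd then ship the compiled reader with the streaming job, roughly like
this (the streaming jar location varies by Hadoop version and distro):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -files read_hdfs \
        -input /user/julian/paths.txt -output /user/julian/out \
        -mapper ./read_hdfs -reducer NONE

The -files option localizes the program into each task's working
directory, where it can be executed, so the lack of an execute bit on
HDFS files does not get in your way.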
> > "In contrast to the POSIX model, there are no sticky, setuid or setgid bits > for files as there is no notion of executable files." Is there no > workaround? > > A little bit more about what I'm trying to do. I have a binary that > converts my image to another image format. I currently want to put it in > the distributed cache and tell the reducer to execute the binary on the data > on hdfs. However, since I can't set the execute permission bit on that > file, it seems that I cannot do that. > > Since I cannot use the binary, it seems like I have to use my own > implementation to do this. The challenge is that these libraries that I can > use to do this are .a and .so files. Would I have to use JNI and package > the libraries in the distributed cache and then have the reducer find and > use those libraries on the task nodes? Actually, I wouldn't want to use > JNI, I'd probably want to use java native access (JNA) to do this. Has > anyone used JNA with hadoop and been successful? Are there problems I'll > encounter? > > Please let me know. > > Thanks, > -Julian -- Harsh J