hadoop-user mailing list archives

From Julian Bui <julian...@gmail.com>
Subject Re: executing files on hdfs via hadoop not possible? is JNI/JNA a reasonable solution?
Date Sun, 17 Mar 2013 10:50:01 GMT
Hello Harsh,

Thanks for the reply.  I just want to verify that I understand your
suggestion correctly.

It sounds like you're saying I should write a C/C++ application and get
access to HDFS using libhdfs.  What I'm a little confused about is what
you mean by "use a streaming program".  Do you mean I should use the
Hadoop streaming interface to call some native binary that I wrote?  I
was not even aware that the streaming interface could execute native
binaries.  I thought that anything using the Hadoop streaming interface
only interacts with stdin and stdout and cannot make modifications to
HDFS.  Or did you mean that I should use Hadoop Pipes to write a C/C++
application?
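
For concreteness, here is roughly what I imagine the streaming approach
would look like -- a wrapper script shipped with the job that restores
the execute bit on the localized binary before calling it.  All of the
names here (convert_img, the paths, the streaming jar location) are made
up for illustration:

```shell
# Wrapper that streaming runs as the mapper; streaming feeds one input
# record (here, one path) per line on stdin.  "convert_img" is a
# hypothetical native binary shipped alongside the job.
cat > wrapper.sh <<'EOF'
#!/bin/sh
chmod +x ./convert_img          # HDFS stores no execute bit; restore it
while read path; do
  ./convert_img "$path"         # run the native binary on each record
done
EOF

# Ship both files with the job via -files; they are materialized on each
# task node's local disk, where execute bits are allowed.  Guarded so the
# sketch only submits when a hadoop client is actually on the PATH.
if command -v hadoop >/dev/null; then
  hadoop jar "$HADOOP_HOME"/contrib/streaming/hadoop-streaming-*.jar \
      -files wrapper.sh,convert_img \
      -input /user/julian/image-list.txt \
      -output /user/julian/converted \
      -mapper wrapper.sh \
      -numReduceTasks 0
fi
```

That is, HDFS itself never needs to mark anything executable -- only the
localized copy on the task node does.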

Anyway, I hope that you can help me clear things up in my head.
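
On the .so question from my original mail below: if JNA turns out to be
workable, I imagine the submission would look something like this,
assuming the driver uses ToolRunner so that -files and -D are parsed as
generic options.  The jar, class, and library names are all made up;
jna.library.path is the system property JNA uses to locate native
libraries:

```shell
# Sketch: ship a hypothetical native library with the job and point JNA
# at the task's working directory, where the cached files are symlinked.
# Guarded so the sketch is a no-op without a configured hadoop client.
if command -v hadoop >/dev/null; then
  hadoop jar image-convert.jar com.example.ConvertJob \
      -files libimgcodec.so \
      -D mapred.child.java.opts="-Djna.library.path=." \
      /user/julian/input /user/julian/output
fi
```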


On Sun, Mar 17, 2013 at 2:50 AM, Harsh J <harsh@cloudera.com> wrote:

> You're confusing two things here. HDFS is a data storage filesystem.
> MR does not have anything to do with HDFS (generally speaking).
> A reducer runs as a regular JVM on a provided node, and can execute
> any program you'd like it to by downloading it onto its configured
> local filesystem and executing it.
> If your goal is merely to run a regular program over data that is
> sitting in HDFS, that can be achieved. If your library is in C then
> simply use a streaming program to run it and use libhdfs' HDFS API
> (C/C++) to read data into your functions from HDFS files. Would this
> not suffice?
> On Sun, Mar 17, 2013 at 3:09 PM, Julian Bui <julianbui@gmail.com> wrote:
> > Hi hadoop users,
> >
> > I just want to verify that there is no way to put a binary on HDFS and
> > execute it using the hadoop java api.  If not, I would appreciate
> > advice on creating an implementation that uses native libraries.
> >
> > "In contrast to the POSIX model, there are no sticky, setuid or setgid
> > bits for files as there is no notion of executable files."  Is there
> > no workaround?
> >
> > A little bit more about what I'm trying to do.  I have a binary that
> > converts my image to another image format.  I currently want to put it
> > in the distributed cache and tell the reducer to execute the binary on
> > the data on hdfs.  However, since I can't set the execute permission
> > bit on that file, it seems that I cannot do that.
> >
> > Since I cannot use the binary, it seems like I have to use my own
> > implementation to do this.  The challenge is that the libraries I can
> > use to do this are .a and .so files.  Would I have to use JNI and
> > package the libraries in the distributed cache and then have the
> > reducer find and use those libraries on the task nodes?  Actually, I
> > wouldn't want to use JNI; I'd probably want to use Java Native Access
> > (JNA) to do this.  Has anyone used JNA with hadoop and been
> > successful?  Are there problems I'll encounter?
> >
> > Please let me know.
> >
> > Thanks,
> > -Julian
> --
> Harsh J
