From: Harsh J <harsh@cloudera.com>
Date: Sun, 17 Mar 2013 15:20:34 +0530
Subject: Re: executing files on hdfs via hadoop not possible? is JNI/JNA a reasonable solution?
To: user@hadoop.apache.org

You're confusing two things here. HDFS is a data storage filesystem;
MapReduce (generally speaking) does not depend on HDFS at all. A reducer
runs as a regular JVM on a provided node and can run any program you'd
like by downloading it onto its configured local filesystem and executing
it there.

If your goal is merely to run a regular program over data that is sitting
in HDFS, that can be achieved. If your library is in C, simply wrap it in
a streaming program and use libhdfs' HDFS API (C/C++) to read data from
HDFS files into your functions. Would this not suffice?
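For illustration, here is a rough, untested sketch of what such a
streaming-side reader could look like. The "default"/0 arguments to
hdfsConnect pick up the namenode from the cluster configuration, and the
input path is only a made-up placeholder:

    /* read_hdfs.c - minimal libhdfs reader sketch; adapt to your cluster. */
    #include <stdio.h>
    #include <fcntl.h>   /* O_RDONLY */
    #include "hdfs.h"

    int main(int argc, char **argv) {
        /* Placeholder path; pass the real one as the first argument. */
        const char *path = (argc > 1) ? argv[1] : "/user/julian/input.img";

        /* Connect to the filesystem named in the cluster config. */
        hdfsFS fs = hdfsConnect("default", 0);
        if (!fs) { fprintf(stderr, "hdfsConnect failed\n"); return 1; }

        hdfsFile in = hdfsOpenFile(fs, path, O_RDONLY, 0, 0, 0);
        if (!in) {
            fprintf(stderr, "hdfsOpenFile(%s) failed\n", path);
            hdfsDisconnect(fs);
            return 1;
        }

        /* Stream the file in chunks; your image-conversion functions
         * would consume these bytes instead of just counting them. */
        char buf[65536];
        long total = 0;
        tSize n;
        while ((n = hdfsRead(fs, in, buf, sizeof(buf))) > 0)
            total += n;
        fprintf(stderr, "read %ld bytes from %s\n", total, path);

        hdfsCloseFile(fs, in);
        hdfsDisconnect(fs);
        return 0;
    }

You'd then ship the compiled reader with the streaming job, roughly like
this (the streaming jar location varies by Hadoop version and distro):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -files read_hdfs \
        -input /user/julian/paths.txt -output /user/julian/out \
        -mapper ./read_hdfs -reducer NONE

The -files option localizes the program into each task's working
directory, where it can be executed, so the lack of an execute bit on
HDFS files does not get in your way.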
> > "In contrast to the POSIX model, there are no sticky, setuid or setgid bits > for files as there is no notion of executable files." Is there no > workaround? > > A little bit more about what I'm trying to do. I have a binary that > converts my image to another image format. I currently want to put it in > the distributed cache and tell the reducer to execute the binary on the data > on hdfs. However, since I can't set the execute permission bit on that > file, it seems that I cannot do that. > > Since I cannot use the binary, it seems like I have to use my own > implementation to do this. The challenge is that these libraries that I can > use to do this are .a and .so files. Would I have to use JNI and package > the libraries in the distributed cache and then have the reducer find and > use those libraries on the task nodes? Actually, I wouldn't want to use > JNI, I'd probably want to use java native access (JNA) to do this. Has > anyone used JNA with hadoop and been successful? Are there problems I'll > encounter? > > Please let me know. > > Thanks, > -Julian -- Harsh J