From: Harsh J
Date: Sun, 17 Feb 2013 19:39:00 +0530
Subject: Re: executing hadoop commands from python?

Instead of 'scraping' this way, consider using a library such as Pydoop
(http://pydoop.sourceforge.net), which provides Pythonic APIs for
interacting with Hadoop components. Other libraries are covered at
http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/
for example.

On Sun, Feb 17, 2013 at 4:17 AM, jamal sasha wrote:
> Hi,
>
> This might be more of a Python-centric question, but I was wondering if
> anyone has tried it out...
>
> I am trying to run a few hadoop commands from a Python program...
>
> For example, if from the command line you do:
>
> bin/hadoop dfs -ls /hdfs/query/path
>
> it returns all the files in the hdfs query path..
> So very similar to unix
>
>
> Now I am trying to basically do this from Python.. and do some
> manipulation on the result.
>
> exec_str = "path/to/hadoop/bin/hadoop dfs -ls " + query_path
> os.system(exec_str)
>
> Now, I am trying to grab this output to do some manipulation on it.
> For example.. count the number of files?
> I looked into the subprocess module but then... these are not native
> shell commands, hence I am not sure whether I can apply those concepts.
> How to solve this?
>
> Thanks
>

--
Harsh J
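
If scraping the CLI is still needed, the output of the command in the quoted question can be captured with the subprocess module, which runs any executable and is not limited to shell built-ins. What follows is only a minimal sketch, not something taken from the thread. It assumes the usual `-ls` listing layout (an optional "Found N items" line, then one line per entry with the path in the last column). The helper names `parse_ls_output` and `list_hdfs_paths` are hypothetical, and the `hadoop_bin` default is the placeholder path from the question.

```python
import subprocess


def parse_ls_output(output):
    """Return the paths listed in the text printed by `hadoop dfs -ls`."""
    paths = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "Found N items" summary line.
        if not line or line.startswith("Found "):
            continue
        # Assumed layout: permissions, replication, owner, group, size,
        # date, time, path. Splitting at most 7 times keeps a path that
        # contains spaces in one piece.
        fields = line.split(None, 7)
        if len(fields) == 8:
            paths.append(fields[7])
    return paths


def list_hdfs_paths(query_path, hadoop_bin="path/to/hadoop/bin/hadoop"):
    """Run `hadoop dfs -ls` on query_path and return the listed paths."""
    # Passing an argument list avoids shell quoting problems; check=True
    # raises CalledProcessError if the command exits with non-zero status.
    result = subprocess.run(
        [hadoop_bin, "dfs", "-ls", query_path],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_ls_output(result.stdout)
```

With this in place, `len(list_hdfs_paths("/hdfs/query/path"))` would give the number of entries in the listing. Because the parsing depends on the column layout of the CLI output, it is more fragile than going through a library API, which is the point of the advice above.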