hadoop-common-user mailing list archives

From anuj maurice <anuj.maur...@gmail.com>
Subject Re: executing hadoop commands from python?
Date Sun, 17 Feb 2013 13:42:06 GMT
I was stuck with a similar issue before and couldn't come up with a more
viable alternative than this: if the output of the hadoop command is not
too big, you can read it into your Python script and process it there.

I use the following snippet to clean up the output of ls and store it
in a Python list for processing. In your case you can do a len() on the
list to get the file count.

import os

fscommand = "hadoop dfs -ls /path/in/%s/*/ 2> /dev/null" % ("hdfs")
hadoop_cmd = os.popen(fscommand).read()
lines = hadoop_cmd.split("\n")[1:]
strlines = [map(lambda a: a.strip(), line.split(' ')[-3:]) for line in lines]
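If you prefer subprocess to os.popen, something along these lines should work for the file count. The sample listing and the /hdfs/query/path below are made up for illustration, and the try/except guard only exists so the sketch runs even on a machine without hadoop on the PATH:

```python
import subprocess

def count_hdfs_files(listing):
    """Count file entries in `hadoop dfs -ls` output.

    The first line is the 'Found N items' summary, so skip it;
    blank lines are ignored.
    """
    lines = listing.split("\n")[1:]
    return len([line for line in lines if line.strip()])

try:
    # Capture the listing instead of letting os.system print it.
    listing = subprocess.check_output(
        ["hadoop", "dfs", "-ls", "/hdfs/query/path"],
        stderr=subprocess.DEVNULL).decode()
except (OSError, subprocess.CalledProcessError):
    # Fabricated sample output, just to show the parsing:
    listing = ("Found 2 items\n"
               "-rw-r--r--   3 user grp  1024 2013-02-17 13:42 /hdfs/query/path/a.txt\n"
               "-rw-r--r--   3 user grp  2048 2013-02-17 13:42 /hdfs/query/path/b.txt\n")

print(count_hdfs_files(listing))
```

Whether the target is a shell builtin or an external binary like hadoop makes no difference to subprocess; anything on the PATH can be launched this way.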

On Sun, Feb 17, 2013 at 4:17 AM, jamal sasha <jamalshasha@gmail.com> wrote:

> Hi,
>   This might be more of a Python-centric question, but I was wondering if
> anyone has tried it out...
> I am trying to run a few hadoop commands from a Python program...
> For example, from the command line you can do:
>       bin/hadoop dfs -ls /hdfs/query/path
> and it returns all the files in the hdfs query path,
> very similar to unix.
> Now I am trying to do basically this from Python, and do some
> manipulation with the result:
>      exec_str = "path/to/hadoop/bin/hadoop dfs -ls " + query_path
>      os.system(exec_str)
> Now I am trying to grab this output to do some manipulation on it.
> For example.. count the number of files?
> I looked into the subprocess module, but these are not native shell
> commands, so I am not sure whether I can apply those concepts.
> How to solve this?
> Thanks

regards ,
Anuj Maurice
