hadoop-general mailing list archives

From Milenko Petrovic <parbash.proj...@gmail.com>
Subject Re: ANNOUNCE: ParBASH 0.1 release - Hadoop through BASH
Date Tue, 21 Jul 2009 09:11:54 GMT
That's right, Amr. ParBASH is a modified bash interpreter that 
translates certain bash constructs into Hadoop Streaming jobs. With very 
small changes, a ParBASH script can be made to run either on Hadoop or 
locally, which I find very handy while developing map-reduce jobs.
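Running the same job locally or on Hadoop works because of the Hadoop Streaming contract. The mapper and reducer are ordinary programs that read lines on stdin and write tab-separated key/value lines, and Hadoop sorts the mapper output by key before the reduce stage. The following minimal Python sketch simulates that map | sort | reduce flow locally. It is a generic illustration, not ParBASH's actual translation scheme, and the names `mapper`, `reducer` and `run_local` are hypothetical.

```python
"""Local simulation of a Hadoop Streaming-style job: map | sort | reduce.

Illustrative sketch only; this is NOT how ParBASH translates scripts.
"""
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Like a streaming mapper: emit one "key\tvalue" line per word.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    # Like a streaming reducer: input arrives sorted by key, so lines
    # with the same key are adjacent and can be summed with groupby.
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


def run_local(lines: Iterable[str]) -> List[str]:
    # sorted() stands in for the shuffle/sort phase that Hadoop
    # performs between the map and reduce stages.
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    data = ["error disk full", "warn cpu", "error net down"]
    for out in run_local(data):
        print(out)
```

On Hadoop, the equivalent mapper and reducer scripts would be handed to the streaming jar, and the framework, rather than `sorted()`, would do the sorting between stages.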

Frequently, I need to do data cleaning, extraction and transformation on 
my input files before running my map-reduce jobs. I find it convenient 
to use shell tools (grep, awk, perl, etc.) for this kind of thing. 
ParBASH tries to imitate the Unix shell way of processing files, so I 
think it is a good fit for any scripting-based map-reduce job.

-Milenko

Amr Awadallah wrote:
> >  How does this relate to [bashreduce]
>
> bashreduce has nothing to do with Hadoop; it just implements a simple 
> version of the MapReduce framework using bash. It also doesn't have an 
> equivalent of HDFS (it relies on all the nodes having local copies or 
> access to a shared fs, or passes data using nc).
>
> ParBASH is an extension to bash that allows you to write a job as if 
> it were a bash script; ParBASH takes care of translating that into a 
> Hadoop Streaming job (very clever).
>
> -- amr
>
> Robert Barta wrote:
>> On Mon, Jul 20, 2009 at 05:40:34PM +0200, Milenko Petrovic wrote:
>>  
>>> Hello,
>>>
>>> I'd like to announce the release of the 0.1 version of ParBASH. 
>>> Using ParBASH, it is possible to write bash scripts that are 
>>> automatically translated into Hadoop Streaming jobs.
>>>     
>>
>> Milenko,
>>
>> How does this relate to
>>
>>    http://md.devc.at/software/mapreduce/bashreduce
>>
>> ?
>>
>> \rho
>>   
>
