hadoop-common-dev mailing list archives

From Konstantin Boudnik <...@yahoo-inc.com>
Subject Re: running hadoop pipes locally (debugging / unit testing / gdb / c++)
Date Fri, 13 Nov 2009 21:37:58 GMT
I'd suggest you start from http://wiki.apache.org/hadoop/HowToContribute

--
Take care,
   Cos

On 11/12/09 15:24, Erez Katz wrote:
> Greetings,
>
> I have written a simple yet pretty handy framework for debugging Hadoop Pipes programs locally.
> It is called GaDooB... a combination of GDB and Hadoop :)
>
> It helps with debugging and unit testing C++ Hadoop map-reduce programs that were built using Hadoop Pipes.
>
> It is basically a sequencer that reads input text files, feeds them to a mapper, collects the output and feeds it to a reducer. It also handles the use of a combiner, a partitioner and multiple reducers.
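The sequencer described above (map, then optional combine, partition, sort and reduce) can be sketched in a few dozen lines of standard C++. This is a minimal illustration under stated assumptions, not GaDooB's code: `MapFn`, `ReduceFn`, `Emit` and `runLocally` are hypothetical simplified interfaces rather than the HadoopPipes `Mapper`/`Reducer`/context classes. It treats all input lines as one map task, relies on `std::map` for Hadoop-style sorted grouping by key, and partitions with `std::hash` modulo the reducer count, in the spirit of Hadoop's default hash partitioner.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified interfaces, NOT the HadoopPipes Mapper/Reducer/context classes.
using KV = std::pair<std::string, std::string>;
using Emit = std::function<void(const std::string& key, const std::string& value)>;
using MapFn = std::function<void(const std::string& line, const Emit& emit)>;
using ReduceFn = std::function<void(const std::string& key,
                                    const std::vector<std::string>& values,
                                    const Emit& emit)>;
using Groups = std::map<std::string, std::vector<std::string>>;

// Shuffle/sort: std::map keeps keys ordered, as Hadoop presents them to reduce().
Groups groupByKey(const std::vector<KV>& pairs) {
  Groups groups;
  for (const auto& kv : pairs) groups[kv.first].push_back(kv.second);
  return groups;
}

// Calls reduceFn once per key (in sorted order) and collects what it emits.
std::vector<KV> reduceGroups(const Groups& groups, const ReduceFn& reduceFn) {
  std::vector<KV> out;
  const Emit collect = [&out](const std::string& k, const std::string& v) {
    out.emplace_back(k, v);
  };
  for (const auto& g : groups) reduceFn(g.first, g.second, collect);
  return out;
}

// Runs map -> (optional) combine -> partition -> sort -> reduce in one process,
// treating all input lines as a single map task. Returns one output per reducer.
std::vector<std::vector<KV>> runLocally(const std::vector<std::string>& lines,
                                        const MapFn& mapper, const ReduceFn& reducer,
                                        const ReduceFn* combiner, std::size_t numReducers) {
  assert(numReducers > 0);
  std::vector<KV> mapped;
  const Emit collect = [&mapped](const std::string& k, const std::string& v) {
    mapped.emplace_back(k, v);
  };
  for (const auto& line : lines) mapper(line, collect);

  // The combiner pre-aggregates the map output before it is partitioned.
  if (combiner != nullptr) mapped = reduceGroups(groupByKey(mapped), *combiner);

  // Hash partitioning: every occurrence of a key goes to the same reducer.
  std::vector<std::vector<KV>> partitions(numReducers);
  std::hash<std::string> hasher;
  for (auto& kv : mapped) {
    const std::size_t p = hasher(kv.first) % numReducers;
    partitions[p].push_back(std::move(kv));
  }

  // Each reducer sees only its own partition, grouped and sorted by key.
  std::vector<std::vector<KV>> outputs;
  for (const auto& part : partitions) outputs.push_back(reduceGroups(groupByKey(part), reducer));
  return outputs;
}
```

With a word-count mapper and a summing reducer that doubles as the combiner, `runLocally({"a b a", "b c a"}, wordMap, sumReduce, &sumReduce, 2)` yields a=3, b=2 and c=1 spread across the two reducer outputs; which key lands in which output depends on the standard library's `std::hash`.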
>
> All the code is in header files. There are no libraries to link with and no change to the build process (besides maybe an extra include path).
>
> I kept the dependencies to a bare minimum by using only basic STL collections (map/vector/string) and basic I/O, which are used in Hadoop Pipes anyway.
>
>
> For example, let's say this is the main function of a Pipes map-reduce program:
>
> int main(int argc, char* argv[])
> {
>     return HadoopPipes::runTask(
>         HadoopPipes::TemplateFactory<MyMapper, MyReducer>());
> }
>
>
> Then the locally-run version would look like:
>
> int main(int argc, char* argv[])
> {
>     if ((argc >= 2) && (strcmp(argv[1], "debugMeLocally") == 0))
>     {
>         std::map<std::string, std::string> confMap;
>         SimpleConfReader().readConf("./my_jobconf.xml", confMap);
>
>         confMap["extraParam"] = "extraValue";
>
>         std::string inputFile  = "/tmp/mpj.txt";
>         std::string outputFile = "/tmp/out1.txt";
>
>         GaDooBSequencer::runTaskLocally<MyMapper, MyReducer>(confMap, inputFile, outputFile);
>
>         return 0;
>     }
>
>     return HadoopPipes::runTask(
>         HadoopPipes::TemplateFactory<MyMapper, MyReducer>());
> }
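The `SimpleConfReader().readConf(...)` call above fills a `std::map` from a job configuration file, but its implementation is not shown in the post. The following is a hypothetical, deliberately naive stand-in (`innerText`, `parseJobConf` and `readJobConfFile` are illustrative names, not part of GaDooB or Hadoop). It assumes Hadoop's usual `<configuration><property><name>...</name><value>...</value></property></configuration>` layout and ignores comments, CDATA, XML entities and attributes.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <map>
#include <sstream>
#include <string>

// Returns the text between <tag> and </tag> inside xml[from, to), or "" if absent.
// Naive on purpose: no attributes, comments, CDATA or entity decoding.
std::string innerText(const std::string& xml, const std::string& tag,
                      std::size_t from, std::size_t to) {
  assert(!tag.empty());
  const std::string open = "<" + tag + ">";
  const std::string close = "</" + tag + ">";
  std::size_t begin = xml.find(open, from);
  if (begin == std::string::npos || begin >= to) return "";
  begin += open.size();
  const std::size_t end = xml.find(close, begin);
  if (end == std::string::npos || end + close.size() > to) return "";
  return xml.substr(begin, end - begin);
}

// Hypothetical stand-in for readConf: copies every <property> name/value pair into conf.
void parseJobConf(const std::string& xml, std::map<std::string, std::string>& conf) {
  const std::string open = "<property>";
  const std::string close = "</property>";
  std::size_t pos = 0;
  while ((pos = xml.find(open, pos)) != std::string::npos) {
    const std::size_t end = xml.find(close, pos);
    if (end == std::string::npos) break;  // unterminated <property>: stop parsing
    const std::string name = innerText(xml, "name", pos, end);
    if (!name.empty()) conf[name] = innerText(xml, "value", pos, end);
    pos = end + close.size();
  }
}

// Reads the whole file into memory and parses it; returns false if it cannot be opened.
bool readJobConfFile(const std::string& path, std::map<std::string, std::string>& conf) {
  std::ifstream in(path);
  if (!in) return false;
  std::ostringstream buffer;
  buffer << in.rdbuf();
  parseJobConf(buffer.str(), conf);
  return true;
}
```

Calling `readJobConfFile("./my_jobconf.xml", confMap)` before setting `confMap["extraParam"]` would mirror the flow of the example above; anything beyond simple, hand-written job configurations would call for a real XML parser.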
>
>
> I would like to share it with the rest of the Hadoop community.
> I hope this list is the right place to ask where best to make it available to the rest of the world, or whether it could become part of Hadoop.
>
> Regards,
>
>   Erez Katz
