hadoop-common-dev mailing list archives

From Erez Katz <erez_k...@yahoo.com>
Subject running hadoop pipes locally (debugging / unit testing / gdb / c++)
Date Thu, 12 Nov 2009 23:24:25 GMT
Greetings,

I have written a simple yet pretty handy framework for debugging Hadoop Pipes programs locally.
It is called GaDooB ... a combination of GDB and Hadoop :) .

It helps with debugging and unit testing C++ Hadoop map-reduce programs built with Hadoop
Pipes.

It is basically a sequencer that reads input text files and feeds them to a mapper, then collects
the output and feeds it to a reducer. It also handles a combiner, a partitioner,
and multiple reducers.
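
To give a rough idea of the flow, here is a minimal sketch of what such a sequencer
boils down to. This is illustrative only, not the actual GaDooB code; the Mapper and
Reducer interfaces below are simplified stand-ins for the HadoopPipes ones:

// Illustrative sketch only -- not the actual GaDooB code. It shows the
// basic map -> group-by-key -> reduce sequencing over local text files.
#include <fstream>
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Mapper {
    virtual void map(const std::string& value,
                     std::multimap<std::string, std::string>& out) = 0;
    virtual ~Mapper() {}
};

struct Reducer {
    virtual void reduce(const std::string& key,
                        const std::vector<std::string>& values,
                        std::ostream& out) = 0;
    virtual ~Reducer() {}
};

void runLocally(Mapper& mapper, Reducer& reducer,
                const std::string& inputFile,
                const std::string& outputFile)
{
    // Map phase: one map() call per input line.
    std::ifstream in(inputFile.c_str());
    std::multimap<std::string, std::string> mapped;
    std::string line;
    while (std::getline(in, line))
        mapper.map(line, mapped);

    // "Shuffle" phase: a multimap keeps entries sorted by key, so
    // equal keys are already adjacent; walk it in runs of one key.
    std::ofstream out(outputFile.c_str());
    std::multimap<std::string, std::string>::const_iterator it =
        mapped.begin();
    while (it != mapped.end()) {
        const std::string key = it->first;
        std::vector<std::string> values;
        for (; it != mapped.end() && it->first == key; ++it)
            values.push_back(it->second);
        // Reduce phase: one reduce() call per distinct key.
        reducer.reduce(key, values, out);
    }
}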

All the code is in header files. There are no libraries to link with and there is no change
to the build process (besides maybe an extra include path).

I kept the dependencies to a bare minimum by using only basic STL collections (map/vector/string)
and basic I/O, which Hadoop Pipes uses anyway.


For example, let's say this is the main function of a Pipes map-reduce program:

int main(int argc, char* argv[])
{
    return HadoopPipes::runTask(
        HadoopPipes::TemplateFactory<MyMapper, MyReducer>());
}


Then the locally-run version would look like:

int main(int argc, char* argv[])
{
    // Passing "debugMeLocally" as the first argument switches to the
    // local GaDooB run. (strcmp needs <cstring>.)
    if ((argc >= 2) && (strcmp(argv[1], "debugMeLocally") == 0))
    {
        std::map<std::string, std::string> confMap;
        SimpleConfReader().readConf("./my_jobconf.xml", confMap);

        confMap["extraParam"] = "extraValue";

        std::string inputFile  = "/tmp/mpj.txt";
        std::string outputFile = "/tmp/out1.txt";

        GaDooBSequencer::runTaskLocally<MyMapper, MyReducer>(confMap, inputFile,
                                                             outputFile);

        return 0;
    }

    return HadoopPipes::runTask(
        HadoopPipes::TemplateFactory<MyMapper, MyReducer>());
}
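
With the debugMeLocally branch in place, the program is just an ordinary local
process, so you can step through your mapper and reducer with a plain debugger,
e.g. gdb --args ./my_pipes_program debugMeLocally (the binary name here is just
an example).

Reading the job conf does not need an XML library either. Here is a minimal
sketch in the spirit of SimpleConfReader (again illustrative, not the actual
GaDooB code) that pulls <name>/<value> pairs out of a Hadoop-style job conf file:

#include <fstream>
#include <map>
#include <sstream>
#include <string>

// Return the text between <tag> and </tag>, searching from pos, and
// advance pos past the closing tag. Returns false when no more pairs.
static bool extractTag(const std::string& text, const std::string& tag,
                       std::string::size_type& pos, std::string& out)
{
    const std::string open  = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::string::size_type b = text.find(open, pos);
    if (b == std::string::npos) return false;
    b += open.size();
    std::string::size_type e = text.find(close, b);
    if (e == std::string::npos) return false;
    out = text.substr(b, e - b);
    pos = e + close.size();
    return true;
}

void readConf(const std::string& path,
              std::map<std::string, std::string>& confMap)
{
    std::ifstream in(path.c_str());
    std::stringstream buf;
    buf << in.rdbuf();                  // slurp the whole file
    const std::string text = buf.str();

    std::string::size_type pos = 0;
    std::string name, value;
    // Each <property> block in the conf holds one <name>/<value> pair.
    while (extractTag(text, "name", pos, name)) {
        if (!extractTag(text, "value", pos, value))
            break;
        confMap[name] = value;
    }
}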


I would like to share it with the rest of the Hadoop community.
I hope this list is the right place to ask where the best place would be to make it available
to the rest of the world, or maybe to make it part of Hadoop.

Regards,

 Erez Katz
