hadoop-common-user mailing list archives

From Chris K Wensel <ch...@wensel.net>
Subject Re: Developing, Testing, Distributing
Date Thu, 07 Apr 2011 15:33:30 GMT
> How do you test your code, which unit test libraries are you using, and how do you run your automated tests after you have finished development?
> Do you have test/QA/staging environments besides dev and production? How do you keep them similar to production?
> Code reuse: how do you build components that can be used in other jobs? Do you build generic map or reduce classes?

In all honesty, you should take a look at Cascading. It was designed to simplify exactly this, but keep in mind I'm the project lead, so I'm biased.

In Cascading, there are three distinct elements that can be tested independently.

- operations, such as functions and filters, that can typically be reused in any Cascading application
- assemblies of operations that constitute a unit of work or some algorithmic process (these become one or more MapReduce jobs at runtime)
- taps, the things that talk to HDFS or to external systems like HBase, CouchBase, MySQL, ElasticSearch, etc.
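To make the three roles concrete, here is a minimal plain-Java sketch. It is an illustration under stated assumptions, not Cascading's API: in real Cascading code these roles are filled by `Function`/`Filter` operations, `Pipe` assemblies, and `Tap` classes such as `Hfs`, whereas every name below (`ThreeElements`, `SourceTap`, `SinkTap`, `SPLIT_WORDS`, `NOT_EMPTY`, `countWords`) is hypothetical. A word count stands in for the processing logic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.Predicate;

// Plain-Java sketch of the three roles; hypothetical names, NOT the Cascading API.
class ThreeElements {

    // Hypothetical source tap: where records come from.
    interface SourceTap { List<String> read(); }

    // Hypothetical sink tap: where results go.
    interface SinkTap { void write(Map<String, Integer> results); }

    // Operation (function): one input line -> tokens (may include empty strings).
    static final Function<String, List<String>> SPLIT_WORDS =
        line -> Arrays.asList(line.toLowerCase().split("\\W+"));

    // Operation (filter): drop empty tokens.
    static final Predicate<String> NOT_EMPTY = word -> !word.isEmpty();

    // Assembly: chains the operations, then groups by word. When planned onto
    // MapReduce, a grouping step like this is roughly where a reduce phase appears.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : SPLIT_WORDS.apply(line)) {
                if (NOT_EMPTY.test(word)) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Wire source -> assembly -> sink; the assembly never sees tap internals.
    static void run(SourceTap source, SinkTap sink) {
        sink.write(countWords(source.read()));
    }

    public static void main(String[] args) {
        SourceTap source = () -> List.of("the quick fox", "The lazy dog");
        run(source, results -> System.out.println(results));
    }
}
```

Because the assembly only reaches the data through `read` and `write`, each piece can be exercised on its own.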

Each of these can be unit tested individually or as a whole, and you can package them into libraries or frameworks usable by other developers on your teams.
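A rough illustration of testing operations individually, testing the assembly as a whole, and sharing the operations as a small library. This is a plain-Java sketch with hypothetical names (`SharedOps`, `NORMALIZE`, `LOOKS_LIKE_EMAIL`, `cleanEmails`, `check`); a real project would more likely write these checks with a unit test framework such as JUnit and build on Cascading's own operation classes.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical shared "operations library"; plain Java, not Cascading's API.
class SharedOps {
    // Reusable function: normalize an email address.
    static final UnaryOperator<String> NORMALIZE = s -> s.trim().toLowerCase(Locale.ROOT);

    // Reusable filter: keep values that look like email addresses (deliberately naive).
    static final Predicate<String> LOOKS_LIKE_EMAIL = s -> s.indexOf('@') > 0;

    // An assembly built from the shared operations; other jobs can reuse them too.
    static List<String> cleanEmails(List<String> raw) {
        return raw.stream()
                  .map(NORMALIZE)
                  .filter(LOOKS_LIKE_EMAIL)
                  .distinct()
                  .collect(Collectors.toList());
    }

    // Minimal assertion helper so the sketch runs without a test framework.
    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        // Each operation tested on its own...
        check(NORMALIZE.apply("  Bob@Example.COM ").equals("bob@example.com"), "normalize");
        check(!LOOKS_LIKE_EMAIL.test("@nobody"), "filter rejects a leading @");
        // ...and the assembly tested as a whole.
        check(cleanEmails(List.of("A@x.org", " a@x.org", "junk")).equals(List.of("a@x.org")), "assembly");
        System.out.println("all checks passed");
    }
}
```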

The real value is that you no longer need to think in MapReduce when developing; you think only in terms of the problem domain.

And you can test your processing app independently of getting it working in staging or production, just by swapping out taps.
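The tap-swapping idea can be sketched as follows, again in plain Java with hypothetical names (`SwapTaps`, `Tap`, `MemoryTap`, `FileTap`, `upperCaseJob`) rather than Cascading's real `Tap` classes. A local temp file stands in for HDFS or an external store; the job body stays the same whichever taps are passed in.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical tap abstraction in plain Java; Cascading's real Tap classes differ.
class SwapTaps {
    interface Tap {
        List<String> read() throws IOException;
        void write(List<String> records) throws IOException;
    }

    // In-memory tap, suitable for fast unit tests.
    static final class MemoryTap implements Tap {
        private final List<String> records = new ArrayList<>();
        MemoryTap(List<String> initial) { records.addAll(initial); }
        public List<String> read() { return new ArrayList<>(records); }
        public void write(List<String> out) { records.clear(); records.addAll(out); }
    }

    // File-backed tap: a local stand-in for HDFS or an external store.
    static final class FileTap implements Tap {
        private final Path path;
        FileTap(Path path) { this.path = path; }
        public List<String> read() throws IOException { return Files.readAllLines(path); }
        public void write(List<String> out) throws IOException { Files.write(path, out); }
    }

    // The processing logic; it is identical whichever taps are plugged in.
    static void upperCaseJob(Tap source, Tap sink) throws IOException {
        List<String> out = new ArrayList<>();
        for (String line : source.read()) {
            out.add(line.toUpperCase(Locale.ROOT));
        }
        sink.write(out);
    }

    public static void main(String[] args) throws IOException {
        // Test run: in-memory source and sink.
        MemoryTap sink = new MemoryTap(List.of());
        upperCaseJob(new MemoryTap(List.of("a", "b")), sink);
        System.out.println(sink.read());

        // Staging/production-style run: same job, file taps swapped in.
        Path in = Files.createTempFile("in", ".txt");
        Path out = Files.createTempFile("out", ".txt");
        in.toFile().deleteOnExit();
        out.toFile().deleteOnExit();
        Files.write(in, List.of("hello"));
        upperCaseJob(new FileTap(in), new FileTap(out));
        System.out.println(Files.readAllLines(out));
    }
}
```

Running `main` prints `[A, B]` for the in-memory run and `[HELLO]` for the file-backed one; only the taps changed between the two.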


btw, I use IntelliJ for all my development. 


Chris K Wensel

-- Concurrent, Inc. offers mentoring, support for Cascading
