hadoop-general mailing list archives

From Konstantin Boudnik <...@apache.org>
Subject Re: Hadoop testing project
Date Thu, 17 Feb 2011 19:21:02 GMT

I am sure that packaging of Hadoop and other applications working
directly with Hadoop is a highly needed thing (although there's always
the tricky question of how many platforms you plan to provide packaging
for, etc.). What we are discussing here is software testing, not
packaging or integration issues between packaged bits.

If you want to, please start a separate discussion to avoid steering
this thread away and mixing the issues.


> I think the bigger concern is that the Hadoop ecosystem does not have a
> standard method for linking dependencies.  HBase depends on ZooKeeper, and
> Pig depends on Hadoop and HBase.  Then Pig decided to put the hadoop-core
> jar in its own jar file.  Chukwa depends on Pig + HBase + Hadoop and
> ZooKeeper.  The version incompatibility is probably what is driving people
> nuts.  Hence, there is a new proposal on how to integrate across the Hadoop
> ecosystem.  I urge project owners to review the proposal and provide feedback.
> The proposal is located at:
> https://issues.apache.org/jira/secure/attachment/12470823/deployment.pdf
> The related jiras are:
> https://issues.apache.org/jira/browse/HADOOP-6255
> https://issues.apache.org/jira/browse/PIG-1857
> There are plans to file more jiras for related projects.  The integration
> would also be a lot easier if all related projects were using Maven for
> dependency management.
> Regards,
> Eric
> On 2/17/11 9:33 AM, "Konstantin Boudnik" <cos@apache.org> wrote:
>> On Thu, Feb 17, 2011 at 05:45, Ian Holsman <hadoop@holsman.net> wrote:
>>> I'm not sure it makes sense to put all the testing packages under a
>>> different umbrella than the one that covers the code they test.
>>> While there might be commonalities in building a test harness, I would
>>> think that each testing tool would need to have deep knowledge of the
>>> internals of the tool that it is testing. As such, it would need someone
>>> with the experience to code it.
>> That's pretty much true indeed if you are talking about tests for a
>> project or tightly coupled projects such as Herriot in Hadoop.
>> Speaking of tools, there are some benefits though. Say, PigUnit and
>> MRUnit are both xUnit frameworks. The former allows you to run Pig
>> jobs in local and cluster mode. The latter validates MR jobs
>> without the need to fire up a cluster.
>>> I don't see what the advantage of combining PigUnit and, say, MRUnit
>>> would be, for example.
>> Don't you think Pig users would benefit if Pig scripts could be tested
>> against MRUnit, which gives you a flavor of a cluster environment without
>> one? Now, do you think it is likely that someone will go to great lengths
>> to make such an effort and build such a bridge right now?
>> Cos
>>> --I
>>> On Feb 16, 2011, at 2:50 PM, Konstantin Boudnik wrote:
>>>> Steve.
>>>> If the project under discussion provides a common harness where such a
>>>> test artifact (think of a Maven artifact, for example) will click in and
>>>> be executed automatically with all needed tools and dependencies resolved
>>>> for you, would it be appealing for the end-users' cause?
>>>> As Joep said, this "...will reduce the effort to take any (set of) changes
>>>> from development into production." Take it one step further: when your
>>>> cluster is 'assembled' you need to validate it (on top of a concrete OS,
>>>> etc.). Is it desirable to follow an N-step process to bring about whatever
>>>> testing workload you need, or would you prefer to simply do something like:
>>>>    wget http://workloads.internal.mydomain.com/stackValidations/v12.4.pom
>>>>        && mvn verify
>>>> and check the results later on?
>>>> These are going to be the same tools that devs use for their tasks,
>>>> although the worksets will be different. So what?
>>>> Cos
>>>> On Wed, Feb 16, 2011 at 11:37AM, Steve Loughran wrote:
>>>>> On 15/02/11 21:58, Konstantin Boudnik wrote:
>>>>>> While the MRUnit discussion draws to its natural conclusion, I would
>>>>>> like to bring up another point which might be well aligned with that
>>>>>> discussion. Patrick Hunt brought up this idea earlier today and I
>>>>>> believe it has to be elaborated further.
>>>>>> A number of testing projects, both for Hadoop and for Hadoop-related
>>>>>> components, were brought to life over the last year or two. Among those
>>>>>> are MRUnit, PigUnit, YCSB, Herriot, and perhaps a few more. They all
>>>>>> focus on more or less the same problem, e.g. validation of Hadoop or
>>>>>> on-top-of-Hadoop components, or application-level testing for Hadoop.
>>>>>> However, the fact that they are all spread across a wide variety of
>>>>>> projects seems to confuse/mislead Hadoop users.
>>>>>> How about incubating a bigger Hadoop (Pig, Oozie, HBase) testing
>>>>>> project which will take care of the development and support of common
>>>>>> (where possible) tools, frameworks and the like? Please feel free
>>>>>> to share your thoughts :)
>>>>>> --
>>>>> I think it would be good, though specific projects will have their
>>>>> own testing needs. I'd expect the focus for testing redistributables
>>>>> to be more on helping Hadoop users test their stuff against subsets of
>>>>> data, rather than the hadoop-*-dev problem of "stressing the hadoop
>>>>> stack once your latest patch is applied".
>>>>> That said, the whole problem of qualifying an OS, Java release and
>>>>> cluster is something we'd expect most end-user teams to have to do.
>>>>> Right now terasort is the main stress test.
