crunch-user mailing list archives

From Micah Whitacre <mkwhita...@gmail.com>
Subject Re: Viewing intermediate states for debugging
Date Mon, 28 Jan 2013 23:13:01 GMT
Related to the earlier part of this thread, where I asked how to save off
the intermediate state: in general, how do you debug the project,
specifically the IT tests?  Do you typically run them through Eclipse
with special profiles?

I'm still trying to track down an odd failure in crunch-hbase when
swapping out the dependencies to use CDH4.1.x.  The test failure seems
to indicate the test is joining the same PCollection on itself.

Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 63.13 sec <<< FAILURE!
testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 62.789 sec  <<< FAILURE!
java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]> but was:<[bird,bird, zebra,zebra, horse,horse, donkey,donkey]>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:147)
	at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:257)
	at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)

and sometimes:

Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 71.958 sec <<< FAILURE!
testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 71.469 sec  <<< FAILURE!
java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]> but was:<[dog,dog, cat,cat]>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:147)
	at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:259)
	at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
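
Both "but was" values above pair each word with itself, which is what
you would see if one input were joined against itself. A minimal sketch
in plain Java (not Crunch code) illustrates the pattern. The row keys
and table contents are hypothetical, chosen only so that the results
have the same elements as the assertion messages; the real test data is
not shown in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SelfJoinSketch {

    // Inner join on row key; emits "leftValue,rightValue" for each matching key.
    static Set<String> join(Map<String, String> left, Map<String, String> right) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, String> row : left.entrySet()) {
            String match = right.get(row.getKey());
            if (match != null) {
                out.add(row.getValue() + "," + match);
            }
        }
        return out;
    }

    // Hypothetical contents, invented for this sketch.
    static Map<String, String> leftTable() {
        Map<String, String> t = new LinkedHashMap<>();
        t.put("row1", "cat");
        t.put("row2", "cat");
        t.put("row3", "dog");
        return t;
    }

    // Hypothetical contents, invented for this sketch.
    static Map<String, String> rightTable() {
        Map<String, String> t = new LinkedHashMap<>();
        t.put("row1", "zebra");
        t.put("row2", "donkey");
        t.put("row3", "bird");
        t.put("row4", "horse");
        return t;
    }

    public static void main(String[] args) {
        System.out.println(join(leftTable(), rightTable()));   // intended join
        System.out.println(join(rightTable(), rightTable()));  // right side joined with itself
        System.out.println(join(leftTable(), leftTable()));    // left side joined with itself
    }
}
```

The first result has the same elements as the expected value, and the
other two have the same elements as the two observed values. That makes
a self-join a plausible reading of the failure, but it says nothing
about where the inputs get mixed up.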

Most likely for the same reason Crunch requires a special build of
HBase 0.94.1, I've found I need to mix and match CDH4 versions, as
shown by the attached patch.  For the Crunch core build I need to use
all of the latest 2.0.0 code, but for testing crunch-hbase I need to
use the mrv1 fork for hadoop-core and hadoop-minicluster.  I wouldn't
think that either of those would affect the tests unless the files
used for the intermediate states were somehow not being stored
correctly in their temporary locations.  The fact that the test fails
differently from run to run does make me wonder about a concurrency
issue, but I'm not sure where.
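
The quoted reply further down suggests that temporary-path handling may
need to be disabled in order to inspect intermediate outputs. As a
rough sketch of that general idea only, here is a plain-Java helper
that skips cleanup of a temporary directory when a flag is set. It does
not use or reproduce Crunch's TemporaryPath; the class name and the
system property sketch.keepTempDirs are assumptions made up for
illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class KeepTempDirSketch {

    // Hypothetical property name, invented for this sketch; not a Crunch or Maven setting.
    static final String KEEP_PROPERTY = "sketch.keepTempDirs";

    static Path createWorkDir(String prefix) throws IOException {
        return Files.createTempDirectory(prefix);
    }

    // Deletes the directory tree unless keep is true, in which case it only reports the path.
    static void cleanup(Path dir, boolean keep) throws IOException {
        if (keep) {
            System.err.println("Keeping intermediate output in " + dir);
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
        }
    }

    public static void main(String[] args) throws IOException {
        boolean keep = Boolean.getBoolean(KEEP_PROPERTY);
        Path dir = createWorkDir("crunch-debug-");
        // Stand-in for a job's intermediate output directory.
        Files.createDirectories(dir.resolve("p1"));
        Files.write(dir.resolve("p1").resolve("part-00000"),
                "cat,zebra\n".getBytes(StandardCharsets.UTF_8));
        cleanup(dir, keep);
        System.out.println(dir + " exists after cleanup: " + Files.exists(dir));
    }
}
```

Running it with -Dsketch.keepTempDirs=true leaves the directory in
place so its contents can be examined after the run; without the flag
the tree is removed as usual.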

Any pointers on debugging would be helpful.
Micah

On Thu, Jan 24, 2013 at 2:24 PM, Micah Whitacre <mkwhitacre@gmail.com> wrote:
> I am creating an entirely new profile simply to keep my changes
> separate from what is in apache/master.
>
> Thanks for the hint about the "naive" approach.  Previously I had the following:
>
>             <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>             <hadoop.client.version>2.0.0-mr1-cdh4.1.1</hadoop.client.version>
>             <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>
> If I follow what you did and change it to:
>
>             <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>             <hadoop.client.version>2.0.0-cdh4.1.1</hadoop.client.version>
>             <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>
> The build gets farther.  I now have a different failure in
> crunch-hbase I'll start working on.
>
> Thanks for your help.
> Micah
>
>
> On Thu, Jan 24, 2013 at 12:23 PM, Josh Wills <jwills@cloudera.com> wrote:
>> Micah,
>>
>> I did the naive thing and just swapped in 2.0.0-cdh4.1.2 for 2.0.0-alpha in
>> the crunch.platform=2 profile in the top level POM and then added in the
>> Cloudera repositories. That works for me-- does it work for you? It sounds
>> to me like you're creating an entirely new profile.
>>
>> J
>>
>>
>> On Thu, Jan 24, 2013 at 7:58 AM, Micah Whitacre <mkwhitacre@gmail.com>
>> wrote:
>>>
>>> Running dependency:tree on both projects shows that Avro is at
>>> version 1.7.0 under both profiles.  I wish it was that
>>> easy.  :)
>>>
>>> On Thu, Jan 24, 2013 at 9:53 AM, Josh Wills <jwills@cloudera.com> wrote:
>>> >
>>> >
>>> >
>>> > On Thu, Jan 24, 2013 at 6:40 AM, Micah Whitacre <mkwhitacre@gmail.com>
>>> > wrote:
>>> >>
>>> >> Taking a step back and comparing what is being generated for a normal
>>> >> successful test run of "-Dcrunch.platform=2" I do see a p1 and p2
>>> >> directory being created, with the expected materialized output being
>>> >> in the p1 directory.  So I'm still curious about tracking all of the
>>> >> intermediate state but it doesn't look like it is an issue with regard
>>> >> to creating the output in the wrong directory.
>>> >
>>> >
>>> > That's a relief. :)
>>> >
>>> > I think the issue with temp outputs has to do with our use of the
>>> > TemporaryPath libraries for creating, well, temporary paths. We do
>>> > this so we play nicely with CI frameworks, but you might need to
>>> > disable it for investigating intermediate outputs.
>>> >
>>> > Re: the specific error you're seeing, that looks interesting. I
>>> > wonder if it's an Avro version change or some such thing. Will see
>>> > if I can replicate it.
>>> >
>>> >
>>> > --
>>> > Director of Data Science
>>> > Cloudera
>>> > Twitter: @josh_wills
>>
>>
>>
>>
>> --
>> Director of Data Science
>> Cloudera
>> Twitter: @josh_wills
