incubator-crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Viewing intermediate states for debugging
Date Tue, 29 Jan 2013 02:43:06 GMT
I usually run them in Eclipse, but not using a particularly special run
configuration (I think.) Let me see if I can replicate that one-- which CDH
version?


On Mon, Jan 28, 2013 at 3:13 PM, Micah Whitacre <mkwhitacre@gmail.com> wrote:

> Related to this thread, where I asked how to save off the intermediate
> state: in general, how do you debug the project, specifically the
> integration tests (ITs)?  Do you typically run them through Eclipse with
> special profiles?
>
> I'm still trying to track down an odd failure in crunch-hbase when
> swapping out the dependencies to use CDH4.1.x.  The test failure seems
> to indicate the test is joining the same PCollection on itself.
>
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 63.13 sec <<< FAILURE!
> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 62.789 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]> but was:<[bird,bird, zebra,zebra, horse,horse, donkey,donkey]>
>         at org.junit.Assert.fail(Assert.java:93)
>         at org.junit.Assert.failNotEquals(Assert.java:647)
>         at org.junit.Assert.assertEquals(Assert.java:128)
>         at org.junit.Assert.assertEquals(Assert.java:147)
>         at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:257)
>         at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>
> and sometimes:
>
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 71.958 sec <<< FAILURE!
> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 71.469 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]> but was:<[dog,dog, cat,cat]>
>         at org.junit.Assert.fail(Assert.java:93)
>         at org.junit.Assert.failNotEquals(Assert.java:647)
>         at org.junit.Assert.assertEquals(Assert.java:128)
>         at org.junit.Assert.assertEquals(Assert.java:147)
>         at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:259)
>         at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>
> Most likely due to the same reason Crunch requires a special build of
> HBase 0.94.1, I've found I need to mix and match CDH4 versions as
> shown by the attached patch.  For the Crunch core build I need to use
> all of the latest 2.0.0 code but for testing crunch-hbase I need to
> use the mrv1 fork for hadoop-core and hadoop-minicluster.  I wouldn't
> think that either of those would affect the tests unless somehow the
> files used for the intermediate states were not being temporarily
> stored correctly.  The fact that the test fails differently from run to
> run does make me wonder about a concurrency issue, but I'm not sure where.
>
> Any pointers on debugging would be helpful.
> Micah
>
> On Thu, Jan 24, 2013 at 2:24 PM, Micah Whitacre <mkwhitacre@gmail.com> wrote:
> > I am creating an entirely new profile simply to keep my changes
> > separate from what is in apache/master.
> >
> > Thanks for the hint about the "naive" approach.  Previously I had the following:
> >
> >             <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
> >             <hadoop.client.version>2.0.0-mr1-cdh4.1.1</hadoop.client.version>
> >             <hbase.version>0.92.1-cdh4.1.1</hbase.version>
> >
> > If I follow what you did and change it to:
> >
> >             <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
> >             <hadoop.client.version>2.0.0-cdh4.1.1</hadoop.client.version>
> >             <hbase.version>0.92.1-cdh4.1.1</hbase.version>
> >
> > The build gets farther.  I now have a different failure in
> > crunch-hbase I'll start working on.
> >
> > Thanks for your help.
> > Micah
> >
> >
> > On Thu, Jan 24, 2013 at 12:23 PM, Josh Wills <jwills@cloudera.com> wrote:
> >> Micah,
> >>
> >> I did the naive thing and just swapped in 2.0.0-cdh4.1.2 for 2.0.0-alpha in
> >> the crunch.platform=2 profile in the top level POM and then added in the
> >> Cloudera repositories. That works for me-- does it work for you? It sounds
> >> to me like you're creating an entirely new profile.
> >>
> >> J
> >>
> >>
> >> On Thu, Jan 24, 2013 at 7:58 AM, Micah Whitacre <mkwhitacre@gmail.com> wrote:
> >>>
> >>> Running dependency:tree on both projects shows that the version of
> >>> Avro is 1.7.0 under both profiles.  I wish it was that
> >>> easy.  :)
> >>>
> >>> On Thu, Jan 24, 2013 at 9:53 AM, Josh Wills <jwills@cloudera.com> wrote:
> >>> >
> >>> >
> >>> >
> >>> > On Thu, Jan 24, 2013 at 6:40 AM, Micah Whitacre <mkwhitacre@gmail.com> wrote:
> >>> >>
> >>> >> Taking a step back and comparing what is being generated for a normal
> >>> >> successful test run of "-Dcrunch.platform=2" I do see a p1 and p2
> >>> >> directory being created, with the expected materialized output being
> >>> >> in the p1 directory.  So I'm still curious about tracking all of the
> >>> >> intermediate state but it doesn't look like it is an issue with regard
> >>> >> to creating the output in the wrong directory.
> >>> >
> >>> >
> >>> > That's a relief. :)
> >>> >
> >>> > I think the issue with temp outputs has to do with our use of the
> >>> > TemporaryPath libraries for creating, well, temporary paths. We do this so
> >>> > we play nicely with CI frameworks, but you might need to disable it for
> >>> > investigating intermediate outputs.
> >>> >
> >>> > Re: the specific error you're seeing, that looks interesting. I wonder if
> >>> > it's an Avro version change or some such thing. Will see if I can replicate
> >>> > it.
> >>> >
> >>> >
> >>> > --
> >>> > Director of Data Science
> >>> > Cloudera
> >>> > Twitter: @josh_wills
> >>
> >>
> >>
> >>
> >> --
> >> Director of Data Science
> >> Cloudera
> >> Twitter: @josh_wills
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
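
For readers following the thread, the "naive" swap described in the quoted Jan 24 message (replacing 2.0.0-alpha with 2.0.0-cdh4.1.2 in the crunch.platform=2 profile and adding the Cloudera repositories) might look roughly like the fragment below. This is only a sketch, not the actual Crunch POM: the profile id and repository id are hypothetical, the property names are the ones quoted earlier in the thread, setting hadoop.client.version to the same value is an assumption, and the repository URL is assumed to be Cloudera's public Maven repository.

```xml
<!-- Sketch only: profile id and repository id are hypothetical. -->
<profile>
  <id>hadoop-2</id>
  <activation>
    <property>
      <!-- Activated with -Dcrunch.platform=2, as mentioned in the thread. -->
      <name>crunch.platform</name>
      <value>2</value>
    </property>
  </activation>
  <properties>
    <!-- Was 2.0.0-alpha; swapped for the CDH4 build. -->
    <hadoop.version>2.0.0-cdh4.1.2</hadoop.version>
    <!-- Assumption: client version kept in step with hadoop.version. -->
    <hadoop.client.version>2.0.0-cdh4.1.2</hadoop.client.version>
  </properties>
  <repositories>
    <repository>
      <id>cloudera</id>
      <!-- Assumed URL of Cloudera's public repository. -->
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>
</profile>
```

With a profile like this, something along the lines of `mvn clean verify -Dcrunch.platform=2` would pick up the CDH artifacts. As the thread notes, crunch-hbase may still need different versions for its integration tests.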
