incubator-crunch-user mailing list archives

From Micah Whitacre <mkwhita...@gmail.com>
Subject Re: Viewing intermediate states for debugging
Date Tue, 29 Jan 2013 17:12:56 GMT
Unfortunately, it doesn't look like this is just a test failure: running
against a CDH4.1.1 cluster fails in exactly the same manner.
Here is a copy of the code I used [1].

[1] - http://pastebin.com/QLEc5fmG

On Tue, Jan 29, 2013 at 8:44 AM, Micah Whitacre <mkwhitacre@gmail.com> wrote:
> The problem of reading from the same table twice seems interesting.
> At one point when trying to figure out the problem I tweaked the test
> to run the joinedTable through the same wordCount steps to make sure
> everything was read and then persisted correctly.  So the flow of the
> test became:
>
> write to wordcount table
> wordcount
> write to join table
> wordcount the join table (output to a different table)
> attempt to join words with others.
>
> That flow would work as expected but still fail on the last join.  So
> it seems to be reading correctly from HBase.
>
> I am working on building a standalone example and will report back
> the findings.
>
> thanks for your help,
> micah
>
>
> On Mon, Jan 28, 2013 at 11:55 PM, Josh Wills <jwills@cloudera.com> wrote:
>> I have to call it a night, but this is an odd one.
>>
>> The basic problem seems to be that we are reading from the same table
>> twice-- it seems like the HTable object is the same on both splits (always
>> reading from the words table, or always reading from the joinTableName
>> table), but the Scan object appears to get updated. I verified this by using
>> a different column family on the joinTableName table and seeing that the
>> test returned no output for the join, which is what we would expect if one
>> of the reads had no input.
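The symptom described above can be illustrated with a toy model. This is not Crunch or HBase code and is not from the thread: it only shows that if both sides of an inner join end up reading the same table, every matched key pairs a value with itself. That is the same shape as the `bird,bird` and `dog,dog` pairs in the failing assertions quoted further down. The class name `SelfJoinSymptom`, the `innerJoin` helper, and the sample rows are hypothetical; the rows were chosen so that the correct join matches the expected list in that assertion, not copied from the real WordCountHBaseIT data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical): a "table" is a sorted map of row key -> value.
public class SelfJoinSymptom {

    // Inner join on row key: emits "leftValue,rightValue" for every key
    // present in both tables, in the left table's key order.
    static List<String> innerJoin(Map<String, String> left, Map<String, String> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> row : left.entrySet()) {
            String match = right.get(row.getKey());
            if (match != null) {
                out.add(row.getValue() + "," + match);
            }
        }
        return out;
    }

    // Hypothetical sample rows standing in for the words table.
    static Map<String, String> sampleWords() {
        Map<String, String> t = new TreeMap<>();
        t.put("1", "cat");
        t.put("2", "cat");
        t.put("3", "dog");
        return t;
    }

    // Hypothetical sample rows standing in for the join table.
    static Map<String, String> sampleJoin() {
        Map<String, String> t = new TreeMap<>();
        t.put("1", "zebra");
        t.put("2", "donkey");
        t.put("3", "bird");
        return t;
    }

    public static void main(String[] args) {
        Map<String, String> words = sampleWords();
        Map<String, String> join = sampleJoin();
        // Correct wiring: each side of the join reads its own table.
        System.out.println("two tables: " + innerJoin(words, join));
        // Faulty wiring: both sides resolve to the same table.
        System.out.println("same table twice: " + innerJoin(words, words));
    }
}
```

Running `main` prints `two tables: [cat,zebra, cat,donkey, dog,bird]` followed by `same table twice: [cat,cat, cat,cat, dog,dog]`. The model only explains why a doubled read yields self-pairs; it says nothing about where the HTable/Scan mix-up happens inside the input format.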
>>
>> Looking in the code, I don't see a place where the 0.92.1 and 0.90.4 code
>> differ significantly in terms of the input format, record reader, etc. I'm
>> on the road this week, but I'd like to work on this one some more when I'm
>> back in SF and can sit down with my co-workers who know more HBase than I
>> do.
>>
>> Out of curiosity-- is it just the unit test that fails, or can you run a
>> real HBase MR job that suffers from this problem?
>>
>> J
>>
>>
>> On Mon, Jan 28, 2013 at 7:26 PM, Josh Wills <jwills@cloudera.com> wrote:
>>>
>>> Ack, sorry-- was checking email on my phone and didn't see the patch. I
>>> can replicate it locally, digging in now.
>>>
>>>
>>> On Mon, Jan 28, 2013 at 6:47 PM, Whitacre,Micah
>>> <MICAH.WHITACRE@cerner.com> wrote:
>>>>
>>>> The patch should contain the specifics but I've tested using 4.1.1,
>>>> 4.1.2, and 4.1.3. Each gives the same results.
>>>>
>>>>
>>>>
>>>>
>>>> On Jan 28, 2013, at 20:44, "Josh Wills" <jwills@cloudera.com> wrote:
>>>>
>>>> I usually run them in Eclipse, but not using a particularly special run
>>>> configuration (I think.) Let me see if I can replicate that one-- which CDH
>>>> version?
>>>>
>>>>
>>>> On Mon, Jan 28, 2013 at 3:13 PM, Micah Whitacre <mkwhitacre@gmail.com>
>>>> wrote:
>>>>>
>>>>> Related to this thread, where I asked how to save off the intermediate
>>>>> state: more generally, how do you debug the project, specifically the
>>>>> IT tests?  Do you typically run them through Eclipse with special
>>>>> profiles?
>>>>>
>>>>> I'm still trying to track down an odd failure in crunch-hbase when
>>>>> swapping out the dependencies to use CDH4.1.x.  The test failure seems
>>>>> to indicate the test is joining the same PCollection on itself.
>>>>>
>>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 63.13 sec <<< FAILURE!
>>>>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 62.789 sec  <<< FAILURE!
>>>>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]> but was:<[bird,bird, zebra,zebra, horse,horse, donkey,donkey]>
>>>>>         at org.junit.Assert.fail(Assert.java:93)
>>>>>         at org.junit.Assert.failNotEquals(Assert.java:647)
>>>>>         at org.junit.Assert.assertEquals(Assert.java:128)
>>>>>         at org.junit.Assert.assertEquals(Assert.java:147)
>>>>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:257)
>>>>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>>>>>
>>>>> and sometimes:
>>>>>
>>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 71.958 sec <<< FAILURE!
>>>>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 71.469 sec  <<< FAILURE!
>>>>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]> but was:<[dog,dog, cat,cat]>
>>>>>         at org.junit.Assert.fail(Assert.java:93)
>>>>>         at org.junit.Assert.failNotEquals(Assert.java:647)
>>>>>         at org.junit.Assert.assertEquals(Assert.java:128)
>>>>>         at org.junit.Assert.assertEquals(Assert.java:147)
>>>>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:259)
>>>>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>>>>>
>>>>> Most likely due to the same reason Crunch requires a special build of
>>>>> HBase 0.94.1, I've found I need to mix and match CDH4 versions as
>>>>> shown by the attached patch.  For the Crunch core build I need to use
>>>>> all of the latest 2.0.0 code but for testing crunch-hbase I need to
>>>>> use the mrv1 fork for hadoop-core and hadoop-minicluster.  I wouldn't
>>>>> think that either of those would affect the tests unless somehow the
>>>>> files used for the intermediate states were not being temporarily
>>>>> stored correctly.  The fact that the test fails differently does make
>>>>> me wonder about a concurrency issue but I'm not sure where.
>>>>>
>>>>> Any pointers on debugging would be helpful.
>>>>> Micah
>>>>>
>>>>> On Thu, Jan 24, 2013 at 2:24 PM, Micah Whitacre <mkwhitacre@gmail.com>
>>>>> wrote:
>>>>> > I am creating an entirely new profile simply to keep my changes
>>>>> > separate from what is in apache/master.
>>>>> >
>>>>> > Thanks for the hint about the "naive" approach.  Previously I had the
>>>>> > following:
>>>>> >
>>>>> >             <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>>>>> >             <hadoop.client.version>2.0.0-mr1-cdh4.1.1</hadoop.client.version>
>>>>> >             <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>>>>> >
>>>>> > If I follow what you did and change it to:
>>>>> >
>>>>> >             <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>>>>> >             <hadoop.client.version>2.0.0-cdh4.1.1</hadoop.client.version>
>>>>> >             <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>>>>> >
>>>>> > The build gets farther.  I now have a different failure in
>>>>> > crunch-hbase I'll start working on.
>>>>> >
>>>>> > Thanks for your help.
>>>>> > Micah
>>>>> >
>>>>> >
>>>>> > On Thu, Jan 24, 2013 at 12:23 PM, Josh Wills <jwills@cloudera.com>
>>>>> > wrote:
>>>>> >> Micah,
>>>>> >>
>>>>> >> I did the naive thing and just swapped in 2.0.0-cdh4.1.2 for
>>>>> >> 2.0.0-alpha in the crunch.platform=2 profile in the top-level POM and
>>>>> >> then added in the Cloudera repositories. That works for me-- does it
>>>>> >> work for you? It sounds to me like you're creating an entirely new
>>>>> >> profile.
>>>>> >>
>>>>> >> J
>>>>> >>
>>>>> >>
>>>>> >> On Thu, Jan 24, 2013 at 7:58 AM, Micah Whitacre
>>>>> >> <mkwhitacre@gmail.com>
>>>>> >> wrote:
>>>>> >>>
>>>>> >>> Running dependency:tree on both projects shows that the version of
>>>>> >>> Avro is 1.7.0 under both profiles.  I wish it were that easy.  :)
>>>>> >>>
>>>>> >>> On Thu, Jan 24, 2013 at 9:53 AM, Josh Wills <jwills@cloudera.com>
>>>>> >>> wrote:
>>>>> >>> >
>>>>> >>> >
>>>>> >>> >
>>>>> >>> > On Thu, Jan 24, 2013 at 6:40 AM, Micah Whitacre
>>>>> >>> > <mkwhitacre@gmail.com>
>>>>> >>> > wrote:
>>>>> >>> >>
>>>>> >>> >> Taking a step back and comparing what is generated for a normal,
>>>>> >>> >> successful test run with "-Dcrunch.platform=2", I do see p1 and p2
>>>>> >>> >> directories being created, with the expected materialized output in
>>>>> >>> >> the p1 directory.  So I'm still curious about tracking all of the
>>>>> >>> >> intermediate state, but it doesn't look like the output is being
>>>>> >>> >> created in the wrong directory.
>>>>> >>> >
>>>>> >>> >
>>>>> >>> > That's a relief. :)
>>>>> >>> >
>>>>> >>> > I think the issue with temp outputs has to do with our use of the
>>>>> >>> > TemporaryPath libraries for creating, well, temporary paths. We do
>>>>> >>> > this so we play nicely with CI frameworks, but you might need to
>>>>> >>> > disable it for investigating intermediate outputs.
>>>>> >>> >
>>>>> >>> > Re: the specific error you're seeing, that looks interesting. I
>>>>> >>> > wonder if it's an Avro version change or some such thing. Will see
>>>>> >>> > if I can replicate it.
>>>>> >>> >
>>>>> >>> >
>>>>> >>> > --
>>>>> >>> > Director of Data Science
>>>>> >>> > Cloudera
>>>>> >>> > Twitter: @josh_wills
>>>>> >>
>>>>
>>>
>>
