incubator-crunch-user mailing list archives

From Micah Whitacre <mkwhita...@gmail.com>
Subject Re: Viewing intermediate states for debugging
Date Tue, 29 Jan 2013 14:44:15 GMT
The problem of reading from the same table twice seems interesting.
At one point when trying to figure out the problem I tweaked the test
to run the joinedTable through the same wordCount steps to make sure
everything was read and then persisted correctly.  So the flow of the
test became:

write to wordcount table
wordcount
write to join table
wordcount the join table (output to a different table)
attempt to join words with others.

That flow worked as expected up to the last step but still failed on the
final join.  So the data does seem to be read correctly from HBase.
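
To illustrate why the failing assertions quoted further down in this thread
(pairs such as dog,dog or bird,bird) point at one input being read twice,
here is a minimal in-memory sketch in plain Java.  It does not use Crunch or
HBase; the class name, row keys, and sample values are all hypothetical and
chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a keyed inner join; not Crunch or HBase code.
public class SelfJoinSketch {

    // Emits "leftValue,rightValue" for every key present on both sides.
    static List<String> join(Map<String, String> left, Map<String, String> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            String r = right.get(e.getKey());
            if (r != null) {
                out.add(e.getValue() + "," + r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows, keyed by a shared row key.
        Map<String, String> words = new TreeMap<>();
        words.put("row1", "cat");
        words.put("row2", "dog");

        Map<String, String> others = new TreeMap<>();
        others.put("row1", "zebra");
        others.put("row2", "bird");

        // Intended join: each word paired with its partner.
        System.out.println(join(words, others));  // [cat,zebra, dog,bird]

        // If both inputs resolve to the same table, every value pairs with itself.
        System.out.println(join(words, words));   // [cat,cat, dog,dog]
        System.out.println(join(others, others)); // [zebra,zebra, bird,bird]
    }
}
```

Which table ends up being read twice decides which of the two self-paired
outputs appears, which would be consistent with the test failing in two
different ways.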

I am working on building a standalone example and will report back
with the findings.

thanks for your help,
micah


On Mon, Jan 28, 2013 at 11:55 PM, Josh Wills <jwills@cloudera.com> wrote:
> I have to call it a night, but this is an odd one.
>
> The basic problem seems to be that we are reading from the same table
> twice-- it seems like the HTable object is the same on both splits (always
> reading from the words table, or always reading from the joinTableName
> table), but the Scan object appears to get updated. I verified this by using
> a different column family on the joinTableName table and seeing that the
> test returned no output for the join, which is what we would expect if one
> of the reads had no input.
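
The reasoning in the paragraph above can be modeled with a small plain-Java
sketch.  This is only an illustration of the observed symptom (table handle
shared, scan applied per read); it is not taken from the HBase or Crunch
source, and every name in it (the class, the static cache, the table and
column family names) is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: the table is resolved once, the column family per read.
public class SharedTableSketch {

    // In-memory stand-in for a cluster: table name -> column family -> values.
    static final Map<String, Map<String, List<String>>> TABLES = new HashMap<>();

    // Models the suspected behaviour: the first table opened is reused afterwards.
    static String cachedTable = null;

    static List<String> read(String table, String family) {
        if (cachedTable == null) {
            cachedTable = table; // first caller wins; later table names are ignored
        }
        Map<String, List<String>> families =
            TABLES.getOrDefault(cachedTable, Collections.<String, List<String>>emptyMap());
        // The "scan" (here just a column family) is still honoured on every read.
        return families.getOrDefault(family, Collections.<String>emptyList());
    }

    // Reads the "words" table and then the "join" table with the given families.
    static List<List<String>> readBoth(String wordsFamily, String joinFamily) {
        cachedTable = null;
        TABLES.clear();
        TABLES.put("words", Collections.singletonMap(wordsFamily, Arrays.asList("cat", "dog")));
        TABLES.put("join", Collections.singletonMap(joinFamily, Arrays.asList("zebra", "bird")));
        return Arrays.asList(read("words", wordsFamily), read("join", joinFamily));
    }

    public static void main(String[] args) {
        // Same family on both tables: the second read silently returns "words" rows.
        System.out.println(readBoth("cf", "cf"));    // [[cat, dog], [cat, dog]]
        // Different family on the join table: the second read finds nothing.
        System.out.println(readBoth("cf", "other")); // [[cat, dog], []]
    }
}
```

In this model, changing the column family on the second table turns a
self-join into a join with an empty side, which matches the "no output"
result described above.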
>
> Looking in the code, I don't see a place where the 0.92.1 and 0.90.4 code
> differ significantly in terms of the input format, record reader, etc. I'm
> on the road this week, but I'd like to work on this one some more when I'm
> back in SF and can sit down with my co-workers who know more HBase than I
> do.
>
> Out of curiosity-- is it just the unit test that fails, or can you run a
> real HBase MR job that suffers from this problem?
>
> J
>
>
> On Mon, Jan 28, 2013 at 7:26 PM, Josh Wills <jwills@cloudera.com> wrote:
>>
>> Ack, sorry-- was checking email on my phone and didn't see the patch. I
>> can replicate it locally, digging in now.
>>
>>
>> On Mon, Jan 28, 2013 at 6:47 PM, Whitacre,Micah
>> <MICAH.WHITACRE@cerner.com> wrote:
>>>
>>> The patch should contain the specifics but I've tested using 4.1.1,
>>> 4.1.2, and 4.1.3. Each gives the same results.
>>>
>>>
>>>
>>>
>>> On Jan 28, 2013, at 20:44, "Josh Wills" <jwills@cloudera.com> wrote:
>>>
>>> I usually run them in Eclipse, but not using a particularly special run
>>> configuration (I think.) Let me see if I can replicate that one-- which CDH
>>> version?
>>>
>>>
>>> On Mon, Jan 28, 2013 at 3:13 PM, Micah Whitacre <mkwhitacre@gmail.com>
>>> wrote:
>>>>
>>>> Related to this thread, where I asked how to save off the intermediate
>>>> state: in general, how do you debug the project, specifically the
>>>> integration tests?  Do you typically run them through Eclipse with
>>>> special profiles?
>>>>
>>>> I'm still trying to track down an odd failure in crunch-hbase when
>>>> swapping out the dependencies to use CDH4.1.x.  The test failure seems
>>>> to indicate the test is joining the same PCollection on itself.
>>>>
>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 63.13 sec <<< FAILURE!
>>>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 62.789 sec  <<< FAILURE!
>>>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]> but was:<[bird,bird, zebra,zebra, horse,horse, donkey,donkey]>
>>>>         at org.junit.Assert.fail(Assert.java:93)
>>>>         at org.junit.Assert.failNotEquals(Assert.java:647)
>>>>         at org.junit.Assert.assertEquals(Assert.java:128)
>>>>         at org.junit.Assert.assertEquals(Assert.java:147)
>>>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:257)
>>>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>>>>
>>>> and sometimes:
>>>>
>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 71.958 sec <<< FAILURE!
>>>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 71.469 sec  <<< FAILURE!
>>>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]> but was:<[dog,dog, cat,cat]>
>>>>         at org.junit.Assert.fail(Assert.java:93)
>>>>         at org.junit.Assert.failNotEquals(Assert.java:647)
>>>>         at org.junit.Assert.assertEquals(Assert.java:128)
>>>>         at org.junit.Assert.assertEquals(Assert.java:147)
>>>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:259)
>>>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>>>>
>>>> Most likely due to the same reason Crunch requires a special build of
>>>> HBase 0.94.1, I've found I need to mix and match CDH4 versions as
>>>> shown by the attached patch.  For the Crunch core build I need to use
>>>> all of the latest 2.0.0 code but for testing crunch-hbase I need to
>>>> use the mrv1 fork for hadoop-core and hadoop-minicluster.  I wouldn't
>>>> think that either of those would affect the tests unless somehow the
>>>> files used for the intermediate states were not being temporarily
>>>> stored correctly.  The fact that the test fails differently does make
>>>> me wonder about a concurrency issue but I'm not sure where.
>>>>
>>>> Any pointers on debugging would be helpful.
>>>> Micah
>>>>
>>>> On Thu, Jan 24, 2013 at 2:24 PM, Micah Whitacre <mkwhitacre@gmail.com>
>>>> wrote:
>>>> > I am creating an entirely new profile simply to keep my changes
>>>> > separate from what is in apache/master.
>>>> >
>>>> > Thanks for the hint about the "naive" approach.  Previously I had the
>>>> > following:
>>>> >
>>>> >             <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>>>> >             <hadoop.client.version>2.0.0-mr1-cdh4.1.1</hadoop.client.version>
>>>> >             <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>>>> >
>>>> > If I follow what you did and change it to:
>>>> >
>>>> >             <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>>>> >             <hadoop.client.version>2.0.0-cdh4.1.1</hadoop.client.version>
>>>> >             <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>>>> >
>>>> > The build gets farther.  I now have a different failure in
>>>> > crunch-hbase I'll start working on.
>>>> >
>>>> > Thanks for your help.
>>>> > Micah
>>>> >
>>>> >
>>>> > On Thu, Jan 24, 2013 at 12:23 PM, Josh Wills <jwills@cloudera.com>
>>>> > wrote:
>>>> >> Micah,
>>>> >>
>>>> >> I did the naive thing and just swapped in 2.0.0-cdh4.1.2 for
>>>> >> 2.0.0-alpha in the crunch.platform=2 profile in the top level POM
>>>> >> and then added in the Cloudera repositories. That works for me--
>>>> >> does it work for you? It sounds to me like you're creating an
>>>> >> entirely new profile.
>>>> >>
>>>> >> J
>>>> >>
>>>> >>
>>>> >> On Thu, Jan 24, 2013 at 7:58 AM, Micah Whitacre
>>>> >> <mkwhitacre@gmail.com>
>>>> >> wrote:
>>>> >>>
>>>> >>> running dependency:tree on both projects shows that the version of
>>>> >>> Avro is 1.7.0 for running under both profiles.  I wish it was that
>>>> >>> easy.  :)
>>>> >>>
>>>> >>> On Thu, Jan 24, 2013 at 9:53 AM, Josh Wills <jwills@cloudera.com>
>>>> >>> wrote:
>>>> >>> >
>>>> >>> >
>>>> >>> >
>>>> >>> > On Thu, Jan 24, 2013 at 6:40 AM, Micah Whitacre
>>>> >>> > <mkwhitacre@gmail.com>
>>>> >>> > wrote:
>>>> >>> >>
>>>> >>> >> Taking a step back and comparing what is being generated for a
>>>> >>> >> normal successful test run of "-Dcrunch.platform=2" I do see a p1
>>>> >>> >> and p2 directory being created, with the expected materialized
>>>> >>> >> output being in the p1 directory.  So I'm still curious about
>>>> >>> >> tracking all of the intermediate state but it doesn't look like it
>>>> >>> >> is an issue with regard to creating the output in the wrong
>>>> >>> >> directory.
>>>> >>> >
>>>> >>> >
>>>> >>> > That's a relief. :)
>>>> >>> >
>>>> >>> > I think the issue with temp outputs has to do with our use of the
>>>> >>> > TemporaryPath libraries for creating, well, temporary paths. We do
>>>> >>> > this so we play nicely with CI frameworks, but you might need to
>>>> >>> > disable it for investigating intermediate outputs.
>>>> >>> >
>>>> >>> > Re: the specific error you're seeing, that looks interesting. I
>>>> >>> > wonder if it's an Avro version change or some such thing. Will see
>>>> >>> > if I can replicate it.
>>>> >>> >
>>>> >>> >
>>>> >>> > --
>>>> >>> > Director of Data Science
>>>> >>> > Cloudera
>>>> >>> > Twitter: @josh_wills
