crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Viewing intermediate states for debugging
Date Mon, 04 Feb 2013 01:53:25 GMT
Hey Micah,

This should fix the join issue:
https://issues.apache.org/jira/browse/CRUNCH-160

Let me know if it works for you.

J


On Wed, Jan 30, 2013 at 6:08 AM, Josh Wills <jwills@cloudera.com> wrote:

> Okay, good to know. I'll be back in SF on Friday and will sit down w/some
> of my friends who know HBase better than I do and take another look.
>
> J
>
>
> On Tue, Jan 29, 2013 at 9:12 AM, Micah Whitacre <mkwhitacre@gmail.com> wrote:
>
>> Unfortunately it doesn't look like this is just a test failure as
>> running against a CDH4.1.1 cluster fails in the exact same manner.
>> Here is a copy of the code I used[1]
>>
>> [1] - http://pastebin.com/QLEc5fmG
>>
>> On Tue, Jan 29, 2013 at 8:44 AM, Micah Whitacre <mkwhitacre@gmail.com> wrote:
>> > The problem of reading from the same table twice seems interesting.
>> > At one point when trying to figure out the problem I tweaked the test
>> > to run the joinedTable through the same wordCount steps to make sure
>> > everything was read and then persisted correctly.  So the flow of the
>> > test became:
>> >
>> > write to wordcount table
>> > wordcount
>> > write to join table
>> > wordcount the join table (output to a different table)
>> > attempt to join words with others.
>> >
>> > That flow would work as expected but still fail on the last join.  So
>> > it seems like it would be reading in correctly from HBase.
>> >
>> > I am working on building a stand alone example and will report back
>> > the findings.
>> >
>> > thanks for your help,
>> > micah
>> >
>> >
>> > On Mon, Jan 28, 2013 at 11:55 PM, Josh Wills <jwills@cloudera.com> wrote:
>> >> I have to call it a night, but this is an odd one.
>> >>
>> >> The basic problem seems to be that we are reading from the same table
>> >> twice-- it seems like the HTable object is the same on both splits (always
>> >> reading from the words table, or always reading from the joinTableName
>> >> table), but the Scan object appears to get updated. I verified this by using
>> >> a different column family on the joinTableName table and seeing that the
>> >> test returned no output for the join, which is what we would expect if one
>> >> of the reads had no input.
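
One way a "both sides read the same table" symptom can arise in general is when two inputs are registered under a single shared configuration key, so the later registration overwrites the earlier one. The sketch below is a toy model of that pattern only. It uses no HBase or Crunch classes, the key name `demo.input.table` is made up, and nothing in this thread confirms that this is what actually happens inside Crunch or HBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: a plain map stands in for a job configuration. */
final class SharedInputKeyDemo {

    // Hypothetical configuration key, not a real HBase or Crunch property.
    static final String TABLE_KEY = "demo.input.table";

    /** Every input is stored under the SAME key, so the last one wins. */
    static List<String> resolveWithSharedKey(List<String> tables) {
        Map<String, String> conf = new HashMap<>();
        for (String table : tables) {
            conf.put(TABLE_KEY, table); // overwrites the previous input
        }
        List<String> resolved = new ArrayList<>();
        for (int i = 0; i < tables.size(); i++) {
            resolved.add(conf.get(TABLE_KEY)); // each reader sees the same value
        }
        return resolved;
    }

    /** Every input is stored under its own indexed key, so none is lost. */
    static List<String> resolveWithPerInputKey(List<String> tables) {
        Map<String, String> conf = new HashMap<>();
        for (int i = 0; i < tables.size(); i++) {
            conf.put(TABLE_KEY + "." + i, tables.get(i));
        }
        List<String> resolved = new ArrayList<>();
        for (int i = 0; i < tables.size(); i++) {
            resolved.add(conf.get(TABLE_KEY + "." + i));
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("words", "joinTableName");
        System.out.println(resolveWithSharedKey(inputs));   // [joinTableName, joinTableName]
        System.out.println(resolveWithPerInputKey(inputs)); // [words, joinTableName]
    }
}
```

With the shared key, both readers end up on one table, which would turn a two-table join into a self-join.
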
>> >>
>> >> Looking in the code, I don't see a place where the 0.92.1 and 0.90.4 code
>> >> differ significantly in terms of the input format, record reader, etc. I'm
>> >> on the road this week, but I'd like to work on this one some more when I'm
>> >> back in SF and can sit down with my co-workers who know more HBase than I
>> >> do.
>> >>
>> >> Out of curiosity-- is it just the unit test that fails, or can you run a
>> >> real HBase MR job that suffers from this problem?
>> >>
>> >> J
>> >>
>> >>
>> >> On Mon, Jan 28, 2013 at 7:26 PM, Josh Wills <jwills@cloudera.com> wrote:
>> >>>
>> >>> Ack, sorry-- was checking email on my phone and didn't see the patch. I
>> >>> can replicate it locally, digging in now.
>> >>>
>> >>>
>> >>> On Mon, Jan 28, 2013 at 6:47 PM, Whitacre,Micah
>> >>> <MICAH.WHITACRE@cerner.com> wrote:
>> >>>>
>> >>>> The patch should contain the specifics but I've tested using 4.1.1,
>> >>>> 4.1.2, and 4.1.3. Each gives the same results.
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Jan 28, 2013, at 20:44, "Josh Wills" <jwills@cloudera.com> wrote:
>> >>>>
>> >>>> I usually run them in Eclipse, but not using a particularly special run
>> >>>> configuration (I think.) Let me see if I can replicate that one-- which CDH
>> >>>> version?
>> >>>>
>> >>>>
>> >>>> On Mon, Jan 28, 2013 at 3:13 PM, Micah Whitacre <mkwhitacre@gmail.com> wrote:
>> >>>>>
>> >>>>> Related to this thread, where I asked how to save off the intermediate
>> >>>>> state: in general, how do you debug the project, specifically the IT
>> >>>>> tests?  Do you typically run through Eclipse with special profiles?
>> >>>>>
>> >>>>> I'm still trying to track down an odd failure in crunch-hbase when
>> >>>>> swapping out the dependencies to use CDH4.1.x.  The test failure seems
>> >>>>> to indicate the test is joining the same PCollection on itself.
>> >>>>>
>> >>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 63.13 sec <<< FAILURE!
>> >>>>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 62.789 sec  <<< FAILURE!
>> >>>>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]> but was:<[bird,bird, zebra,zebra, horse,horse, donkey,donkey]>
>> >>>>>         at org.junit.Assert.fail(Assert.java:93)
>> >>>>>         at org.junit.Assert.failNotEquals(Assert.java:647)
>> >>>>>         at org.junit.Assert.assertEquals(Assert.java:128)
>> >>>>>         at org.junit.Assert.assertEquals(Assert.java:147)
>> >>>>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:257)
>> >>>>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>> >>>>>
>> >>>>> and sometimes:
>> >>>>>
>> >>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 71.958 sec <<< FAILURE!
>> >>>>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 71.469 sec  <<< FAILURE!
>> >>>>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]> but was:<[dog,dog, cat,cat]>
>> >>>>>         at org.junit.Assert.fail(Assert.java:93)
>> >>>>>         at org.junit.Assert.failNotEquals(Assert.java:647)
>> >>>>>         at org.junit.Assert.assertEquals(Assert.java:128)
>> >>>>>         at org.junit.Assert.assertEquals(Assert.java:147)
>> >>>>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:259)
>> >>>>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>> >>>>>
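
Both failure outputs above have the shape `x,x`, which is what an inner join produces when both sides are the same collection. The sketch below reproduces that shape with plain Java maps. The join keys (`k1`, `k2`) and the two tiny tables are invented for illustration and are not the data used by WordCountHBaseIT, and the `join` helper is a hypothetical stand-in, not a Crunch API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy inner join over in-memory maps; no Crunch or HBase classes involved. */
final class SelfJoinDemo {

    /** Builds an insertion-ordered table from alternating key, value arguments. */
    static Map<String, String> table(String... keyValues) {
        Map<String, String> t = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            t.put(keyValues[i], keyValues[i + 1]);
        }
        return t;
    }

    /** Emits "leftValue,rightValue" for every key present on both sides. */
    static List<String> join(Map<String, String> left, Map<String, String> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            String rightValue = right.get(e.getKey());
            if (rightValue != null) {
                out.add(e.getValue() + "," + rightValue);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> words = table("k1", "cat", "k2", "dog");     // made-up data
        Map<String, String> others = table("k1", "zebra", "k2", "bird"); // made-up data

        System.out.println(join(words, others));  // [cat,zebra, dog,bird]
        System.out.println(join(words, words));   // [cat,cat, dog,dog]
        System.out.println(join(others, others)); // [zebra,zebra, bird,bird]
    }
}
```

Which table ends up joined with itself determines which of the two failure shapes appears, which is consistent with the test failing differently from run to run.
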
>> >>>>> Most likely due to the same reason Crunch requires a special build of
>> >>>>> HBase 0.94.1, I've found I need to mix and match CDH4 versions as
>> >>>>> shown by the attached patch.  For the Crunch core build I need to use
>> >>>>> all of the latest 2.0.0 code but for testing crunch-hbase I need to
>> >>>>> use the mrv1 fork for hadoop-core and hadoop-minicluster.  I wouldn't
>> >>>>> think that either of those would affect the tests unless somehow the
>> >>>>> files used for the intermediate states were not being temporarily
>> >>>>> stored correctly.  The fact that the test fails differently does make
>> >>>>> me wonder about a concurrency issue but I'm not sure where.
>> >>>>>
>> >>>>> Any pointers on debugging would be helpful.
>> >>>>> Micah
>> >>>>>
>> >>>>> On Thu, Jan 24, 2013 at 2:24 PM, Micah Whitacre <mkwhitacre@gmail.com> wrote:
>> >>>>> > I am creating an entirely new profile simply to keep my changes
>> >>>>> > separate from what is in apache/master.
>> >>>>> >
>> >>>>> > Thanks for the hint about the "naive" approach.  Previously I had the
>> >>>>> > following:
>> >>>>> >
>> >>>>> >             <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>> >>>>> >             <hadoop.client.version>2.0.0-mr1-cdh4.1.1</hadoop.client.version>
>> >>>>> >             <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>> >>>>> >
>> >>>>> > If I follow what you did and change it to:
>> >>>>> >
>> >>>>> >             <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>> >>>>> >             <hadoop.client.version>2.0.0-cdh4.1.1</hadoop.client.version>
>> >>>>> >             <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>> >>>>> >
>> >>>>> > The build gets farther.  I now have a different failure in
>> >>>>> > crunch-hbase I'll start working on.
>> >>>>> >
>> >>>>> > Thanks for your help.
>> >>>>> > Micah
>> >>>>> >
>> >>>>> >
>> >>>>> > On Thu, Jan 24, 2013 at 12:23 PM, Josh Wills <jwills@cloudera.com> wrote:
>> >>>>> >> Micah,
>> >>>>> >>
>> >>>>> >> I did the naive thing and just swapped in 2.0.0-cdh4.1.2 for
>> >>>>> >> 2.0.0-alpha in the crunch.platform=2 profile in the top level POM
>> >>>>> >> and then added in the Cloudera repositories. That works for me--
>> >>>>> >> does it work for you? It sounds to me like you're creating an
>> >>>>> >> entirely new profile.
>> >>>>> >>
>> >>>>> >> J
>> >>>>> >>
>> >>>>> >>
>> >>>>> >> On Thu, Jan 24, 2013 at 7:58 AM, Micah Whitacre
>> >>>>> >> <mkwhitacre@gmail.com>
>> >>>>> >> wrote:
>> >>>>> >>>
>> >>>>> >>> running dependency:tree on both projects shows that the version of
>> >>>>> >>> Avro is 1.7.0 for running under both profiles.  I wish it was that
>> >>>>> >>> easy.  :)
>> >>>>> >>>
>> >>>>> >>> On Thu, Jan 24, 2013 at 9:53 AM, Josh Wills <jwills@cloudera.com> wrote:
>> >>>>> >>> >
>> >>>>> >>> >
>> >>>>> >>> >
>> >>>>> >>> > On Thu, Jan 24, 2013 at 6:40 AM, Micah Whitacre
>> >>>>> >>> > <mkwhitacre@gmail.com>
>> >>>>> >>> > wrote:
>> >>>>> >>> >>
>> >>>>> >>> >> Taking a step back and comparing what is being generated for a
>> >>>>> >>> >> normal successful test run of "-Dcrunch.platform=2" I do see a p1
>> >>>>> >>> >> and p2 directory being created, with the expected materialized
>> >>>>> >>> >> output being in the p1 directory.  So I'm still curious about
>> >>>>> >>> >> tracking all of the intermediate state but it doesn't look like
>> >>>>> >>> >> it is an issue with regard to creating the output in the wrong
>> >>>>> >>> >> directory.
>> >>>>> >>> >
>> >>>>> >>> >
>> >>>>> >>> > That's a relief. :)
>> >>>>> >>> >
>> >>>>> >>> > I think the issue with temp outputs has to do with our use of the
>> >>>>> >>> > TemporaryPath libraries for creating, well, temporary paths. We do
>> >>>>> >>> > this so we play nicely with CI frameworks, but you might need to
>> >>>>> >>> > disable it for investigating intermediate outputs.
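
A common way to make temporary test output inspectable is to skip cleanup when a flag is set. The sketch below shows that pattern with the JDK only. `KeepableTempDir` and the `demo.keepTemp` system property are hypothetical names invented here. This is not Crunch's TemporaryPath API, and the thread does not say how TemporaryPath itself can be disabled.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: a temp directory that can optionally survive close(). */
final class KeepableTempDir implements AutoCloseable {
    private final Path root;
    private final boolean keep;

    KeepableTempDir(String prefix, boolean keep) throws IOException {
        this.root = Files.createTempDirectory(prefix);
        this.keep = keep;
    }

    /** Reads the (made-up) -Ddemo.keepTemp=true flag to decide whether to keep files. */
    static KeepableTempDir fromSystemProperty(String prefix) throws IOException {
        return new KeepableTempDir(prefix, Boolean.getBoolean("demo.keepTemp"));
    }

    Path root() {
        return root;
    }

    /** Resolves a child path such as "p1" under the temp root (does not create it). */
    Path path(String name) {
        return root.resolve(name);
    }

    @Override
    public void close() throws IOException {
        if (keep) {
            System.err.println("Keeping intermediate output in " + root);
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order lists children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Used in a try-with-resources block, the directory is removed on a normal CI run and left in place for inspection when the flag is set.
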
>> >>>>> >>> >
>> >>>>> >>> > Re: the specific error you're seeing, that looks interesting. I
>> >>>>> >>> > wonder if it's an Avro version change or some such thing. Will see
>> >>>>> >>> > if I can replicate it.
>> >>>>> >>> >
>> >>>>> >>> >
>> >>>>> >>> > --
>> >>>>> >>> > Director of Data Science
>> >>>>> >>> > Cloudera
>> >>>>> >>> > Twitter: @josh_wills



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
