hive-dev mailing list archives

From Eugene Koifman <ekoif...@hortonworks.com>
Subject Re: RFC: Major HCatalog refactoring
Date Tue, 03 Sep 2013 18:54:46 GMT
We explored the idea you suggest, and given the number of APIs (and their
transitive closure) it would be very difficult and the result would be
fragile. So unfortunately that is not possible.

For example, oldpackage.A has a method foo() that returns oldpackage.B.
You could create

newpackage.A extends oldpackage.A {
  @Override
  newpackage.B foo() {
  }
}

which works because of covariant return types (provided newpackage.B
extends oldpackage.B), but the implementation of foo() becomes
problematic because it in turn uses other classes.
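To make the covariant-return point concrete, here is a minimal single-file Java sketch. It is not HCatalog code: OldA/OldB/OldC and NewA/NewB/NewC are hypothetical stand-ins for oldpackage.* and newpackage.*, written as nested classes only so the sketch compiles as one file, and bar()/label() are made-up methods. It shows that the override in NewA compiles only because NewB extends OldB, and that as soon as B exposes another API type (the hypothetical OldC), that type needs a parallel subclass too, which is how the work grows with the transitive closure of the API.

```java
// Hypothetical stand-ins: Old* plays oldpackage.*, New* plays newpackage.*.
// Nested classes are used only so the sketch fits in one file.
public class CovariantShimSketch {

    static class OldC {
        String label() { return "oldC"; }
    }

    static class OldB {
        // OldB exposes another API type, so OldC is part of the transitive closure.
        OldC bar() { return new OldC(); }
    }

    static class OldA {
        OldB foo() { return new OldB(); }
    }

    // Covariant return type: legal only because NewB is a subtype of OldB.
    static class NewA extends OldA {
        @Override
        NewB foo() { return new NewB(); }
    }

    // To hand back new-package types from bar() as well, NewB must override it,
    // which in turn forces a NewC that extends OldC.
    static class NewB extends OldB {
        @Override
        NewC bar() { return new NewC(); }
    }

    static class NewC extends OldC {
        @Override
        String label() { return "newC"; }
    }

    public static void main(String[] args) {
        // A caller holding the old static type still dispatches to the new classes.
        OldA viaOldType = new NewA();
        System.out.println(viaOldType.foo().bar().label());

        // A caller of the new type gets new-package types without any casts.
        NewC direct = new NewA().foo().bar();
        System.out.println(direct.label());
    }
}
```

Running main prints "newC" twice. Every public type reachable from foo() needed its own counterpart here; with the real API surface that chain is much longer, which is the fragility described above.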

On Tue, Sep 3, 2013 at 11:41 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> I understand.
>
> Can we do something like this?
>
> oldpackage.HCatLoader extends newpackage.HCatLoader { }
>
> If we do something like this, we don't need to test both classes; it is safe
> to assume they both do the same thing.
>
> I understand that we do not want users to have to specify a new class name,
> but 15 minutes of unit tests around a rename is overkill.
>
>
> On Tue, Sep 3, 2013 at 2:13 PM, Eugene Koifman <ekoifman@hortonworks.com> wrote:
>
> > Edward,
> >
> > "If a testing framework is truly testing all code paths twice, there
> > is not much of a win there from a unit/integration tests standpoint. If the
> > unit tests created more coverage of the code that would be an obvious win.
> > I have not looked at your patch but from your description it sounds like we
> > are attempting to test a rename that does not sound like a win to me."
> >
> > Actually, this is not what we are testing. The package name change (as well
> > as any changes made in 0.12) will be tested by the current tests (which will
> > also change package name).
> >
> > The goal of bringing the 0.11 version of the source (and corresponding tests)
> > into 0.12 is to ensure that users who use HCatalog from scripts/MR jobs,
> > etc. (e.g. a Pig script: A = LOAD 'tablename' USING
> > org.apache.hcatalog.pig.HCatLoader();) will not have to update all
> > their scripts/programs when upgrading to 0.12. Having the 0.11 tests in the
> > 0.12 branch ensures that this compatibility layer continues to work while
> > Hive 0.12 and later versions are evolving.
> >
> > On Tue, Sep 3, 2013 at 10:22 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> >
> > > I would say a main goal of unit and integration testing is to try all code
> > > paths. If a testing framework is truly testing all code paths twice, there
> > > is not much of a win there from a unit/integration tests standpoint. If the
> > > unit tests created more coverage of the code that would be an obvious win.
> > > I have not looked at your patch but from your description it sounds like we
> > > are attempting to test a rename that does not sound like a win to me.
> > >
> > > If the current hcatalog tests run in 15 minutes and you make a change, then
> > > the run is 30 minutes. 15 minutes is a nice long coffee break, 30 minutes
> > > is a TV show :)
> > >
> > > As for the overall hive build taking 10-15 hours, I know that :) I used to
> > > run them, by hand, on my laptop, because no one would share their build
> > > farm with me. I have heard that Hive consumes the vast majority of the
> > > resources of Apache's build farm! I think we need to be good citizens at
> > > Apache and attempt to make this better, not worse.
> > >
> > > Now that we have pre-commit builds we can work at a reasonable pace. Now
> > > that we have this nice pre-commit farm, I do not want to create a precedent
> > > that now we can go "nuts" and start down the same slippery slope.
> > >
> > > On Tue, Sep 3, 2013 at 12:57 PM, Eugene Koifman <ekoifman@hortonworks.com> wrote:
> > >
> > > > The current (sequential) run of all hive/hcat unit tests takes 10-15
> > > > hours. Is another 20-30 minutes that significant?
> > > >
> > > > I'm generally wary of unit tests that are not run continuously and
> > > > automatically. It delays the detection of problems, and then what was
> > > > probably an obvious fix at the time the change was made becomes a long
> > > > debugging session (often by someone other than the person whose change
> > > > broke things). I think this is especially true given how many people are
> > > > contributing to hive.
> > > >
> > > > On Tue, Sep 3, 2013 at 7:25 AM, Brock Noland <brock@cloudera.com> wrote:
> > > >
> > > > > OK, that should be fine. Though I would echo Edward's sentiment about
> > > > > adding so much test time. Do these tests have to run each time? Does
> > > > > it make sense to have a test target such as test-all-hcatalog and
> > > > > then run it periodically by hand, especially before releases?
> > > > >
> > > > > On Mon, Sep 2, 2013 at 10:36 AM, Eugene Koifman
> > > > > <ekoifman@hortonworks.com> wrote:
> > > > > > These will be new (i.e. 0.11 version) test classes which will be in
> > > > > > the old org.apache.hcatalog package. How does that affect the new
> > > > > > framework?
> > > > > >
> > > > > > On Saturday, August 31, 2013, Brock Noland wrote:
> > > > > >
> > > > > >> Will these be new Java class files or new test methods to existing
> > > > > >> classes? I am just curious as to how this will play into the
> > > > > >> distributed testing framework.
> > > > > >>
> > > > > >> On Sat, Aug 31, 2013 at 10:19 AM, Eugene Koifman
> > > > > >> <ekoifman@hortonworks.com> wrote:
> > > > > >> > Not quite double but close (on my Mac that means it will go up from
> > > > > >> > 35 minutes to 55-60), so in the greater scheme of things it should
> > > > > >> > be negligible.
> > > > > >> >
> > > > > >> > On Sat, Aug 31, 2013 at 7:35 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > > > > >> >
> > > > > >> >> By coverage do you mean to say that:
> > > > > >> >>
> > > > > >> >> > Thus, the published HCatalog JARs will contain both packages and
> > > > > >> >> > the unit tests will cover both versions of the API.
> > > > > >> >>
> > > > > >> >> We are going to double the time of unit tests for this module?
> > > > > >> >>
> > > > > >> >> On Fri, Aug 30, 2013 at 8:41 PM, Eugene Koifman <ekoifman@hortonworks.com> wrote:
> > > > > >> >>
> > > > > >> >> > This will change every file under hcatalog, so it has to happen
> > > > > >> >> > before the branching. Most likely at the beginning of next week.
> > > > > >> >> >
> > > > > >> >> > Thanks
> > > > > >> >> >
> > > > > >> >> > On Wed, Aug 28, 2013 at 5:24 PM, Eugene Koifman <ekoifman@hortonworks.com> wrote:
> > > > > >> >> >
> > > > > >> >> > > Hi,
> > > > > >> >> > >
> > > > > >> >> > > Here is the plan for refactoring HCatalog as was agreed to when it
> > > > > >> >> > > was merged into Hive during. HIVE-4869 is the umbrella bug for this
> > > > > >> >> > > work. The changes are complex and touch every single file under
> > > > > >> >> > > hcatalog. Please comment.
> > > > > >> >> > >
> > > > > >> >> > > When the HCatalog project was merged into Hive in 0.11, several
> > > > > >> >> > > integration items did not make the 0.11 deadline. It was agreed to
> > > > > >> >> > > finish them in the 0.12 release. Specifically:
> > > > > >> >> > >
> > > > > >> >> > > 1. HIVE-4895 - change package name from org.apache.hcatalog to
> > > > > >> >> > >    org.apache.hive.hcatalog
> > > > > >> >> > >
> > > > > >> >> > > 2. HIVE-4896 - create binary backwards compatibility layer for hcat
> > > > > >> >> > >    users upgrading from 0.11 to 0.12
> > > > > >> >> > >
> > > > > >> >> > > For item 1, we’ll just move every file under org.apache.hcatalog to
> > > > > >> >> > > org.apache.hive.hcatalog and update all “package” and “import”
> > > > > >> >> > > statements as well as all hcat/webhcat scripts. This will include
> > > > > >> >> > > all JUnit tests.
> > > > > >> >> > >
> > > > > >> >> > > Item 2 will ensure that if a user has an M/R program or Pig script,
> > > > > >> >> > > etc. that uses the HCatalog public API, their programs will continue
> > > > > >> >> > > to work w/o change with hive 0.12.
> > > > > >> >> > >
> > > > > >> >> > > The proposal is to make the changes with as little impact on the
> > > > > >> >> > > build system as possible, in part to make the upcoming
> > > > > >> >> > > ‘mavenization’ of hive easier, and in part to make the changes
> > > > > >> >> > > more manageable.
> > > > > >> >> > >
> > > > > >> >> > > The list of public interfaces (and their transitive closure) for
> > > > > >> >> > > which backwards compat will be provided:
> > > > > >> >> > >
> > > > > >> >> > >    1. HCatLoader
> > > > > >> >> > >    2. HCatStorer
> > > > > >> >> > >    3. HCatInputFormat
> > > > > >> >> > >    4. HCatOutputFormat
> > > > > >> >> > >    5. HCatReader
> > > > > >> >> > >    6. HCatWriter
> > > > > >> >> > >    7. HCatRecord
> > > > > >> >> > >    8. HCatSchema
> > > > > >> >> > >
> > > > > >> >> > > To achieve this, the 0.11 version of these classes will be added in
> > > > > >> >> > > the org.apache.hcatalog package (after item 1 is done). Each of
> > > > > >> >> > > these classes
> > > > > >> --
> > > > > >> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
> > > > > >>
> > > > > >
> > > > > > --
> > > > > > CONFIDENTIALITY NOTICE
> > > > > > NOTICE: This message is intended for the use of the individual or
> > > > > > entity to which it is addressed and may contain information that is
> > > > > > confidential, privileged and exempt from disclosure under applicable
> > > > > > law. If the reader of this message is not the intended recipient, you
> > > > > > are hereby notified that any printing, copying, dissemination,
> > > > > > distribution, disclosure or forwarding of this communication is
> > > > > > strictly prohibited. If you have received this communication in error,
> > > > > > please contact the sender immediately and delete it from your system.
> > > > > > Thank You.
> > > > >
> > > >
> > >
> >
>
