hive-dev mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: RFC: Major HCatalog refactoring
Date Tue, 03 Sep 2013 19:14:51 GMT
You may have already said this, but remind me again. If we go with this
approach, how long until we retire the duplicated code and insist that end
users use the new name? 1 release?

A similar debate is likely why the hive classes are still packaged as
org.apache.hadoop.hive, rather than org.apache.hive.


On Tue, Sep 3, 2013 at 2:54 PM, Eugene Koifman <ekoifman@hortonworks.com> wrote:

> We explored the idea you suggest, and given the number of APIs (and their
> transitive closure) it would be very difficult and the result would
> be fragile.  So unfortunately that is not possible.
>
> For example, oldpackage.A has a method foo() that returns oldpackage.B.
> You could create
> newpackage.A extends oldpackage.A {
>  @Override
>  newpackage.B foo() {
>  }
> }
>
> which works because of covariant return types, but the implementation of
> foo() becomes problematic because it itself uses other classes.
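
For anyone following the thread, the covariant-return idea quoted above can be sketched roughly as below. This is only an illustration, not HCatalog code: OldA, OldB, NewA and NewB are made-up stand-ins for oldpackage.A, oldpackage.B, newpackage.A and newpackage.B, written as nested classes so the example fits in one file. The sketch assumes NewB extends OldB, which the override needs in order to compile.

```java
// Sketch only: the "old" and "new" packages are simulated with nested
// classes. All class names here are hypothetical.
public class CovariantReturnSketch {

    // Stand-ins for oldpackage.B and oldpackage.A.
    static class OldB {
        String name() { return "old B"; }
    }

    static class OldA {
        OldB foo() { return new OldB(); }
    }

    // Stand-ins for newpackage.B and newpackage.A.
    // NewB must be a subtype of OldB for the override below to be legal.
    static class NewB extends OldB {
        @Override
        String name() { return "new B"; }
    }

    static class NewA extends OldA {
        @Override
        NewB foo() { return new NewB(); } // covariant return type
    }

    public static void main(String[] args) {
        OldA viaOldType = new NewA();
        // A caller written against the old type still receives an OldB,
        // but the object is really a NewB.
        OldB b = viaOldType.foo();
        System.out.println(b.name()); // prints "new B"
    }
}
```

With two tiny classes this is easy. The difficulty described above is that a real foo() touches many other classes, each of which would need the same treatment.
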
>
> On Tue, Sep 3, 2013 at 11:41 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>
> > I understand.
> >
> > Can we do something like this?
> >
> > oldpackage.HCatLoader extends newpackage.HCatLoader { }
> >
> > If we do something like this we don't need to test both classes; it is
> > safe to assume they both do the same thing.
> >
> > I understand that we do not want users to have to specify a new class
> > name, but 15 minutes of unit tests around a rename is overkill.
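
A minimal sketch of the "old name extends new name" suggestion quoted above, for illustration only. NewHCatLoader and OldHCatLoader are hypothetical stand-ins for the new-package and old-package loader classes, and load() is an invented method; the real HCatLoader API is not reproduced here. The @Deprecated marker is an assumption about how the old name might eventually be retired, not something the thread settles.

```java
// Sketch only: simulates a rename shim with nested classes in one file.
// None of these names or methods are the real HCatalog API.
public class RenameShimSketch {

    // Stand-in for the implementation that lives under the new package name.
    static class NewHCatLoader {
        String load(String table) {
            return "loaded " + table;
        }
    }

    // Stand-in for the class kept under the old package name: an empty,
    // deprecated subclass that inherits all behaviour from the new class.
    @Deprecated
    static class OldHCatLoader extends NewHCatLoader { }

    public static void main(String[] args) {
        // Code that still uses the old name runs the new implementation.
        System.out.println(new OldHCatLoader().load("tablename")); // prints "loaded tablename"
    }
}
```

Because the old class has no body of its own, there is nothing separate to test for it. The catch raised in the reply above is that every type appearing in its method signatures is then a new-package type.
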
> >
> >
> > On Tue, Sep 3, 2013 at 2:13 PM, Eugene Koifman <ekoifman@hortonworks.com> wrote:
> >
> > > Edward,
> > >
> > > "If a testing framework is truly testing all code paths twice, there
> > > is not much of a win there from a unit/integration tests standpoint. If
> > > the unit tests created more coverage of the code that would be an
> > > obvious win. I have not looked at your patch but from your description
> > > it sounds like we are attempting to test a rename that does not sound
> > > like a win to me."
> > >
> > > Actually this is not what we are testing.  The package name change (as
> > > well as any changes made in 0.12) will be tested by current tests
> > > (which will also change package name).
> > >
> > > The goal of bringing 0.11 version of the source (and corresponding
> > > tests) into 0.12 is to ensure that users who use HCatalog from
> > > scripts/MR jobs, etc. (e.g. a Pig script: A = LOAD 'tablename' USING
> > > org.apache.hcatalog.pig.HCatLoader();) will not have to update all
> > > their scripts/programs when upgrading to 0.12.  Having 0.11 tests in
> > > the 0.12 branch ensures that this compatibility layer continues to work
> > > while Hive 0.12 and later versions are evolving.
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Sep 3, 2013 at 10:22 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > >
> > > > I would say a main goal of unit and integration testing is to try all
> > > > code paths. If a testing framework is truly testing all code paths
> > > > twice, there is not much of a win there from a unit/integration tests
> > > > standpoint. If the unit tests created more coverage of the code that
> > > > would be an obvious win. I have not looked at your patch but from your
> > > > description it sounds like we are attempting to test a rename that
> > > > does not sound like a win to me.
> > > >
> > > > If the current hcatalog tests run in 15 minutes, you make a change and
> > > > then the run is 30 minutes. 15 minutes is a nice long coffee break, 30
> > > > minutes is a TV show :)
> > > >
> > > > As for the overall hive build taking 10-15 hours. I know that :) I
> > > > used to run them, by hand, on my laptop, because no one would share
> > > > their build farm with me. I have heard that Hive consumes the vast
> > > > majority of the resources of apache's build farm! I think we need to
> > > > be good citizens at apache and attempt to make this better, not worse.
> > > >
> > > > Now that we have pre-commit builds we can work at a reasonable pace.
> > > > Now that we have this nice pre-commit farm, I do not want to create a
> > > > precedent that now we can go "nuts", and start down the same slippery
> > > > slope.
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Sep 3, 2013 at 12:57 PM, Eugene Koifman <ekoifman@hortonworks.com> wrote:
> > > >
> > > > > Current (sequential) run of all hive/hcat unit tests takes 10-15
> > > > > hours.  Is another 20-30 minutes that significant?
> > > > >
> > > > > I'm generally wary of unit tests that are not run continuously and
> > > > > automatically.  It delays the detection of problems and then what was
> > > > > probably an obvious fix at the time the change was made becomes a
> > > > > long debugging session (often by someone other than whose change
> > > > > broke things).  I think this is especially true given how many people
> > > > > are contributing to hive.
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Sep 3, 2013 at 7:25 AM, Brock Noland <brock@cloudera.com> wrote:
> > > > >
> > > > > > OK that should be fine.  Though I would echo Edward's sentiment
> > > > > > about adding so much test time. Do these tests have to run each
> > > > > > time? Does it make sense to have a test target such as
> > > > > > test-all-hcatalog and then run them periodically manually,
> > > > > > especially before releases?
> > > > > >
> > > > > > On Mon, Sep 2, 2013 at 10:36 AM, Eugene Koifman
> > > > > > <ekoifman@hortonworks.com> wrote:
> > > > > > > These will be new (i.e. 0.11 version) test classes which will be
> > > > > > > in the old org.apache.hcatalog package.  How does that affect the
> > > > > > > new framework?
> > > > > > >
> > > > > > > On Saturday, August 31, 2013, Brock Noland wrote:
> > > > > > >
> > > > > > >> Will these be new Java class files or new test methods to
> > > > > > >> existing classes?  I am just curious as to how this will play
> > > > > > >> into the distributed testing framework.
> > > > > > >>
> > > > > > >> On Sat, Aug 31, 2013 at 10:19 AM, Eugene Koifman
> > > > > > >> <ekoifman@hortonworks.com> wrote:
> > > > > > >> > not quite double but close  (on my Mac that means it will go
> > > > > > >> > up from 35 minutes to 55-60) so in greater scheme of things it
> > > > > > >> > should be negligible
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Sat, Aug 31, 2013 at 7:35 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > > > > > >> >
> > > > > > >> >> By coverage do you mean to say that:
> > > > > > >> >>
> > > > > > >> >> > Thus, the published HCatalog JARs will contain both packages
> > > > > > >> >> > and the unit tests will cover both versions of the API.
> > > > > > >> >>
> > > > > > >> >> We are going to double the time of unit tests for this module?
> > > > > > >> >>
> > > > > > >> >>
> > > > > > >> >> On Fri, Aug 30, 2013 at 8:41 PM, Eugene Koifman <ekoifman@hortonworks.com> wrote:
> > > > > > >> >>
> > > > > > >> >> > This will change every file under hcatalog so it has to
> > > > > > >> >> > happen before the branching.  Most likely at the beginning of
> > > > > > >> >> > next week.
> > > > > > >> >> >
> > > > > > >> >> > Thanks
> > > > > > >> >> >
> > > > > > >> >> >
> > > > > > >> >> > On Wed, Aug 28, 2013 at 5:24 PM, Eugene Koifman <ekoifman@hortonworks.com> wrote:
> > > > > > >> >> >
> > > > > > >> >> > > Hi,
> > > > > > >> >> > >
> > > > > > >> >> > >
> > > > > > >> >> > > Here is the plan for refactoring HCatalog as was agreed to when it
> > > > > > >> >> > > was merged into Hive during.  HIVE-4869 is the umbrella bug for this
> > > > > > >> >> > > work.  The changes are complex and touch every single file under
> > > > > > >> >> > > hcatalog.  Please comment.
> > > > > > >> >> > >
> > > > > > >> >> > > When HCatalog project was merged into Hive on 0.11 several
> > > > > > >> >> > > integration items did not make the 0.11 deadline.  It was agreed to
> > > > > > >> >> > > finish them in 0.12 release.  Specifically:
> > > > > > >> >> > >
> > > > > > >> >> > > 1. HIVE-4895 - change package name from org.apache.hcatalog to
> > > > > > >> >> > > org.apache.hive.hcatalog
> > > > > > >> >> > >
> > > > > > >> >> > > 2. HIVE-4896 - create binary backwards compatibility layer for hcat
> > > > > > >> >> > > users upgrading from 0.11 to 0.12
> > > > > > >> >> > >
> > > > > > >> >> > > For item 1, we’ll just move every file under org.apache.hcatalog to
> > > > > > >> >> > > org.apache.hive.hcatalog and update all “package” and “import”
> > > > > > >> >> > > statements as well as all hcat/webhcat scripts.  This will include
> > > > > > >> >> > > all JUnit tests.
> > > > > > >> >> > >
> > > > > > >> >> > > Item 2 will ensure that if a user has a M/R program or Pig script,
> > > > > > >> >> > > etc. that uses HCatalog public API, their programs will continue to
> > > > > > >> >> > > work w/o change with hive 0.12.
> > > > > > >> >> > >
> > > > > > >> >> > > The proposal is to make the changes that have as little impact on
> > > > > > >> >> > > the build system as possible, in part to make upcoming ‘mavenization’
> > > > > > >> >> > > of hive easier, in part to make the changes more manageable.
> > > > > > >> >> > >
> > > > > > >> >> > >
> > > > > > >> >> > >
> > > > > > >> >> > > The list of public interfaces (and their transitive closure) for
> > > > > > >> >> > > which backwards compat will be provided.
> > > > > > >> >> > >
> > > > > > >> >> > >    1. HCatLoader
> > > > > > >> >> > >    2. HCatStorer
> > > > > > >> >> > >    3. HCatInputFormat
> > > > > > >> >> > >    4. HCatOutputFormat
> > > > > > >> >> > >    5. HCatReader
> > > > > > >> >> > >    6. HCatWriter
> > > > > > >> >> > >    7. HCatRecord
> > > > > > >> >> > >    8. HCatSchema
> > > > > > >> >> > >
> > > > > > >> >> > >
> > > > > > >> >> > > To achieve this, 0.11 version of these classes will be added in
> > > > > > >> >> > > org.apache.hcatalog package (after item 1 is done).  Each of these
> > > > > > >> >> > > classes
> > > > > > >> --
> > > > > > >> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
> > > > > > >>
> > > > > > >
> > > > > > > --
> > > > > > > CONFIDENTIALITY NOTICE
> > > > > > > NOTICE: This message is intended for the use of the individual or
> > > > > > > entity to which it is addressed and may contain information that is
> > > > > > > confidential, privileged and exempt from disclosure under applicable
> > > > > > > law. If the reader of this message is not the intended recipient, you
> > > > > > > are hereby notified that any printing, copying, dissemination,
> > > > > > > distribution, disclosure or forwarding of this communication is
> > > > > > > strictly prohibited. If you have received this communication in
> > > > > > > error, please contact the sender immediately and delete it from your
> > > > > > > system. Thank You.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>
