hive-dev mailing list archives

From "kulkarni.swarnim@gmail.com" <kulkarni.swar...@gmail.com>
Subject Re: [Discuss] project chop up
Date Wed, 07 Aug 2013 19:55:50 GMT
> I'd like to propose we move towards Maven.

Big +1 on this. Most of the major Apache projects (Hadoop, HBase, Avro, etc.)
are Maven-based.

Also, I can't agree more that the current build system is frustrating, to say
the least. Another issue I had with the existing Ant-based system is that
it has no checkpointing capabilities[1]. So if a 6-hour build fails
after 5 hours 30 minutes, most of the steps, even though successful, have to be
rerun, which is very time-consuming. Maven reactors have built-in support
for a lot of this.

[1] https://issues.apache.org/jira/browse/HIVE-3449


On Wed, Aug 7, 2013 at 2:06 PM, Brock Noland <brock@cloudera.com> wrote:

> Thus far there hasn't been any dissent to managing our modules with Maven.
> In addition, there have been several positive comments on a move towards
> Maven. I'd like to add that Ivy seems to have issues managing multiple versions
> of libraries. For example, in HIVE-3632 the Ivy cache had to be cleared when
> testing patches that installed the new version of DataNucleus. I have had
> the same issue on HIVE-4388. Requiring the deletion of the Ivy cache
> is extremely painful for developers who don't have access to high-bandwidth
> connections or who live in areas far from California, where most of
> these jars are hosted.
>
> I'd like to propose we move towards Maven.
>
>
> On Sat, Jul 27, 2013 at 1:19 PM, Mohammad Islam <mislam77@yahoo.com>
> wrote:
>
> >
> >
> > Yes, the Hive build and test cases have become convoluted as the project
> > scope gradually increased. This is the time to take action!
> >
> > Based on my other Apache experiences, I prefer option #3, "Breakup the
> > projects within our own source tree": make multiple modules or
> > sub-projects. By default, only key modules will be built.
> >
> > Maven could be a possible candidate.
> >
> > Regards,
> > Mohammad
> >
> >
> >
> > ________________________________
> >  From: Edward Capriolo <edlinuxguru@gmail.com>
> > To: "dev@hive.apache.org" <dev@hive.apache.org>
> > Sent: Saturday, July 27, 2013 7:03 AM
> > Subject: Re: [Discuss] project chop up
> >
> >
> > Or feel free to suggest a different approach. I am used to managing software
> > as multi-module Maven projects.
> > From a development standpoint, if I were working on beeline, it would be nice
> > to only require some of the sub-projects to be open in my IDE to do that.
> > Also, managing everything globally is not ideal.
> >
> > Hive's project layout, build, and test infrastructure is just funky. It has
> > to do a few interesting things (shims, testing), but I do not think what we
> > are doing justifies the massive Ant build system we have. Ant is so ten
> > years ago.
> >
> >
> >
> > On Sat, Jul 27, 2013 at 12:04 AM, Alan Gates <gates@hortonworks.com>
> > wrote:
> >
> > > But I assume they'd still be a part of targets like package, tar, and
> > > binary?  Making them compile and test separately and explicitly load the
> > > core Hive jars from maven/ivy seems reasonable.
> > >
> > > Alan.
> > >
> > > On Jul 26, 2013, at 8:40 PM, Brock Noland wrote:
> > >
> > > > Hi,
> > > >
> > > > > I think that's part of it, but I'd like to decouple the downstream projects
> > > > > even further so that the only connection is the dependency on the hive jars.
> > > >
> > > > Brock
> > > > > On Jul 26, 2013 10:10 PM, "Alan Gates" <gates@hortonworks.com> wrote:
> > > >
> > > > >> I'm not sure how this is different from what hcat does today.  It needs
> > > > >> Hive's jars to compile, so it's one of the last things in the compile step.
> > > > >> Would moving the other modules you note to be in the same category be
> > > > >> enough?  Did you want to also make it so that the default ant target
> > > > >> doesn't compile those?
> > > >>
> > > >> Alan.
> > > >>
> > > >> On Jul 26, 2013, at 4:09 PM, Edward Capriolo wrote:
> > > >>
> > > > >>> My mistake on saying hcat was a metastore fork. I had a brain fart for a
> > > > >>> moment.
> > > > >>>
> > > > >>> One way we could do this is to create a folder called downstream. In our
> > > > >>> release step we can execute the downstream builds and then copy the files
> > > > >>> we need back. So nothing downstream will be on the classpath of the main
> > > > >>> project.
> > > > >>>
> > > > >>> This could help us break up ql as well. Things like exotic file formats,
> > > > >>> and things that are pluggable like zk locking, can go here. That might be
> > > > >>> overkill.
> > > > >>>
> > > > >>> For now we can focus on building downstream, and hivethrift1 might be the
> > > > >>> first thing to try to downstream.
> > > >>>
> > > >>>
> > > > >>> On Friday, July 26, 2013, Thejas Nair <thejas@hortonworks.com> wrote:
> > > > >>>> +1 to the idea of making the build of core hive and other downstream
> > > > >>>> components independent.
> > > > >>>>
> > > > >>>> bq.  I was under the impression that Hcat and hive-metastore was
> > > > >>>> supposed to merge up somehow.
> > > > >>>>
> > > > >>>> The metastore code was never forked. Hcat was just using
> > > > >>>> hive-metastore and making the metadata available to the rest of hadoop
> > > > >>>> (pig, java MR..).
> > > > >>>> A lot of the changes that were driven by hcat goals were being made in
> > > > >>>> hive-metastore. You can think of hcat as a set of libraries that let pig
> > > > >>>> and java MR use hive metastore. Since hcat is closely tied to
> > > > >>>> hive-metastore, it makes sense to have them in the same project.
> > > >>>>
> > > > >>>> On Fri, Jul 26, 2013 at 6:33 AM, Edward Capriolo <edlinuxguru@gmail.com>
> > > > >>>> wrote:
> > > > >>>>> Also, I believe hcatalog web can fall into the same designation.
> > > > >>>>>
> > > > >>>>> Question: hcatalog was initially a big hive-metastore fork. I was under the
> > > > >>>>> impression that Hcat and hive-metastore was supposed to merge up somehow.
> > > > >>>>> What is the status on that? I remember that was one of the core reasons we
> > > > >>>>> brought it in.
> > > >>>>>
> > > > >>>>> On Friday, July 26, 2013, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > > > >>>>>> I prefer option 3 as well.
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On Fri, Jul 26, 2013 at 12:52 AM, Brock Noland <brock@cloudera.com> wrote:
> > > >>>>>>>
> > > > >>>>>>> On Thu, Jul 25, 2013 at 9:48 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > > >>>>>>>
> > > > >>>>>>>> I have been developing on a dual-core 2 GB RAM laptop for years
> > > > >>>>>>>> now. With the addition of hcatalog, hive-thrift2, and some other growth,
> > > > >>>>>>>> trying to develop hive in Eclipse on this machine crawls, especially if
> > > > >>>>>>>> 'build automatically' is turned on. As we look to add on more things this
> > > > >>>>>>>> is only going to get worse.
> > > >>>>>>>>
> > > >>>>>>>> I am also noticing issues like this:
> > > >>>>>>>>
> > > >>>>>>>> https://issues.apache.org/jira/browse/HIVE-4849
> > > >>>>>>>>
> > > > >>>>>>>> What I think we should do is strip down/out optional parts of hive.
> > > >>>>>>>>
> > > > >>>>>>>> 1) Hive Hbase
> > > > >>>>>>>> This should really be its own project. To do this right we really have to
> > > > >>>>>>>> have multiple branches, since hbase is not backwards compatible.
> > > > >>>>>>>>
> > > > >>>>>>>> 2) Hive Web Interface
> > > > >>>>>>>> Now really a big project but not really critical; it can just as easily be
> > > > >>>>>>>> built separately.
> > > > >>>>>>>>
> > > > >>>>>>>> 3) hive thrift 1
> > > > >>>>>>>> We have hive thrift 2 now; it is time for the sun to set on hivethrift1.
> > > > >>>>>>>>
> > > > >>>>>>>> 4) odbc
> > > > >>>>>>>> Not entirely convinced about this one, but it is really not critical to
> > > > >>>>>>>> running hive.
> > > > >>>>>>>>
> > > > >>>>>>>> What I think we should do is create sub-projects for the above things or
> > > > >>>>>>>> simply move them into directories that do not build with hive. Ideally they
> > > > >>>>>>>> would use maven to pull dependencies.
> > > >>>>>>>>
> > > >>>>>>>> What does everyone think?
> > > >>>>>>>>
> > > >>>>>>>
> > > > >>>>>>> I agree that projects like the HBase handler and probably others as well
> > > > >>>>>>> should somehow be "downstream" projects which simply depend on the hive
> > > > >>>>>>> jars.  I see a couple alternatives for this:
> > > > >>>>>>>
> > > > >>>>>>> * Take the "module" in question to the Apache Incubator
> > > > >>>>>>> * Move the "module" in question to the Apache Extras
> > > > >>>>>>> * Breakup the projects within our own source tree
> > > >>>>>>>
> > > >>>>>>> I'd prefer the third option at this point.
> > > >>>>>>>
> > > >>>>>>> Brock
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> Brock
> > > >>>>>>
> > > >>>>>>
> > > >>>>
> > > >>
> > > >>
> > >
> > >
> >
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
>



-- 
Swarnim
