hive-dev mailing list archives

From Vinod Kumar Vavilapalli <vino...@apache.org>
Subject Re: [Discuss] project chop up
Date Tue, 20 Aug 2013 21:31:05 GMT
This is awesome!

Thanks,
+Vinod

On Aug 20, 2013, at 7:28 AM, Edward Capriolo wrote:

> Just an update. This is going very well:
> 
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache Hive ....................................... SUCCESS [0.002s]
> [INFO] hive-shims-x ...................................... SUCCESS [1.210s]
> [INFO] hive-shims-20 ..................................... SUCCESS [0.125s]
> [INFO] hive-common ....................................... SUCCESS [0.082s]
> [INFO] hive-serde ........................................ SUCCESS [2.521s]
> [INFO] hive-metastore .................................... SUCCESS [10.818s]
> [INFO] hive-exec ......................................... SUCCESS [4.521s]
> [INFO] hive-avro ......................................... SUCCESS [1.582s]
> [INFO] hive-zookeeper .................................... SUCCESS [0.519s]
> [INFO]
> ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Total time: 21.613s
> [INFO] Finished at: Tue Aug 20 10:23:34 EDT 2013
> [INFO] Final Memory: 39M/408M
> 
> 
> Though I took some shortcuts and disabled some tests, we can build Hive
> very fast, including incremental builds. Also, we are using Maven plugins to
> compile antlr, thrift, protobuf, and datanucleus, and building those every time.
> 
> 
> On Fri, Aug 16, 2013 at 11:16 PM, Xuefu Zhang <xzhang@cloudera.com> wrote:
> 
>> Thanks, Edward.
>> 
>> I'm a big +1 on mavenizing Hive. Hive long ago reached the point where it's
>> hard to manage its build using ant. I'd like to help on this too.
>> 
>> Thanks,
>> Xuefu
>> 
>> 
>> On Fri, Aug 16, 2013 at 7:31 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>> 
>>> For those interested in pitching in.
>>> https://github.com/edwardcapriolo/hive
>>> 
>>> 
>>> 
>>> On Fri, Aug 16, 2013 at 11:58 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>> 
>>>> Summary from hive-irc channel. Minor edits for spell check/grammar.
>>>> 
>>>> The last 10 lines are a summary of the key points.
>>>> 
>>>> [10:59:17] <ecapriolo> noland: et al. Do you want to talk about hive in maven?
>>>> [11:10:04] <noland> ecapriolo: yeah that sounds good to me!
>>>> [11:10:22] <noland> I saw you created the jira but haven't had time to look
>>>> [11:10:32] <ecapriolo> So I found a few things
>>>> [11:10:49] <ecapriolo> In common there are one or two tests that actually fork a process :)
>>>> [11:10:56] <ecapriolo> and use build.test.resources
>>>> [11:11:12] <ecapriolo> Some serde code uses some methods from ql in testing
>>>> [11:11:27] <ecapriolo> and shims really needs a separate hadoop test shim
>>>> [11:11:32] <ecapriolo> But that is all simple stuff
>>>> [11:11:47] <ecapriolo> The biggest problem is I do not know how to solve shims with maven
>>>> [11:11:50] <ecapriolo> do you have any ideas
>>>> [11:11:52] <ecapriolo> ?
>>>> [11:13:00] <noland> That one is going to be a challenge. It might be that in that section we have to drop down to ant
>>>> [11:14:44] <noland> Is it a requirement that we build both the .20 and .23 shims for a "package" as we do today?
>>>> [11:16:46] <ecapriolo> I was thinking we can do it like a JDBC driver
>>>> [11:16:59] <ecapriolo> We separate out the interface of shims
>>>> [11:17:22] <ecapriolo> And then at runtime we drop in a driver implementing it
>>>> [11:17:36] <ecapriolo> That or we could use maven's profile system
>>>> [11:18:09] <ecapriolo> It seems that everything else can actually link against hadoop-0.20.2 as a provided dependency
>>>> [11:18:37] <noland> Yeah either would work. The driver method would probably require us to use ant to build both the drivers?
>>>> [11:18:44] <noland> I am a fan of mvn profiles
>>>> [11:19:05] <ecapriolo> I was thinking we kinda separate the shim out into its own project, not a module
>>>> [11:19:10] <ecapriolo> to achieve that jdbc thing
>>>> [11:19:27] <ecapriolo> But I do not have a solution yet, I was looking to farm that out to someone smart...like you :)
>>>> [11:19:33] <noland> :)
>>>> [11:19:47] <ecapriolo> All I know is that we need a test shim because HadoopShim requires hadoop-test jars
>>>> [11:20:10] <ecapriolo> then the Mini stuff is only used in qtest anyway
>>>> [11:20:48] <ecapriolo> Is this something you want to help with? I was thinking of spinning up a github
>>>> [11:20:50] <noland> I think that the separate projects would work and perhaps nicely.
>>>> [11:21:01] <noland> Yeah I'd be interested in helping!
>>>> [11:21:17] <noland> But I am going on vacation starting next week for about 10 days
>>>> [11:21:27] <ecapriolo> Ah cool where are you going?
>>>> [11:21:37] <noland> Netherlands
>>>> [11:21:42] <noland> Biking around and such
>>>> [11:23:52] <noland> The one thing I was thinking about with regards to a branch is keeping history. We'll want to keep history for the files but AFAICT svn doesn't understand git mv.
>>>> [11:35:49] <ecapriolo> noland: Right I do not plan to suggest that we will do this in git
>>>> [11:36:11] <ecapriolo> I just see that we are going to have to hack stuff up and it is not the type of work that lends itself well to branches.
>>>> [11:36:17] <noland> Ahh ok
>>>> [11:36:56] <ecapriolo> Once we come up with a solution for the shims, and we have something that can reasonably build and test hive we can figure out how to apply that to a branch/trunk
>>>> [11:36:58] <noland> yeah so just do a POC on github and then implement on svn
>>>> [11:37:05] <noland> cool
>>>> [11:37:29] <ecapriolo> Along the way we can probably find things that we can do like that common test I found and other minor things
>>>> [11:37:41] <noland> sounds good
>>>> [11:37:50] <ecapriolo> Those we can likely just commit into the current trunk and I will file issues for those now
>>>> [11:37:58] <noland> cool
>>>> [11:38:41] <ecapriolo> But yea man. I just can't take the project as it is now
>>>> [11:38:51] <ecapriolo> in eclipse every time I touch a file it rebuilds everything!
>>>> [11:38:53] <ecapriolo> It's like WTF
>>>> [11:39:09] <ecapriolo> Running one test takes like 3 minutes
>>>> [11:39:12] <ecapriolo> it's out of control
>>>> [11:39:23] <noland> LOL
>>>> [11:39:29] <noland> I agree 110%
>>>> [11:39:32] <ecapriolo> eclipse was not always like that I am not sure how the hell it happened
>>>> [11:39:51] <noland> The eclipse sep thing is so harmful
>>>> [11:40:08] <noland> dep thing that is
>>>> [11:40:12] <ecapriolo> I mean command line ant was always bad, but you used to be able to work in eclipse without having to rebuild everything every change/test
>>>> [11:40:39] <noland> Yeah the first thing I do these days is disable the ant builder
>>>> [11:40:52] <ecapriolo> Ow... I did not really know that was a thing
>>>> [11:40:55] <noland> it starts compiling while you are still working and blocks for minutes
>>>> [11:41:02] <ecapriolo> Right that is what I mean
>>>> [11:41:11] <ecapriolo> Everyone has like 10 hacks to work on the project
>>>> [11:41:14] <noland> yeah you can remove it in project…one sec
>>>> [11:41:17] <ecapriolo> perm gen
>>>> [11:41:20] <ecapriolo> ant builder
>>>> [11:41:32] <noland> project -> properties -> builders
>>>> [11:41:34] <ecapriolo> hive does not build offline anymore
>>>> [11:41:37] <noland> yeah
>>>> [11:41:47] <ecapriolo> I'm not sure when this stuff went bad, but it has gotten really really bad
>>>> [11:42:09] <ecapriolo> Also what I plan on doing is stripping out non-essentials
>>>> [11:42:25] <ecapriolo> like serde has all this thrift and avro stuff to support custom formats
>>>> [11:42:30] <ecapriolo> that is going into its own module
>>>> [11:42:43] <ecapriolo> Going to rip out all the udfs except between and or.
>>>> [11:43:50] <noland> yeah it'd be nice to have those items in their own modules so you can just build/test them when you want
>>>> [11:44:12] <ecapriolo> hbase zookeeper locking
>>>> [11:44:44] <noland> yeah for sure
>>>> [11:45:04] <noland> I think the default for testing should be the in-process locking
>>>> [11:45:10] <ecapriolo> Absolutely.
>>>> [11:45:40] <ecapriolo> The other issue I want to tackle is hive-exec.jar
>>>> [11:45:54] <ecapriolo> I want to jar-jar all the dependencies.
>>>> [11:46:46] <ecapriolo> I run into too many conflicts with log4j and guava, and commons-utils. All those things need to be packaged into non-conflicting packages
>>>> [11:46:58] <noland> I haven't looked at how we build that yet but I agree it'd be nice if we could jar-jar things like guava
>>>> [11:47:12] <noland> so we can actually use them on server side
>>>> [11:47:16] <ecapriolo> We don't really need guava. It's probably just used for one tiny thing
>>>> [11:47:43] <ecapriolo> People are forgetting/do not understand that hive-exec needs to get sent via the distributed cache
>>>> [11:47:57] <noland> When we implement range joins they have a RangeMap that we'll need.
>>>> [11:47:57] <ecapriolo> so making it hulkingly fat just slows everything down
>>>> [11:48:11] <noland> Do we ship it every time?
>>>> [11:48:25] <noland> Cause we only have to ship it once per version of the jar.
>>>> [11:48:42] <ecapriolo> Recently you need the jackson jars on the auxlib as well
>>>> [11:48:46] <ecapriolo> hive will not work without it
>>>> [11:49:11] <ecapriolo> People are just focused feature-feature-feature...bigger...bigger bigger
>>>> [11:49:27] <noland> yeah maven modules will definitely help us understand who depends on what.
>>>> [11:49:28] <ecapriolo> Next up kryo
>>>> [11:49:51] <noland> I agree there is a lot of tech debt that needs paying
>>>> [11:50:30] <ecapriolo> So those are all the high level things I want to tackle
>>>> [11:50:59] <ecapriolo> shims, general cleanup, break out non-essential code, build a better non-conflicting hive-exec jar
>>>> [11:51:10] <noland> That sounds good. Once we hack on github for a while it'd be nice to develop a brief high level plan on how to implement
>>>> [11:51:26] <ecapriolo> Also get maven artifacts with correct dependency scopes like provided etc
>>>> [11:51:40] <ecapriolo> Right now pulling a hive jar from maven is like pulling in the world
>>>> 
>>>> 
>>>> On Thu, Aug 15, 2013 at 11:14 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>>> 
>>>>> I have opened https://issues.apache.org/jira/browse/HIVE-5107 because I
>>>>> am growing tired of how long Hive's build takes.
>>>>> 
>>>>> I have started playing with this by creating a simple multi-module
>>>>> project and copying stuff as I go. I have ported a minimal shims and
>>>>> common, and I have all the tests in common almost running.
>>>>> 
>>>>> Q. This is going to be ugly hacky work for a while. I was thinking it
>>>>> should be a branch, but it is just going to be a mess of moves and
>>>>> copies etc. Not really something you can diff.
>>>>> 
>>>>> Is anyone else interested in working on this as well? If so I think we
>>>>> can just set up a github and I can arrange for anyone to have access to it.
>>>>> 
>>>>> Thanks,
>>>>> Edward
>>>>> 
>>>>> 
>>>>> On Wed, Aug 7, 2013 at 5:04 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>>>> 
>>>>>> "Some of the hard part was that some of the test classes are in the wrong
>>>>>> module that references classes in a later module."
>>>>>> 
>>>>>> I think the modules will have to be able to reference each other in many
>>>>>> cases. Serde and QL are tightly coupled. QL is really too large and we
>>>>>> should find a way to cut that up.
>>>>>> 
>>>>>> Part of this problem is the q.tests
>>>>>> 
>>>>>> I think one way to handle this is to only allow unit tests inside the
>>>>>> module. I imagine running all the q tests would be done in a final module,
>>>>>> hive-qtest. Or possibly two final modules:
>>>>>> hive-qtest
>>>>>> hive-qtest-extra (tangential things like UDFs and input formats not core
>>>>>> to hive)
>>>>>> 
>>>>>> 
>>>>>> On Wed, Aug 7, 2013 at 4:49 PM, Owen O'Malley <omalley@apache.org> wrote:
>>>>>> 
>>>>>>> On Wed, Aug 7, 2013 at 12:55 PM, kulkarni.swarnim@gmail.com <
>>>>>>> kulkarni.swarnim@gmail.com> wrote:
>>>>>>> 
>>>>>>>>> I'd like to propose we move towards Maven.
>>>>>>>> 
>>>>>>>> Big +1 on this. Most of the major apache projects (hadoop, hbase, avro
>>>>>>>> etc.) are maven based.
>>>>>>>> 
>>>>>>> 
>>>>>>> A big +1 from me too. I actually took a pass at it a couple of months ago.
>>>>>>> Some of the hard part was that some of the test classes are in the wrong
>>>>>>> module that references classes in a later module. Obviously that prevents
>>>>>>> any kind of modular build.
>>>>>>> 
>>>>>>> An additional plus is that Maven includes tools to correct the
>>>>>>> project and module dependencies.
>>>>>>> 
>>>>>>> -- Owen
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
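
Editor's sketch of the "JDBC driver" idea from the IRC log above: the shim interface lives on its own, and a version-specific implementation is dropped in and looked up at runtime. This is only an illustration under stated assumptions, not code from the Hive tree. The names HadoopShim (as an interface with a single describe() method), Hadoop20Shim, Hadoop23Shim, and ShimRegistry are all hypothetical, and the real shim API is far larger.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical shim interface; it would live in its own artifact. */
interface HadoopShim {
    String describe();
}

/** Hypothetical implementation that would be compiled against Hadoop 0.20. */
final class Hadoop20Shim implements HadoopShim {
    @Override
    public String describe() {
        return "shim for Hadoop 0.20";
    }
}

/** Hypothetical implementation that would be compiled against Hadoop 0.23. */
final class Hadoop23Shim implements HadoopShim {
    @Override
    public String describe() {
        return "shim for Hadoop 0.23";
    }
}

/** Registry in the spirit of java.sql.DriverManager: callers only see the interface. */
final class ShimRegistry {
    // Maps a "major.minor" Hadoop version such as "0.20" to its shim.
    private static final Map<String, HadoopShim> SHIMS = new ConcurrentHashMap<>();

    private ShimRegistry() {
    }

    /** Registers a shim for a "major.minor" version, replacing any earlier entry. */
    static void register(String majorMinor, HadoopShim shim) {
        SHIMS.put(majorMinor, shim);
    }

    /** Looks up the shim for a full version string such as "0.20.2". */
    static HadoopShim forVersion(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected version: " + hadoopVersion);
        }
        String majorMinor = parts[0] + "." + parts[1];
        HadoopShim shim = SHIMS.get(majorMinor);
        if (shim == null) {
            throw new IllegalStateException("No shim registered for Hadoop " + majorMinor);
        }
        return shim;
    }
}
```

With this shape, only the shim implementations would need to compile against a specific Hadoop release. Code that calls ShimRegistry.forVersion("0.20.2") depends on the interface alone, which is the property the thread is after when it talks about linking everything else against a single provided Hadoop dependency.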


