hive-user mailing list archives

From Konstantin Boudnik <...@apache.org>
Subject Re: Interesting claims that seem untrue
Date Tue, 17 Sep 2013 17:57:15 GMT
Carter,

What you are doing essentially contradicts the ASF policy of "community over
code".

Perhaps your intentions are good. However, LOC calculations or other silly
contests essentially drive a wedge between developers who happen to draw
their paychecks from different commercial entities. The Hadoop community has
been through this already, and it caused nothing but despair and bitterness
among people.

Unlike some other popular contests, the number of lines contributed doesn't
matter for most. Seriously.

Regards,
  Cos

On Mon, Sep 16, 2013 at 01:58PM, Carter Shanklin wrote:
> Ed,
> 
> If nothing else I'm glad it was interesting enough to generate some
> discussion. These sorts of stats are always subjects of a lot of
> controversy. I have seen a lot of these sorts of charts float around in
> confidential slide decks and I think it's good to have them out in the open
> where anyone can critique and correct them.
> 
> In this case Ed, you've pointed out a legitimate flaw in my analysis. Doing
> the analysis again I found that previously, due to a bug in my scripts,
> JIRAs that didn't have Hudson comments in them were not counted (this was
> one way it was identifying SVN commit IDs which I have since removed due to
> flakiness). Brock's patch was the single largest victim of this bug, but not
> the only one: there were some from Cloudera, NexR, Hortonworks, Facebook, and
> even 2 from you, Ed. Those interested can see a full list of exclusions here:
> https://docs.google.com/spreadsheet/ccc?key=0ArmXd5zzNQm5dDJTMkFtaUk2d0dyU3hnWGJCcUczbXc#gid=0.
> I apologize to those under-represented; there wasn't any intent on my part
> to minimize anyone's work. The impact in final totals is Cloudera +5.4%,
> NexR +0.8%, Facebook -2.7%, Hortonworks -3.3%. I will be updating the blog
> later today with relevant corrections.
> 
> There is going to be continued interest in seeing charts like these, for
> example when Hive 12 is officially done. Sanjay suggested that LoC counts
> may not be the best way to represent true contribution. I agree that not
> all lines of code are created equal, for example a few monster patches
> recently went in re-arranging HCatalog namespaces and I think also
> indentation style. This (hopefully) mechanical work is not on the same
> footing as adding new query language features. Still, it is work, and it
> wouldn't be fair to pretend it didn't happen. If anyone has ideas on better
> ways to fairly capture contribution I'm open to suggestions.
> 
> 
> 
> On Thu, Sep 12, 2013 at 7:19 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> 
> > I was reading the Hortonworks blog and found an interesting article.
> >
> > http://hortonworks.com/blog/stinger-phase-2-the-journey-to-100x-faster-hive/#comment-160753
> >
> > There is a very interesting graphic which attempts to demonstrate lines of
> > code in the 12 release.
> > http://hortonworks.com/wp-content/uploads/2013/09/hive4.png
> >
> > Although I do not know how the numbers are calculated, they are probably
> > counting generated test output as code, but besides that, they are wrong.
> >
> > One claim is that Cloudera contributed 4,244 lines of code.
> >
> > So to debunk that claim:
> >
> > In https://issues.apache.org/jira/browse/HIVE-4675 Brock Noland from
> > Cloudera created the ptest2 testing framework. He did all the work for
> > ptest2 in Hive 12, and it is clearly more than 4,244 lines.
> >
> > This consists of 84 java files
> > [edward@desksandra ptest2]$ find . -name "*.java" | wc -l
> > 84
> > and by itself is 8001 lines of code.
> > [edward@desksandra ptest2]$ find . -name "*.java" | xargs cat | wc -l
> > 8001
> >
> > [edward@desksandra hive-trunk]$ wc -l HIVE-4675.patch
> > 7902 HIVE-4675.patch
> >
> > This is not the only feature from Cloudera in Hive 12.
> >
> > There is also a section of the article that talks of a "ROAD MAP" for Hive
> > features. I did not know we (Hive) had a road map. I have advocated
> > switching to feature-based releases and having a road map before, but it was
> > suggested that might discourage people from itch-scratching.
> >
> >
> >
> >
> >
> 
> 
> -- 
> Carter Shanklin
> Director, Product Management
> Hortonworks
> (M): +1.650.644.8795 (T): @cshanklin <http://twitter.com/cshanklin>
