www-infrastructure-dev mailing list archives

From Steven Lloyd Wilson <slwils...@wisc.edu>
Subject Re: Data question
Date Thu, 24 Oct 2013 17:35:32 GMT
Hi Jan,

My thought was that #commits would give an idea of how active the 
developers in that country were, in order to distinguish between a 
country with a handful of developers that periodically commit, and a 
country with a handful of developers that happen to be extraordinarily 
active ones.

What I'm trying to measure is the technical capability of the 
population, using open source activity (both in terms of development and 
use) as a proxy variable. I'm definitely open to suggestions of data 
that is available on the backend that might work better for this, but as 
Tony suggested, I made my best stab at what I thought would be good 
measures, and something that is likely to exist in your data.
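
As a rough illustration of the output discussed in this thread (country, year,
# of developers, # of total commits), here is a minimal Python sketch. The
record layout, developer ids, and sample values are all hypothetical and do
not come from any ASF data. It also sums files changed per commit, since raw
commit counts can be skewed when some committers commit one file at a time.

```python
from collections import defaultdict

# Hypothetical commit records: (country, year, developer_id, files_changed).
# These fields and values are made up for illustration only.
commits = [
    ("Mexico", 2009, "dev_a", 1),
    ("Mexico", 2009, "dev_a", 1),
    ("Mexico", 2009, "dev_b", 40),
    ("Mexico", 2010, "dev_b", 3),
]


def summarize(records):
    """Aggregate per (country, year): distinct developers, commits, files changed."""
    devs = defaultdict(set)
    n_commits = defaultdict(int)
    n_files = defaultdict(int)
    for country, year, dev, files in records:
        key = (country, year)
        devs[key].add(dev)        # count each developer once per country/year
        n_commits[key] += 1       # one record is one commit
        n_files[key] += files     # rough size measure alongside commit count
    return [
        (c, y, len(devs[(c, y)]), n_commits[(c, y)], n_files[(c, y)])
        for c, y in sorted(devs)
    ]


for row in summarize(commits):
    print(", ".join(str(v) for v in row))
```

Only aggregated rows leave the function, so no per-person data appears in the
output.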

Best,
Steven


 >On 23 October 2013 03:14, Steven Lloyd Wilson <slwilson4@wisc.edu> wrote:
 >
 >> Thanks for the quick reply Tony.
 >>
 >> I certainly appreciate the need for keeping the personal data private, and
 >> have no interest in collecting data at that level. I'm looking for country
 >> level data, ideally year-by-year.
 >>
 >> So as a starting point, an output like this would be ideal: country, year,
 >> # of developers, # of total commits. For example:
 >> Mexico, 2009, 105, 5213
 >> Mexico, 2010, 117, 5598
 >>
 >
 >Hi,
 >just out of curiosity, why do you think #commits is a significant value?
 >
 >I tried to make a "top 10" for one of the bigger projects
 >(ApacheOpenOffice), and it turned out that a couple of the most active
 >committers did not even reach the "top 10". The reason was that these
 >committers had few commits, but each commit contained a lot of files,
 >whereas some of the web committers tended to do a commit for every file.
 >Btw, extracting the data from svn was very network demanding.
 >
 >rgds
 >jan I.
 >
 >
 >
 >
 >> et cetera.
 >>
 >> Would that be something that would be easily extractable from the backend?
 >>
 >> Steven
 >>
 >>
 >> On 10/22/2013 02:13 AM, Tony Stevenson wrote:
 >>
 >>> Steven,
 >>>
 >>> Some of this information won't be so easy to get.  For example we cannot
 >>> tell you how many downloads each project has had, as almost all of that
 >>> data is held locally by the mirrors and we don't currently collect it.
 >>>
 >>> Other data is a little easier to collect, but I'm afraid some of it is
 >>> likely considered personal data, so we'd almost certainly not release it to
 >>> a 3rd party. This is mostly because the data you have already found is
 >>> constructed from other data, which is interspersed with some personal data.
 >>> A lot of the data will unfortunately be stored within the SVN history.
 >>> However I suspect a lot of it will not be contained within the public repo,
 >>> though clearly some of it will be.
 >>>
 >>> The only way I can be more helpful to you, I think, is to ask you to give
 >>> us some specific requests for data, and I can let you know whether we can
 >>> get that data and whether we are able to distribute it.  I realise this may
 >>> not be as helpful as you want, but we are prudent about releasing data.
 >>>
 >>>
 >>>
 >>>
 >>> On 22 Oct 2013, at 03:03, Steven Lloyd Wilson <slwilson4@wisc.edu> wrote:
 >>>
 >>>> Hello,
 >>>>
 >>>> I first emailed the media contact with Apache and he recommended that I
 >>>> resend to this list, with a somewhat unorthodox request for
 >>>> information/direction.
 >>>>
 >>>> I'm a PhD student writing a dissertation on the effects of the Internet
 >>>> on politics around the world. One of the variables that I'm looking at is
 >>>> how technically literate the populations of different countries are. The
 >>>> way I'm measuring this is through a variety of sources getting at open
 >>>> source downloads, usage of open source software, etc.
 >>>>
 >>>> I've found some excellent information on Apache's site, including the
 >>>> map of where contributors are located, so I think that somewhere behind the
 >>>> scenes should be the specific data that I'm looking for: the number of
 >>>> downloads for each project, number of contributors, number of mirrors, etc.
 >>>> by year and country, since the start of the Apache Foundation.
 >>>>
 >>>> Could you point me in the right direction on this matter?
 >>>>
 >>>> Thanks!
 >>>> Steven Wilson
 >>>> PhD Candidate in Political Science
 >>>> University of Wisconsin-Madison
 >>>>
 >>>
 >>> Cheers,
 >>> Tony
 >>>
 >>> ----------------------------------
 >>> Tony Stevenson
 >>>
 >>> tony@pc-tony.com
 >>> pctony@apache.org
 >>>
 >>> http://www.pc-tony.com
 >>>
 >>> GPG - 1024D/51047D66
 >>> ----------------------------------
 >>>
 >>>
 >>>
 >>>
 >>>
 >>>
 >>>
 >>>
 >>
