www-infrastructure-dev mailing list archives

From janI <j...@apache.org>
Subject Re: Data question
Date Wed, 23 Oct 2013 06:32:02 GMT
On 23 October 2013 03:14, Steven Lloyd Wilson <slwilson4@wisc.edu> wrote:

> Thanks for the quick reply Tony.
>
> I certainly appreciate the need for keeping the personal data private, and
> have no interest in collecting data at that level. I'm looking for country
> level data, ideally year-by-year.
>
> So as a starting point, an output like this would be ideal: country, year,
> # of developers, # of total commits. For example:
> Mexico, 2009, 105, 5213
> Mexico, 2010, 117, 5598
>

Hi,
just out of curiosity, why do you think the number of commits is a
significant value?

I tried to make a "top 10" for one of the bigger projects
(Apache OpenOffice), and it turned out that a couple of the most active
committers did not even reach the top 10. The reason was that these
committers made few commits, but each commit contained a lot of files,
whereas some of the web committers tended to do a commit for every file.
Btw, extracting the data from svn was very network demanding.
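
To illustrate the difference between counting commits and counting files
touched, here is a minimal sketch, not the script I actually used. It
assumes the history has already been dumped with "svn log --xml -v" and
reads that XML; the function name commit_stats, the SAMPLE data, and the
authors alice and bob are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def commit_stats(svn_log_xml):
    """Count commits and changed paths per author.

    Expects the text produced by `svn log --xml -v`; the -v flag is what
    adds the <paths> element listing the paths changed in each revision.
    """
    commits = Counter()
    paths = Counter()
    root = ET.fromstring(svn_log_xml)
    for entry in root.iter("logentry"):
        # Some revisions carry no <author> element, so fall back to a label.
        author = entry.findtext("author", default="(no author)")
        commits[author] += 1
        paths[author] += len(entry.findall("paths/path"))
    return commits, paths


# Hypothetical sample: alice commits once with four files,
# bob commits twice with one file each.
SAMPLE = """<log>
<logentry revision="3">
<author>bob</author>
<date>2013-10-22T10:00:00.000000Z</date>
<paths><path action="M" kind="file">/site/trunk/index.html</path></paths>
<msg>fix typo</msg>
</logentry>
<logentry revision="2">
<author>alice</author>
<date>2013-10-21T09:00:00.000000Z</date>
<paths>
<path action="M" kind="file">/trunk/main/a.cxx</path>
<path action="M" kind="file">/trunk/main/b.cxx</path>
<path action="A" kind="file">/trunk/main/c.cxx</path>
<path action="D" kind="file">/trunk/main/d.cxx</path>
</paths>
<msg>refactor</msg>
</logentry>
<logentry revision="1">
<author>bob</author>
<date>2013-10-20T08:00:00.000000Z</date>
<paths><path action="A" kind="file">/site/trunk/news.html</path></paths>
<msg>add news page</msg>
</logentry>
</log>"""

if __name__ == "__main__":
    commits, paths = commit_stats(SAMPLE)
    for author in sorted(commits):
        print(author, commits[author], paths[author])
```

In this sample bob leads on commits while alice leads on files changed,
which is the effect I saw. Running "svn log --xml -v" against a remote
repository is the network-heavy part, so it is worth saving the output to
a file once and parsing it locally.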

rgds
jan I.




> et cetera.
>
> Would that be something that would be easily extractable from the backend?
>
> Steven
>
>
> On 10/22/2013 02:13 AM, Tony Stevenson wrote:
>
>> Steven,
>>
>> Some of this information won't be so easy to get.  For example we cannot
>> tell you how many downloads each project has had, as almost all of that
>> data is held locally by the mirrors and we don't currently collect it.
>>
>> Other data is a little easier to collect, but I'm afraid some of it is
>> likely considered personal data, so we'd almost certainly not release it to
>> a 3rd party. This is mostly because the data you have already found is
>> constructed from other data, which is interspersed with some personal data.
>> A lot of the data will unfortunately be stored within the SVN history.
>> However I suspect a lot of it will not be contained within the public repo,
>> though clearly some of it will be.
>>
>> The only way I can be more helpful to you, I think, is to ask you to give
>> us some specific requests for data and I can let you know whether we can
>> get that data, and whether we are able to distribute it.  I realise this may not
>> be as helpful as you want, but we are prudent about releasing data.
>>
>>
>>
>>
>> On 22 Oct 2013, at 03:03, Steven Lloyd Wilson <slwilson4@wisc.edu> wrote:
>>
>>> Hello,
>>>
>>> I first emailed the media contact with Apache and he recommended that I
>>> resend to this list, with a somewhat unorthodox request for
>>> information/direction.
>>>
>>> I'm a PhD student writing a dissertation on the effects of the Internet
>>> on politics around the world. One of the variables that I'm looking at is
>>> how technically literate the populations of different countries are. The
>>> way I'm measuring this is through a variety of sources getting at open
>>> source downloads, usage of open source software, etc.
>>>
>>> I've found some excellent information on Apache's site, including the
>>> map of where contributors are located, so I think that somewhere behind the
>>> scenes should be the specific data that I'm looking for: the number of
>>> downloads for each project, number of contributors, number of mirrors, etc.
>>> by year and country, since the start of the Apache Foundation.
>>>
>>> Could you point me in the right direction on this matter?
>>>
>>> Thanks!
>>> Steven Wilson
>>> PhD Candidate in Political Science
>>> University of Wisconsin-Madison
>>>
>>
>> Cheers,
>> Tony
>>
>> ----------------------------------
>> Tony Stevenson
>>
>> tony@pc-tony.com
>> pctony@apache.org
>>
>> http://www.pc-tony.com
>>
>> GPG - 1024D/51047D66
>> ----------------------------------
>>
>
