www-infrastructure-dev mailing list archives

From Santiago Gala <santiago.g...@gmail.com>
Subject Re: Data question
Date Thu, 24 Oct 2013 18:24:39 GMT
On Thu, Oct 24, 2013 at 7:35 PM, Steven Lloyd Wilson <slwilson4@wisc.edu> wrote:

> Hi Jan,
>
> My thought was that #commits would give an idea of how active the
> developers in that country were, in order to distinguish between a country
> with a handful of developers that periodically commit, and a country with a
> handful of developers that happen to be extraordinarily active ones.
>
>
Note that:
* the commits are typically surrounded by technical discussion in the devel
list
* for each commit an email is sent to a public list.

You can reasonably infer the numbers you are looking for just using the
public email archive plus an analysis of email aliases and domains...

This is the approach I decided to take in my Master's thesis, mostly to avoid
depending on the effort of other people...
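
A minimal sketch of the approach described above: count commit-notification
emails and distinct senders per sender domain and year. It assumes the commit
list has been downloaded as mbox archives. The function name `summarize` and
the file name `commits.mbox` are hypothetical, and mapping a domain to a
country is left open, since addresses such as gmail.com carry no location.

```python
from collections import defaultdict
from email.utils import parseaddr, parsedate_to_datetime


def summarize(messages):
    """Aggregate commit notifications by (sender domain, year).

    `messages` is any iterable of objects with a dict-like .get() for the
    "From" and "Date" headers (for example a mailbox.mbox, or plain dicts).
    Returns {(domain, year): (distinct_senders, commit_emails)}.
    """
    senders = defaultdict(set)   # (domain, year) -> set of sender addresses
    commits = defaultdict(int)   # (domain, year) -> number of commit emails

    for msg in messages:
        # parseaddr splits "Name <user@host>" into (name, address).
        _, addr = parseaddr(str(msg.get("From") or ""))
        if "@" not in addr:
            continue  # skip messages without a usable sender address
        try:
            year = parsedate_to_datetime(str(msg.get("Date") or "")).year
        except (TypeError, ValueError):
            continue  # skip messages with a missing or malformed date
        addr = addr.lower()
        key = (addr.rsplit("@", 1)[1], year)
        senders[key].add(addr)
        commits[key] += 1

    return {key: (len(senders[key]), commits[key]) for key in commits}
```

With the standard-library `mailbox` module, a downloaded archive could be
passed in as `summarize(mailbox.mbox("commits.mbox"))`. One sender address is
treated as one developer here, which undercounts or overcounts people who
commit under several aliases.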

Regards
Santiago


> What I'm trying to measure is the technical capability of the population,
> using open source activity (both in terms of development and use) as a
> proxy variable. I'm definitely open to suggestions of data that is
> available on the backend that might work better for this, but as Tony
> suggested, I made my best stab at what I thought would be good measures,
> and something that is likely to exist in your data.
>
> Best,
> Steven
>
>
>
> >On 23 October 2013 03:14, Steven Lloyd Wilson <slwilson4@wisc.edu> wrote:
> >
> >> Thanks for the quick reply Tony.
> >>
> >> I certainly appreciate the need for keeping the personal data private, and
> >> have no interest in collecting data at that level. I'm looking for country
> >> level data, ideally year-by-year.
> >>
> >> So as a starting point, an output like this would be ideal: country, year,
> >> # of developers, # of total commits. For example:
> >> Mexico, 2009, 105, 5213
> >> Mexico, 2010, 117, 5598
> >>
> >
> >Hi,
> >just out of curiosity, why do you think #commits is a significant value?
> >
> >I tried to make a "top 10" for one of the bigger projects
> >(ApacheOpenOffice), and it turned out that a couple of the most active
> >committers did not even reach the "top 10". The reason was that these
> >committers had few commits, but each commit contained a lot of files,
> >whereas some of the web committers tended to do a commit for every file.
> >Btw, extracting the data from svn was very network demanding.
> >
> >rgds
> >jan I.
> >
> >
> >
> >
> >> et cetera.
> >>
> >> Would that be something that would be easily extractable from the backend?
> >>
> >> Steven
> >>
> >>
> >> On 10/22/2013 02:13 AM, Tony Stevenson wrote:
> >>
> >>> Steven,
> >>>
> >>> Some of this information won't be so easy to get.  For example we cannot
> >>> tell you how many downloads each project has had, as almost all of that
> >>> data is held locally by the mirrors and we don't currently collect it.
> >>>
> >>> Other data is a little easier to collect, but I'm afraid some of it is
> >>> likely considered personal data, so we'd almost certainly not release it to
> >>> a 3rd party. This is mostly because the data you have already found is
> >>> constructed from other data, which is interspersed with some personal data.
> >>> A lot of the data will unfortunately be stored within the SVN history.
> >>> However I suspect a lot of it will not be contained within the public repo,
> >>> though clearly some of it will be.
> >>>
> >>> The only way I can be more helpful to you, I think, is to ask you to give
> >>> us some specific requests for data and I can let you know whether we can
> >>> get that data, and whether we are able to distribute it.  I realise this
> >>> may not be as helpful as you want, but we are prudent about releasing data.
> >>>
> >>>
> >>>
> >>>
> >>> On 22 Oct 2013, at 03:03, Steven Lloyd Wilson <slwilson4@wisc.edu> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> I first emailed the media contact with Apache and he recommended that I
> >>>> resend to this list, with a somewhat unorthodox request for
> >>>> information/direction.
> >>>>
> >>>> I'm a PhD student writing a dissertation on the effects of the Internet
> >>>> on politics around the world. One of the variables that I'm looking at is
> >>>> how technically literate the populations of different countries are. The
> >>>> way I'm measuring this is through a variety of sources getting at open
> >>>> source downloads, usage of open source software, etc.
> >>>>
> >>>> I've found some excellent information on Apache's site, including the
> >>>> map of where contributors are located, so I think that somewhere behind the
> >>>> scenes should be the specific data that I'm looking for: the number of
> >>>> downloads for each project, number of contributors, number of mirrors, etc.
> >>>> by year and country, since the start of the Apache Foundation.
> >>>>
> >>>> Could you point me in the right direction on this matter?
> >>>>
> >>>> Thanks!
> >>>> Steven Wilson
> >>>> PhD Candidate in Political Science
> >>>> University of Wisconsin-Madison
> >>>>
> >>>
> >>> Cheers,
> >>> Tony
> >>>
> >>> --------------------------------------
> >>> Tony Stevenson
> >>>
> >>> tony@pc-tony.com
> >>> pctony@apache.org
> >>>
> >>> http://www.pc-tony.com
> >>>
> >>> GPG - 1024D/51047D66
> >>> --------------------------------------
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
>
