www-infrastructure-dev mailing list archives

From Upayavira ...@odoko.co.uk>
Subject Re: Data question
Date Fri, 25 Oct 2013 03:51:33 GMT
And, all commits go to publicly archived commit lists, no? Meaning if
you engage with mailing lists, you get access to commit activity for
free.

The only missing part though is geography, knowing where in the world a
committer is, and with the advent of powerful web mail services, this
gets harder still, unless the committer publishes a FOAF file (I think
that's what it is called?)

Upayavira
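
The point about public commit lists can be sketched in code. The following is a minimal illustration, not anything the list members describe running: it assumes you have already pulled the From and Date headers out of a downloaded commit-list archive (the function name `domain_year_counts` and the sample addresses are made up for this example). It counts messages per year and sender domain. As noted above, a domain is only a weak hint at geography: webmail and apache.org addresses say nothing about country.

```python
from collections import Counter
from email.utils import parseaddr, parsedate_to_datetime


def domain_year_counts(headers):
    """Count messages per (year, sender domain).

    `headers` is an iterable of (from_header, date_header) string pairs,
    for example extracted from an archive of a commits list.
    """
    counts = Counter()
    for from_header, date_header in headers:
        _, address = parseaddr(from_header)
        if "@" not in address:
            continue  # skip senders without a usable address
        domain = address.rsplit("@", 1)[1].lower()
        try:
            year = parsedate_to_datetime(date_header).year
        except (TypeError, ValueError):
            continue  # skip messages with an unparseable Date header
        counts[(year, domain)] += 1
    return counts


# Hypothetical sample data, for illustration only.
sample = [
    ("Jane Doe <jane@example.mx>", "Fri, 25 Oct 2013 03:51:33 +0000"),
    ("jane@example.mx", "Thu, 24 Oct 2013 20:44:00 +0200"),
    ("Bob <bob@apache.org>", "Tue, 22 Oct 2013 02:13:00 +0000"),
]
counts = domain_year_counts(sample)
for (year, domain), n in sorted(counts.items()):
    print(year, domain, n)
```

Mapping a domain to a country would need a separate lookup that the thread does not provide, so it is left out here.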

On Thu, Oct 24, 2013, at 08:44 PM, janI wrote:
> On 24 October 2013 20:24, Santiago Gala <santiago.gala@gmail.com> wrote:
> 
> > On Thu, Oct 24, 2013 at 7:35 PM, Steven Lloyd Wilson <slwilson4@wisc.edu> wrote:
> >
> > > Hi Jan,
> > >
> > > My thought was that #commits would give an idea of how active the
> > > developers in that country were, in order to distinguish between a
> > > country with a handful of developers that periodically commit, and a
> > > country with a handful of developers that happen to be extraordinarily
> > > active ones.
> > >
> >
> 
> As I tried to explain, you will get data that cannot be compared
> statistically. But of course it still gives an indication of activity
> level.
> 
> These data are not back-end data: the Apache project repos are publicly
> available, making it possible for you to extract the repo log data. You
> need to cross-reference the log data with data from people.apache.org.
> 
> 
> > >
> > Note that:
> > * the commits are typically surrounded by technical discussion in the devel
> > list
> > * for each commit an email is sent to a public list.
> >
> > You can reasonably infer the numbers you are looking for just using the
> > public email archive plus an analysis of email aliases and domains...
> >
> > This is the approach I decided to take in my Master's thesis, mostly to avoid
> > depending on the effort of other people...
> >
> 
> This is a very good approach when looking for activity levels, because it
> includes QA, documentation and all the items around the programming.
> 
> Just to be clear, to the best of my knowledge, we don't have much better data
> internally, and as PCtony wrote, we would need extremely good reasons to
> provide information that committers have chosen not to make publicly
> available.
> 
> rgds
> jan I.
> 
> 
> >
> > Regards
> > Santiago
> >
> >
> > > What I'm trying to measure is the technical capability of the population,
> > > using open source activity (both in terms of development and use) as a
> > > proxy variable. I'm definitely open to suggestions of data that is
> > > available on the backend that might work better for this, but as Tony
> > > suggested, I made my best stab at what I thought would be good measures,
> > > and something that is likely to exist in your data.
> > >
> > > Best,
> > > Steven
> > >
> > >
> > >
> > > > >On 23 October 2013 03:14, Steven Lloyd Wilson <slwilson4@wisc.edu> wrote:
> > > >
> > > >> Thanks for the quick reply Tony.
> > > >>
> > > > >> I certainly appreciate the need for keeping the personal data private,
> > > > >> and have no interest in collecting data at that level. I'm looking for
> > > > >> country level data, ideally year-by-year.
> > > >>
> > > > >> So as a starting point, an output like this would be ideal: country,
> > > > >> year, # of developers, # of total commits. For example:
> > > >> Mexico, 2009, 105, 5213
> > > >> Mexico, 2010, 117, 5598
> > > >>
> > > >
> > > > >Hi,
> > > > >just out of curiosity, why do you think #commits is a significant value?
> > > > >
> > > > >I tried to make a "top 10" for one of the bigger projects
> > > > >(Apache OpenOffice), and it turned out that a couple of the most active
> > > > >committers did not even reach the "top 10". The reason was that these
> > > > >committers had few commits, but each commit contained a lot of files,
> > > > >whereas some of the web committers tended to do a commit for every file.
> > > > >Btw, extracting the data from svn was very network demanding.
> > > >
> > > >rgds
> > > >jan I.
> > > >
> > > >
> > > >
> > > >
> > > >> et cetera.
> > > >>
> > > > >> Would that be something that would be easily extractable from the
> > > > >> backend?
> > > >>
> > > >> Steven
> > > >>
> > > >>
> > > >> On 10/22/2013 02:13 AM, Tony Stevenson wrote:
> > > >>
> > > >>> Steven,
> > > >>>
> > > > >>> Some of this information won't be so easy to get.  For example we cannot
> > > > >>> tell you how many downloads each project has had, as almost all of that
> > > > >>> data is held locally by the mirrors and we don't currently collect it.
> > > >>>
> > > > >>> Other data is a little easier to collect, but I'm afraid some of it is
> > > > >>> likely considered personal data, so we'd almost certainly not release it to
> > > > >>> a 3rd party. This is mostly because the data you have already found is
> > > > >>> constructed from other data, which is interspersed with some personal data.
> > > > >>> A lot of the data will unfortunately be stored within the SVN history.
> > > > >>> However I suspect a lot of it will not be contained within the public repo,
> > > > >>> though clearly some of it will be.
> > > >>>
> > > > >>> The only way I can be more helpful to you, I think, is to ask you to give
> > > > >>> us some specific requests for data and I can let you know if we can either
> > > > >>> get that data, and if we are able to distribute it.  I realise this may not
> > > > >>> be as helpful as you want, but we are prudent about releasing data.
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > > >>> On 22 Oct 2013, at 03:03, Steven Lloyd Wilson <slwilson4@wisc.edu> wrote:
> > > >>>
> > > >>>  Hello,
> > > >>>>
> > > > >>>> I first emailed the media contact with Apache and he recommended that I
> > > > >>>> resend to this list, with a somewhat unorthodox request for
> > > > >>>> information/direction.
> > > >>>>
> > > > >>>> I'm a PhD student writing a dissertation on the effects of the Internet
> > > > >>>> on politics around the world. One of the variables that I'm looking at is
> > > > >>>> how technically literate the populations of different countries are. The
> > > > >>>> way I'm measuring this is through a variety of sources getting at open
> > > > >>>> source downloads, usage of open source software, etc.
> > > >>>>
> > > > >>>> I've found some excellent information on Apache's site, including the
> > > > >>>> map of where contributors are located, so I think that somewhere behind the
> > > > >>>> scenes should be the specific data that I'm looking for: the numbers of
> > > > >>>> download for each project, number of contributors, number of mirrors, etc.
> > > > >>>> by year and country, since the start of the Apache Foundation.
> > > >>>>
> > > >>>> Could you point me in the right direction on this matter?
> > > >>>>
> > > >>>> Thanks!
> > > >>>> Steven Wilson
> > > >>>> PhD Candidate in Political Science
> > > >>>> University of Wisconsin-Madison
> > > >>>>
> > > >>>
> > > >>> Cheers,
> > > >>> Tony
> > > >>>
> > > >>> ------------------------------****----
> > >
> > > >>> Tony Stevenson
> > > >>>
> > > >>> tony@pc-tony.com
> > > >>> pctony@apache.org
> > > >>>
> > > >>> http://www.pc-tony.com
> > > >>>
> > > >>> GPG - 1024D/51047D66
> > > >>> ------------------------------****----
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>
> > >
> >
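
Two points from the thread can be sketched in code: the repo log is publicly extractable, and raw commit counts can mislead when one committer bundles many files per commit while another commits file by file. The following is a rough sketch, assuming the log was saved in the XML form produced by `svn log --xml -v`. The function name `activity_by_author_year` and the sample entries are invented for illustration. It tallies both commits and changed paths per author and year. Turning authors into countries would still require the cross-reference with people.apache.org mentioned above, which is not attempted here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def activity_by_author_year(svn_log_xml):
    """Return {(author, year): (commits, changed_paths)} from svn log XML."""
    totals = defaultdict(lambda: [0, 0])
    root = ET.fromstring(svn_log_xml)
    for entry in root.iter("logentry"):
        author = entry.findtext("author") or "(no author)"
        date = entry.findtext("date") or ""
        if len(date) < 4 or not date[:4].isdigit():
            continue  # skip entries without a usable date
        year = int(date[:4])
        changed = len(entry.findall("paths/path"))
        totals[(author, year)][0] += 1        # one more commit
        totals[(author, year)][1] += changed  # paths touched by this commit
    return {key: tuple(value) for key, value in totals.items()}


# Hypothetical log excerpt: "alice" commits rarely but touches many files,
# "bob" commits once per file.
SAMPLE = """<log>
<logentry revision="101"><author>alice</author>
<date>2013-03-01T10:00:00.000000Z</date>
<paths><path action="M">/trunk/a.c</path><path action="M">/trunk/b.c</path>
<path action="A">/trunk/c.c</path></paths><msg>big change</msg></logentry>
<logentry revision="102"><author>bob</author>
<date>2013-03-02T10:00:00.000000Z</date>
<paths><path action="M">/site/index.html</path></paths><msg>web</msg></logentry>
<logentry revision="103"><author>bob</author>
<date>2013-03-03T10:00:00.000000Z</date>
<paths><path action="M">/site/faq.html</path></paths><msg>web</msg></logentry>
</log>"""

activity = activity_by_author_year(SAMPLE)
for (author, year), (commits, paths) in sorted(activity.items()):
    print(author, year, commits, paths)
```

In this made-up sample, bob has more commits than alice but fewer changed paths, which is the kind of ranking distortion described for the "top 10" attempt.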
