www-infrastructure-dev mailing list archives

From janI <j...@apache.org>
Subject Re: Data question
Date Fri, 25 Oct 2013 06:32:33 GMT
On 25 October 2013 05:51, Upayavira <uv@odoko.co.uk> wrote:

> And, all commits go to publicly archived commit lists, no? Meaning if
> you engage with mailing lists, you get access to commit activity for
> free.
>
Yes, that's the free way of doing this (free for our bandwidth); another is to
run "svn log" with a couple of options at the top of the repos.
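A minimal sketch of the "svn log" route, assuming the history has been dumped once with `svn log --xml -v <repo-url>` and saved locally (as noted further down the thread, pulling the history from svn is network-demanding). It counts both commits and changed paths per author and year. Counting paths as well as commits addresses the "top 10" distortion described below, where some committers made a few large multi-file commits while others committed one file at a time. The function name and the use of `<path>` counts as a size measure are illustrative assumptions, not an existing Apache tool.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def commit_stats(svn_log_xml):
    """Count commits and changed paths per (author, year).

    svn_log_xml: raw output of `svn log --xml -v <repo-url>` (bytes or str).
    Returns {(author, year): [commits, changed_paths]}.
    """
    stats = defaultdict(lambda: [0, 0])
    root = ET.fromstring(svn_log_xml)
    for entry in root.iter("logentry"):
        author = entry.findtext("author")
        date = entry.findtext("date")  # e.g. 2013-10-24T18:35:12.123456Z
        if not author or not date:
            # Revisions without author or date information are skipped.
            continue
        year = int(date[:4])
        changed = len(entry.findall("paths/path"))  # <paths> appears only with -v
        stats[(author, year)][0] += 1
        stats[(author, year)][1] += changed
    return dict(stats)


# Usage sketch (REPO_URL is a placeholder, not a real repository path):
#   import subprocess
#   xml_out = subprocess.run(["svn", "log", "--xml", "-v", REPO_URL],
#                            capture_output=True, check=True).stdout
#   for (author, year), (commits, paths) in sorted(commit_stats(xml_out).items()):
#       print(author, year, commits, paths)
```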


>
> The only missing part, though, is geography: knowing where in the world a
> committer is. With the advent of powerful web mail services this gets
> harder still, unless the committer publishes a FOAF file (I think that's
> what it is called?)
>

That's the name, and part of it (the part you need) is available on
people.a.o; for example, take the Google map (Google produces a file with all
the markers) or go through http://people.apache.org/committers.html

Both ways will work, but you will need to do the data mining yourself; the
data won't be served on a silver plate :-)
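Once per-author yearly commit counts exist, cross-referencing them with people.apache.org could look roughly like this, producing rows in the "country, year, # of developers, # of total commits" shape requested in the quoted thread. It assumes you have already built, by hand or by scraping, a mapping from committer id to country; that mapping, the function name and the "unknown" bucket are illustrative assumptions. Only committers who chose to publish their location will appear in such a mapping, so the "unknown" rows are kept rather than dropped.

```python
from collections import defaultdict


def country_year_rows(author_year_commits, author_country):
    """Aggregate per-author yearly commit counts into rows of
    (country, year, number_of_developers, number_of_commits).

    author_year_commits: iterable of (author, year, commits) tuples.
    author_country: dict mapping committer id -> country (hypothetical,
    e.g. compiled from people.apache.org). Authors missing from it are
    grouped under "unknown" instead of being silently dropped.
    """
    developers = defaultdict(set)
    commits = defaultdict(int)
    for author, year, n_commits in author_year_commits:
        key = (author_country.get(author, "unknown"), year)
        developers[key].add(author)  # a set, so each developer counts once
        commits[key] += n_commits
    return sorted(
        (country, year, len(developers[(country, year)]), commits[(country, year)])
        for (country, year) in developers
    )


# Writing the rows as CSV (data and mapping are whatever you collected):
#   import csv, sys
#   w = csv.writer(sys.stdout)
#   w.writerow(["country", "year", "developers", "commits"])
#   w.writerows(country_year_rows(data, mapping))
```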

rgds
jan I

>
> Upayavira
>
> On Thu, Oct 24, 2013, at 08:44 PM, janI wrote:
> > On 24 October 2013 20:24, Santiago Gala <santiago.gala@gmail.com> wrote:
> >
> > > On Thu, Oct 24, 2013 at 7:35 PM, Steven Lloyd Wilson <slwilson4@wisc.edu> wrote:
> > >
> > > > Hi Jan,
> > > >
> > > > My thought was that #commits would give an idea of how active the
> > > > developers in that country were, in order to distinguish between a
> > > > country with a handful of developers that periodically commit, and a
> > > > country with a handful of developers that happen to be extraordinarily
> > > > active ones.
> > > >
> > >
> >
> > As I tried to explain, you will get data that cannot be compared
> > statistically. But of course it still gives an indication of activity
> > level.
> >
> > These data are not back-end data: the Apache project repos are publicly
> > available, making it possible for you to extract the repo log data
> > yourself. You need to cross-reference the log data with data from
> > people.apache.org.
> >
> >
> > > >
> > > Note that:
> > > * the commits are typically surrounded by technical discussion on the
> > >   devel list
> > > * for each commit an email is sent to a public list.
> > >
> > > You can reasonably infer the numbers you are looking for just using the
> > > public email archive plus an analysis of email aliases and domains...
> > >
> > > This is the approach I decided to take in my master's thesis, mostly to
> > > avoid depending on the effort of other people...
> > >
> >
> > This is a very good approach when looking for activity levels, because it
> > includes QA, documentation and all the work around the programming.
> >
> > Just to be clear: to the best of my knowledge, we don't have much better
> > data internally, and as PCtony wrote, we would need extremely good reasons
> > to provide information that committers have chosen not to make publicly
> > available.
> >
> > rgds
> > jan I.
> >
> >
> > >
> > > Regards
> > > Santiago
> > >
> > >
> > > > What I'm trying to measure is the technical capability of the
> > > > population, using open source activity (both in terms of development
> > > > and use) as a proxy variable. I'm definitely open to suggestions of
> > > > data available on the backend that might work better for this, but as
> > > > Tony suggested, I made my best stab at what I thought would be good
> > > > measures, and something that is likely to exist in your data.
> > > >
> > > > Best,
> > > > Steven
> > > >
> > > >
> > > >
> > > > >On 23 October 2013 03:14, Steven Lloyd Wilson <slwilson4@wisc.edu> wrote:
> > > > >
> > > > >> Thanks for the quick reply Tony.
> > > > >>
> > > > >> I certainly appreciate the need for keeping the personal data
> > > > >> private, and have no interest in collecting data at that level. I'm
> > > > >> looking for country-level data, ideally year-by-year.
> > > > >>
> > > > >> So as a starting point, an output like this would be ideal: country,
> > > > >> year, # of developers, # of total commits. For example:
> > > > >> Mexico, 2009, 105, 5213
> > > > >> Mexico, 2010, 117, 5598
> > > > >>
> > > > >
> > > > >Hi,
> > > > >
> > > > >Just out of curiosity, why do you think #commits is a significant value?
> > > > >
> > > > >I tried to make a "top 10" for one of the bigger projects
> > > > >(Apache OpenOffice), and it turned out that a couple of the most active
> > > > >committers did not even reach the "top 10". The reason was that these
> > > > >committers had few commits, but each commit contained a lot of files,
> > > > >whereas some of the web committers tended to do a commit for every file.
> > > > >Btw, extracting the data from svn was very network-demanding.
> > > > >
> > > > >rgds
> > > > >jan I.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >> et cetera.
> > > > >>
> > > > >> Would that be something that could easily be extracted from the
> > > > >> backend?
> > > > >>
> > > > >> Steven
> > > > >>
> > > > >>
> > > > >> On 10/22/2013 02:13 AM, Tony Stevenson wrote:
> > > > >>
> > > > >>> Steven,
> > > > >>>
> > > > >>> Some of this information won't be so easy to get. For example, we
> > > > >>> cannot tell you how many downloads each project has had, as almost all
> > > > >>> of that data is held locally by the mirrors and we don't currently
> > > > >>> collect it.
> > > > >>>
> > > > >>> Other data is a little easier to collect, but I'm afraid some of it is
> > > > >>> likely considered personal data, so we'd almost certainly not release
> > > > >>> it to a 3rd party. This is mostly because the data you have already
> > > > >>> found is constructed from other data, which is interspersed with some
> > > > >>> personal data. A lot of the data will unfortunately be stored within
> > > > >>> the SVN history. However, I suspect a lot of it will not be contained
> > > > >>> within the public repo, though clearly some of it will be.
> > > > >>>
> > > > >>> The only way I can be more helpful to you, I think, is to ask you to
> > > > >>> give us some specific requests for data, and I can let you know whether
> > > > >>> we can get that data and whether we are able to distribute it. I
> > > > >>> realise this may not be as helpful as you want, but we are prudent
> > > > >>> about releasing data.
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> On 22 Oct 2013, at 03:03, Steven Lloyd Wilson <slwilson4@wisc.edu> wrote:
> > > > >>>
> > > > >>>> Hello,
> > > > >>>>
> > > > >>>> I first emailed the media contact with Apache and he recommended that
> > > > >>>> I resend to this list, with a somewhat unorthodox request for
> > > > >>>> information/direction.
> > > > >>>>
> > > > >>>> I'm a PhD student writing a dissertation on the effects of the
> > > > >>>> Internet on politics around the world. One of the variables that I'm
> > > > >>>> looking at is how technically literate the populations of different
> > > > >>>> countries are. The way I'm measuring this is through a variety of
> > > > >>>> sources getting at open source downloads, usage of open source
> > > > >>>> software, etc.
> > > > >>>>
> > > > >>>> I've found some excellent information on Apache's site, including the
> > > > >>>> map of where contributors are located, so I think that somewhere
> > > > >>>> behind the scenes should be the specific data that I'm looking for:
> > > > >>>> the number of downloads for each project, number of contributors,
> > > > >>>> number of mirrors, etc. by year and country, since the start of the
> > > > >>>> Apache Foundation.
> > > > >>>>
> > > > >>>> Could you point me in the right direction on this matter?
> > > > >>>>
> > > > >>>> Thanks!
> > > > >>>> Steven Wilson
> > > > >>>> PhD Candidate in Political Science
> > > > >>>> University of Wisconsin-Madison
> > > > >>>>
> > > > >>>
> > > > >>> Cheers,
> > > > >>> Tony
> > > > >>>
> > > > >>> ----------------------------------------
> > > > >>> Tony Stevenson
> > > > >>>
> > > > >>> tony@pc-tony.com
> > > > >>> pctony@apache.org
> > > > >>>
> > > > >>> http://www.pc-tony.com
> > > > >>>
> > > > >>> GPG - 1024D/51047D66
> > > > >>> ----------------------------------------
> > > > >>>
> > > > >>
> > > >
> > >
>
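Santiago's alternative, inferring activity from the public mail archives, might be sketched as follows, assuming the archives have been downloaded as mbox files. The function name is illustrative. Sender domains are at best a rough proxy for location: as Upayavira points out, web-mail addresses (and @apache.org ones) say nothing about where the sender lives, and merging several aliases of the same person is left out here.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr, parsedate_to_datetime


def messages_by_domain_year(mbox_path):
    """Count messages per (sender domain, year) in one mbox file."""
    counts = Counter()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # str() guards against header objects returned for odd encodings
            _, addr = parseaddr(str(msg.get("From", "")))
            date_header = str(msg.get("Date", ""))
            if "@" not in addr or not date_header:
                continue  # no usable sender address or date
            try:
                year = parsedate_to_datetime(date_header).year
            except (TypeError, ValueError):
                continue  # unparseable Date header
            domain = addr.rsplit("@", 1)[1].lower()
            counts[(domain, year)] += 1
    finally:
        box.close()
    return counts
```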
