From: Santiago Gala
To: infrastructure-dev
Reply-To: infrastructure-dev@apache.org
Date: Thu, 24 Oct 2013 20:24:39 +0200
Subject: Re: Data question
In-Reply-To: <52695A64.1090204@wisc.edu>
References: <5265DCE4.9070908@wisc.edu> <3AA1DC6F-477F-48DD-8C5B-56744FE28888@pc-tony.com> <52695A64.1090204@wisc.edu>
Mailing-List: contact infrastructure-dev-help@apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a11332d1637e99d04e980bfae

--001a11332d1637e99d04e980bfae
Content-Type: text/plain; charset=UTF-8

On Thu, Oct 24, 2013 at 7:35 PM, Steven Lloyd Wilson wrote:

> Hi Jan,
>
> My thought was that #commits would give an idea of how active the
> developers in that country were, in order to distinguish between a country
> with a handful of developers that periodically commit, and a country with a
> handful of developers that happen to be extraordinarily active ones.
>

Note that:
* the commits are typically surrounded by technical discussion on the devel list
* for each commit an email is sent to a public list.

You can reasonably infer the numbers you are looking for using just the
public email archive plus an analysis of email aliases and domains...

This is the approach I decided to take in my Master's thesis, mostly to
avoid depending on the effort of other people...

Regards
Santiago

> What I'm trying to measure is the technical capability of the population,
> using open source activity (both in terms of development and use) as a
> proxy variable. I'm definitely open to suggestions of data that is
> available on the backend that might work better for this, but as Tony
> suggested, I made my best stab at what I thought would be good measures,
> and something that is likely to exist in your data.
>
> Best,
> Steven
>
>
> >On 23 October 2013 03:14, Steven Lloyd Wilson wrote:
> >
> >> Thanks for the quick reply Tony.
> >>
> >> I certainly appreciate the need for keeping the personal data private, and
> >> have no interest in collecting data at that level. I'm looking for country
> >> level data, ideally year-by-year.
> >>
> >> So as a starting point, an output like this would be ideal: country, year,
> >> # of developers, # of total commits. For example:
> >> Mexico, 2009, 105, 5213
> >> Mexico, 2010, 117, 5598
> >>
> >
> >Hi
> >just out of curiosity, why do you think #commits is a significant value?
> >
> >I tried to make a "top 10" for one of the bigger projects
> >(ApacheOpenOffice), and it turned out that a couple of the most active
> >committers did not even reach the "top 10". The reason was that these
> >committers had few commits, but each commit contained a lot of files,
> >whereas some of the web committers tended to do a commit for every file.
> >Btw, extracting the data from svn was very network demanding.
> >
> >rgds
> >jan I.
> >
> >
> >> et cetera.
> >>
> >> Would that be something that would be easily extractable from the backend?
> >>
> >> Steven
> >>
> >>
> >> On 10/22/2013 02:13 AM, Tony Stevenson wrote:
> >>
> >>> Steven,
> >>>
> >>> Some of this information won't be so easy to get. For example we cannot
> >>> tell you how many downloads each project has had, as almost all of that
> >>> data is held locally by the mirrors and we don't currently collect it.
> >>>
> >>> Other data is a little easier to collect, but I'm afraid some of it is
> >>> likely considered personal data, so we'd almost certainly not release it to
> >>> a 3rd party. This is mostly because the data you have already found is
> >>> constructed from other data, which is interspersed with some personal data.
> >>> A lot of the data will unfortunately be stored within the SVN history.
> >>> However I suspect a lot of it will not be contained within the public repo,
> >>> though clearly some of it will be.
> >>>
> >>> The only way I can be more helpful to you, I think, is to ask you to give
> >>> us some specific requests for data, and I can let you know whether we can
> >>> get that data and whether we are able to distribute it. I realise this may
> >>> not be as helpful as you want, but we are prudent about releasing data.
> >>>
> >>>
> >>> On 22 Oct 2013, at 03:03, Steven Lloyd Wilson wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> I first emailed the media contact with Apache and he recommended that I
> >>>> resend to this list, with a somewhat unorthodox request for
> >>>> information/direction.
> >>>>
> >>>> I'm a PhD student writing a dissertation on the effects of the Internet
> >>>> on politics around the world. One of the variables that I'm looking at is
> >>>> how technically literate the populations of different countries are. The
> >>>> way I'm measuring this is through a variety of sources getting at open
> >>>> source downloads, usage of open source software, etc.
> >>>>
> >>>> I've found some excellent information on Apache's site, including the
> >>>> map of where contributors are located, so I think that somewhere behind the
> >>>> scenes should be the specific data that I'm looking for: the number of
> >>>> downloads for each project, number of contributors, number of mirrors, etc.
> >>>> by year and country, since the start of the Apache Foundation.
> >>>>
> >>>> Could you point me in the right direction on this matter?
> >>>>
> >>>> Thanks!
> >>>> Steven Wilson
> >>>> PhD Candidate in Political Science
> >>>> University of Wisconsin-Madison
> >>>>
> >>>
> >>> Cheers,
> >>> Tony
> >>>
> >>> ----------------------------------
> >>> Tony Stevenson
> >>>
> >>> tony@pc-tony.com
> >>> pctony@apache.org
> >>>
> >>> http://www.pc-tony.com
> >>>
> >>> GPG - 1024D/51047D66
> >>> ----------------------------------
> >>>
> >>
>
--001a11332d1637e99d04e980bfae--
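
The approach Santiago describes in his reply (counting commit-notification
emails from the public archive and grouping them by sender domain and year)
could be sketched roughly as below. This is only an illustration, not the
method used in his thesis: the function name summarize_commit_mails, the
CCTLD_TO_COUNTRY table and the sample messages are all hypothetical, and
treating a country-code domain suffix as the developer's country is a strong
assumption (addresses at gmail.com or apache.org carry no country signal and
end up as "unknown").

```python
from collections import defaultdict
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Hypothetical, deliberately tiny mapping from country-code TLD to country.
# Generic domains (gmail.com, apache.org) give no country information.
CCTLD_TO_COUNTRY = {"mx": "Mexico", "es": "Spain", "de": "Germany"}


def summarize_commit_mails(raw_messages):
    """Return {(country, year): (n_distinct_senders, n_commit_mails)}."""
    senders = defaultdict(set)
    mails = defaultdict(int)
    for raw in raw_messages:
        msg = message_from_string(raw)
        _, address = parseaddr(msg.get("From", ""))
        date_header = msg.get("Date")
        if "@" not in address or not date_header:
            continue  # no usable sender or date: skip the message
        try:
            year = parsedate_to_datetime(date_header).year
        except (TypeError, ValueError):
            continue  # unparseable Date header: skip the message
        # Assumption: the last domain label hints at the sender's country.
        tld = address.rsplit(".", 1)[-1].lower()
        country = CCTLD_TO_COUNTRY.get(tld, "unknown")
        key = (country, year)
        senders[key].add(address.lower())
        mails[key] += 1
    return {key: (len(senders[key]), mails[key]) for key in mails}


# Made-up commit notifications, only to show the output shape.
sample = [
    "From: Ana <ana@example.mx>\nDate: Tue, 03 Mar 2009 10:00:00 +0000\n"
    "Subject: svn commit: r1\n\nbody",
    "From: Ana <ana@example.mx>\nDate: Wed, 04 Mar 2009 10:00:00 +0000\n"
    "Subject: svn commit: r2\n\nbody",
    "From: Luis <luis@example.mx>\nDate: Thu, 05 Mar 2009 10:00:00 +0000\n"
    "Subject: svn commit: r3\n\nbody",
]
summary = summarize_commit_mails(sample)
print(summary)
```

The result has the "country, year, # of developers, # of total commits" shape
Steven asked for. Jan's caveat still applies: one notification per commit
says nothing about how many files a commit touched, so the mail count
measures commit frequency rather than volume of work.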