www-infrastructure-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Canonical sources for information
Date Thu, 30 May 2013 09:09:59 GMT
Hi,

On Wed, May 29, 2013 at 3:36 PM, Alan D. Cabrera <adc@toolazydogs.com> wrote:
> Previously there was talk about Sam also providing a REST API that didn't require a
> username/password, and this API would return a smaller subset of publicly safe
> information.

FWIW, I spent a few minutes hacking together a script that scrapes
data from http://people.apache.org/committer-index.html and turns it
into the kind of JSON you outlined earlier. A static snapshot is
available under http://zitting.name/2013/05/api/v1/:

    $ curl http://zitting.name/2013/05/api/v1/committers
    {"committers":["a_horuzhenko","aadamchik","aadomowski", ... ]}

    $ curl http://zitting.name/2013/05/api/v1/committers/adc
    {"fullName":"Alan Cabrera","member":true,"projects":...}
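
A client of this API, or of the files written by the script below, only needs a JSON decoder. The following is a minimal Perl sketch. It assumes each per-committer record carries the username, fullName, member, projects and pmcs fields that the script emits. The sample record and the summarize() helper are hypothetical, and the sketch uses the core JSON::PP module so it runs without installing JSON:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use JSON::PP qw(decode_json);

# Hypothetical sample shaped like committers/<username>.json; the values
# are made up for illustration.
my $json = '{"username":"jdoe","fullName":"Jane Doe","member":false,'
         . '"projects":["foo","bar"],"pmcs":["foo"]}';

# Return a one-line summary of a decoded committer record.
sub summarize {
  my ($c) = @_;
  my $status = $c->{member} ? 'member' : 'not a member';
  return sprintf '%s (%s), %s; projects: %s; PMCs: %s',
    $c->{fullName}, $c->{username}, $status,
    join(', ', @{ $c->{projects} }) || 'none',
    join(', ', @{ $c->{pmcs} })     || 'none';
}

my $record = decode_json($json);
print summarize($record), "\n";
```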

The quick and dirty Perl script I used for this is included below. It
would obviously be cleaner to produce the JSON as part of the
people.apache.org (p.a.o) page generation instead of scraping the
generated HTML.

BR,

Jukka Zitting

----
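#!/usr/bin/perl
#
# Scrapes http://people.apache.org/committer-index.html (read from STDIN or
# from the files named on the command line) and writes committers.json plus
# one committers/<username>.json file per committer.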

use strict;
use warnings;

use JSON;

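my @committers = ();
my $html = join '', <>;    # the whole committer-index.html page

# Output directory for the per-committer files.
mkdir 'committers' unless -d 'committers';

# One table row per committer: username, full name (optionally linked) and
# group memberships. The cells of ASF members are wrapped in
# <b class="member">. With /x the pattern can be laid out over several
# lines, and \s* accepts any whitespace between the tags.
while ($html =~ m{
    <tr>\s*
      <td\s+bgcolor=".*?">                 # cell 1: username
        (<b\s+class="member">)?            # group 1: member marker
        <a\s+id='.+?'></a>(.+?)            # group 2: username
        (<a\s+id='.'></a>)?(</b>)?         # groups 3 and 4
      </td>\s*
      <td\s+bgcolor=".*?">                 # cell 2: full name
        (<b\s+class="member">)?            # group 5
        (<a\s+href="(.*?)">)?              # groups 6 and 7: optional link
        (.+?)(</a>)?(</b>)?                # group 8: name; groups 9 and 10
      </td>\s*
      <td\s+bgcolor=".*?">(.*?)</td>\s*    # group 11: group links
    </tr>
  }sgx) {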
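  my $member   = $1 ? JSON::true : JSON::false;  # <b class="member"> present
  my $username = $2;
  my $url      = $7 || "";  # optional link on the name; not written to the JSON
  my $name     = $8;
  my $projects = $11;       # raw HTML of the group links
  my @projects = ();
  my @pmcs     = ();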
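  # Each group is a link into committers-by-project.html; a "-pmc" suffix
  # marks PMC membership. Groups that are not projects are skipped.
  while ($projects =~ m{<a href='committers-by-project\.html#.*?'>(.*?)(-pmc)?</a>}g) {
    if ($2) {
      push @pmcs, $1;
    } elsif ($1 ne "member"
         and $1 ne "apsite"
         and $1 ne "pmc-chairs"
         and $1 ne "infrastructure") {
      push @projects, $1;
    }
  }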
  open my $fh, '>', "committers/$username.json"
    or die "Cannot write committers/$username.json: $!";
  print $fh to_json({
    username => $username,
    fullName => $name,
    projects => \@projects,
    pmcs     => \@pmcs,
    member   => $member,
  });
  close $fh;
  push @committers, $username;
}

open my $out, '>', 'committers.json' or die "Cannot write committers.json: $!";
print $out to_json({ committers => \@committers });
close $out;
