hadoop-pig-dev mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: Pig Contributor meeting notes
Date Thu, 26 Aug 2010 07:55:24 GMT
Wonderful, Dmitriy. It's a pity I missed the contributor meeting.
Were any slides shared?

On Wed, Aug 25, 2010 at 8:32 PM, Dmitriy Ryaboy <dvryaboy@gmail.com> wrote:
> Twitter hosted this month's Pig contributor meeting.
> Developers from Yahoo, Twitter, LinkedIn, RichRelevance, and Cloudera were
> present.
> 1. Howl
> First, Alan Gates demoed Howl, a project whose goal is to provide a table
> management service for all of Hadoop. The vision is that ultimately you will
> be able to write data using regular MR, Pig, or Hive, and read it using any
> of those three, with full support of a partition-aware metadata store that
> will tell you what data is available, what its schema is, etc., all through
> a single table abstraction.
> Currently, tables are created using (a restricted subset of) Hive DDL
> statements; a Howl CLI for this will be created, which will enforce the
> restricted subset.
> Writing to the table using Pig or MapReduce is supported. Reading can
> already be done using all three.
> At the moment, a single Pig store statement can only store into a single
> partition; adding the ability to "spray" across partitions is on the roadmap.
> This and a good API for interacting with the metastore are the two areas
> that were identified as good opportunities for the wider developer community
> to get involved with the project. The source code is on GitHub and is at
> the moment synchronized with the development trunk manually; Yahoo folks
> will look into changing this.
> Security is a concern, and Yahoo will be working on it. Making it possible
> for Hive to write to the tables is currently a lower priority than the items
> listed above; it would basically involve just writing a Hive SerDe (an
> equivalent of Pig's StoreFunc).
> 2. Azkaban presentation
> Russel Jurney and Richard Park from LinkedIn presented the workflow
> management tool open-sourced by LinkedIn, called Azkaban. It allows you to
> declare job dependencies, has a web interface for launching and monitoring
> jobs, etc. It has a special exec mode for Pig that lets you set some
> Pig-specific options on a per-job basis. It does not currently have
> triggering or job-instance parameter substitution (it does have job-level
> parameter substitution). When asked what Pig could do to make life easier
> for Azkaban, Richard identified two things: registering jars through the
> grunt command line, and a way to monitor the running job. Both of these are
> already in trunk, so we're in pretty good shape for 0.8.
> 3. Piggybank discussion
> Kevin Weil led a discussion of the piggybank. There are a few problems with
> it -- it's released on the Pig schedule, and has quite a few barriers to
> submission that are, anecdotally at least, preventing people from
> contributing. Several options were discussed, with the group finally
> settling on starting a community-curated GitHub project for piggybank. It
> will have a number of committers from different companies, and will aim to
> make it easy for folks to contribute (all contribs will still have to have
> tests, and be Apache 2.0-licensed). More details will be forthcoming as we
> figure them out. Initially this project will be seeded with the current
> Piggybank functions some time after 0.8 is branched. The initial list of
> committers is Kevin Weil (Twitter), Dmitriy Ryaboy (Twitter), Carl Steinbach
> (Cloudera), and Russel Jurney (LinkedIn). Yahoo will also nominate someone.
> Please send us any thoughts you might have on this subject. It was suggested
> that a lot of common code might be shared with Hive UDFs, which have the
> same problems as Piggybank does, and that perhaps the project can be another
> collaboration point between the projects. It's not clear yet how that would
> work; Carl will talk to other Hive people.
> 4. Pig 0.9
> So far the items on the list for 0.9 are: a better type propagation /
> resolution story and documentation, perhaps a different parser (ANTLR?),
> some performance tweaks, and map types with fixed-type values. Much is still
> to be decided.
> The next contributor meeting will be hosted by LinkedIn in October.
> -Dmitriy

Best Regards

Jeff Zhang
