accumulo-dev mailing list archives

From Eric Whyne <>
Subject Re: accumulo pull request: raccumulo Packaged: 2013-05-09 22:18:20 UTC; pgrim
Date Mon, 28 Oct 2013 16:33:56 GMT

That makes sense.

Thanks for the help, let me know when you hear back from
general@incubator and how we can best proceed. I'll keep working on the
things Chris mentioned on my end.



>When we say 'contrib', we don't actually mean the 'contrib' directory in
>the core of Accumulo. Look at the first four repos on [1] (bsp, pig,
>instamo-archetype and the wikisearch). When we say "contrib repository",
>a brand new repository here is where we would see raccumulo fitting in.
>Despite this code being strongly tied to Accumulo, Accumulo is not
>strongly tied to it. That said, I'm still open to help you get raccumulo
>included underneath the Accumulo "umbrella" which will give anyone
>providing back a way to build up merit and recognition among the
>community (so we can get to a point where all interested parties can
>manage it themselves).
>I'll go ahead and contact general@incubator to see about any potential
>licensing issues due to R's GPL-ness and see what other guidance they
>might have about importing the code.
>- Josh

On Sun, Oct 27, 2013 at 6:14 PM, Eric Whyne <> wrote:

> 1. I will take care of the ASF ICLAs from contributors this week. I don't
> think that's a problem based on conversations I've had.
> 2. CCLA: I don't see any major hurdles to doing this; I'll set up some
> discussions with other company leadership to move this forward. I'll be
> advocating for this because I think it's a great idea.
> 3a. I don't see the contributed code being able to sustain a new
> community.  Although there is a demand for the R interface this creates,
> there are not many other directions or advancements that could be made that
> would be useful to undertake independent of the core accumulo roadmap.
> 3b. After some discussion on our end, its inclusion in the contrib where
> the pull request places it is optimal for its current state. It will make
> an R interface to accumulo part of the core distribution. This could
> potentially increase adoption of accumulo across more of the data science
> community, I know it would intrigue us from a corporate standpoint...
> that's why it was written.
> 3c. As a future roadmap for the capability, I suspect that there's a
> better way to do what this code accomplishes (sans proxy?) and that the
> functionality will get rolled somewhere into the rest of the project in a
> better way. That effort would probably A: use large chunks of this code (so
> you'd want to include it to ensure licensing and avoid potential conflicts)
> and B: make any contrib project obsolete, leading to its demise. If it's
> an independent contrib project or incubating as an Apache project, a lot of
> time would have been wasted. If it's part of the core distribution, it
> equates to merely moving a directory from the contrib section of the code
> base; the PMC can feel free to remove the parts that are obsolete, and much
> confusion would be avoided.
> 4. We're willing to move forward with it in whatever way that makes sense
> to all involved. The core developer (Phil Grim) has an interest in
> maintaining the capability since we have customers depending on it. I think
> the best way would be to keep our GitHub fork updated and then just send
> maintenance pull requests as modifications are made on our end. Of course,
> we'd be looking for guidance on how to best license it and
> establishing/signing whatever we have to to ensure compliance with ASF
> requirements.
> R/
> Eric
> On Sun, Oct 27, 2013 at 2:37 PM, Chris Mattmann <> wrote:
>> Hi Josh, and Eric,
>> Thanks. As for the IP clearances and stuff, yes, makes total sense. Eric
>> and anyone that contributed to this patch need and should have ASF ICLAs
>> on file.
>> It's a simple process, download the ICLA here:
>> And submit it for each individual contributor to
>> Beyond that it may be nice in this case since it's got corporate stuff
>> attached to it to see if Data Tactics is willing to sign an Apache
>> Corporate Contributor License Agreement (CCLA):
>> Not a requirement by any means, but a nice thing. So Eric and others at
>> Data Tactics, that's something to consider.
>> Once that's taken care of, the Accumulo PMC can take on the stewardship of
>> the code if someone on that PMC like Josh is willing to work through any
>> issues that that person (or others on the PMC) have with bringing it on
>> board.
>> Also since Apache is a meritocracy, Eric and others, you will be credited
>> for the work that you contribute and if you keep doing so over time, and
>> creating work for the Accumulo PMC members, your work will likely be
>> recognized.
>> The Incubator PMC is there as a clearinghouse for new projects and new
>> communities to begin their journey towards Apache individual project
>> status (TLP) and the Apache way. Eric: do you see this project as an
>> entirely new community that is complementary to Accumulo? If so, the
>> Incubator would be a good route to go. If you see this as part of core
>> Accumulo (or even contrib) and you can convince someone on the PMC like
>> Josh to help shepherd this code into their PMC, that's also another route.
>> Cheers,
>> Chris
>> -----Original Message-----
>> From: Josh Elser <>
>> Reply-To: "" <>
>> Date: Sunday, October 27, 2013 11:22 AM
>> To: "" <>
>> Subject: Re: accumulo pull request: raccumulo Packaged: 2013-05-09
>> 22:18:20 UTC; pgrim
>> >Thanks again, Eric (and Phil).
>> >
>> >It's awesome to see this amount of work put in to integrate with R. But,
>> >personally, I don't think direct inclusion in Accumulo is the proper
>> >place for it.
>> >
>> >It definitely cannot be directly merged as such: we would need to make
>> >sure we have ICLAs from all individuals and a CCLA from Data-Tactics (if
>> >memory serves). Essentially, we need to make sure the proper paperwork
>> >exists that the ownership is assigned to the ASF (instead of individuals
>> >or Data-Tactics as the notices alternate between currently). Also, the
>> >ASF has a general process for handling imports of code. [1]
>> >
>> >It looks like it's missing any documentation on how to use it too, e.g.
>> >the user needs to start an instance of the thrift proxy themselves, but
>> >that's a little nit-picky on my end :)
>> >
>> >Given the chatter on ACCUMULO-1804, it seems like it's desired for this
>> >to be its own contrib repo as a part of the ASF. The next step here
>> >would be for us to contact the ASF incubator to figure out the IP rules
>> >and shake out any licensing concerns.
>> >
>> >Let me know for sure and I can kick off a message to the incubator if
>> >this is how you (and Data-Tactics) want to proceed. [2]
>> >
>> >- Josh
>> >
>> >[1]
>> >[2]
>> >
>> >
>> >On 10/25/13, 12:13 PM, ericwhyne wrote:
>> >> GitHub user ericwhyne opened a pull request:
>> >>
>> >>
>> >>
>> >>      raccumulo Packaged: 2013-05-09 22:18:20 UTC; pgrim
>> >>
>> >>      This pull request is in response to this issue:
>> >>
>> >>
>> >>      What this code is:
>> >>      Need to be able to support users who utilize RStudio to conduct
>> >>analysis of data residing in the Accumulo data space instead of moving
>> >>data from one repository to a stand alone system to have the analytic
>> >>run in memory. RStudio should be able to make calls directly to the data
>> >>space and provide the output within the RStudio interface.
>> >>
>> >>
>> >> You can merge this pull request into a Git repository by running:
>> >>
>> >>      $ git pull master
>> >>
>> >> Alternatively you can review and apply these changes as the patch at:
>> >>
>> >>
>> >>
>> >> ----
>> >> commit 116c045d05074b0e0ccf907e42235f94aa7c1703
>> >> Author: Eric Whyne <>
>> >> Date:   2013-10-25T16:08:38Z
>> >>
>> >>      raccumulo Packaged: 2013-05-09 22:18:20 UTC; pgrim
>> >>
>> >> ----
>> >>
