accumulo-dev mailing list archives

From Josh Elser
Subject Re: Growing project involvement
Date Tue, 13 Jan 2015 23:06:56 GMT
Oh. My. Thank you *so* much for writing this all up. It's extremely 
helpful. Some comments inline.

Joe Stein wrote:
> I have had a lot of feedback in the market place on Accumulo. This feedback
> was 100% from folks that didn't have Accumulo as a requirement to run and
> feel that it is very relevant to broader adoption. All of the below
> comments are a combination of my own opinions and what I have heard from
> others in the market in discussion about Accumulo.
> 1) Iterators are awesome from a software architecture perspective. From a
> development perspective if you have worked with them you have an experience
> or two to share on how to improve them. Anything that can be done to
> improve this experience for developers will be welcomed for new and
> existing users.

This comes up a lot. I know I always struggle with actually describing 
the *why* to someone. Maybe more concrete examples are the best route -- 
e.g. expand our existing examples in the codebase or create some 
PMC-managed repos with examples?

> 2) Lots of little cosmetic surface things in lots of places and attention
> to detail. e.g. the branch is not the
> latest and even the latest branch (master?) README isn't really welcoming
> or appealing from a "my first time visiting the project" perspective. For
> new users you only get 1 impression for a first impression, this is
> important under the "technical marketing umbrella".  Some Vagrant and/or
> Docker will make getting up and running quickly fantastic for folks that
> have to (or want to) interact with Accumulo.

I will file an INFRA issue tonight to switch this to `master` (most 
recent/unstable). The tags should be self-explanatory for users who want 
to find/build stable releases.

> 3) The project should/could have more out of the box integrations and
> support from the core project release cycles. e.g. Accumulo Framework for
> Apache Mesos. I don't think the drive for this (Mesos support) is lacking
> but having spoken to other Accumulo users there is no clear path how folks
> can help to make this happen. The eco system just isn't big enough for
> these type of projects to exist successfully outside the core project on
> some github url.

I know we have some hooks into YARN integration with Apache Slider, but 
I haven't really looked into Mesos integration (nor am I familiar with 
what/how to go about this). I'm sure we could reach out to someone like 
Paco Nathan and get some direction if no one else has a good feeling.

> 4) Some eco system page or place where "all things accumulo" can be sought
> after... planet accumulo, something like that (no reason to reinvent this
> wheel).  This is probably a combined issue of lack of aggregatable things
> (which we should try to improve) and the ability to have them seen in one
> place.  One of the coolest things I have seen Accumulo release since
> following the project has been
> but haven't seen anything else since this posting. Is it that the
> information isn't bubbling up or that people aren't posting more about cool
> things in place? Are people even using it?

I think this is one clear direction we can make easy progress in. I know 
there are lots of neat things happening, but in production and 
development. I'm not sure how much we lack in outward posting due to 
"developers not liking to write" and how much is just "other reasons".

> 5) Not; just; Java; please; =>  how about more Scala (maybe Iterator
> examples) and/or Go with some ProtoBuf interface? from an implementation
> perspective Java; just; kills; things; in; their; tracks; ! and Thrift has
> a way to do that too...

:) -- I really think Protobufs with Accumulo Combiners (formerly 
Aggregators) are pretty darn slick to use (and used it to build the 
multi-DC replication). That's an obvious win in the form of 

I know others have experience with Scala. Any good examples that can be 
shared for how it works well with Accumulo? Go, as well?

> 6) Operations is almost an opaque box. Getting something up and running for
> development is important but so is pushing it into production and
> sustaining it at scale. The more information about how this is done and
> where things work and do not work will be a  *HUGE* driver for the
> community (IMHO). Again, maybe all this stuff is out there and #4 is really
> how to solve this for folks to not spend their nights and weekends googling.

Indeed. This is a very hard problem in general (and I think the market 
very obviously confirms it). Overall, I do want to say that I think we 
do a good job in helping people who come to us and ask questions (go 
us!). The hard part is making it self-service: a solution for a problem 
can range from DNS all the way up to an Iterator implementation.

How do other projects deal with this? Is it primarily good answers that 
eventually get indexed by Google and people can find them? How can we be 
more aggressive in this regard?

> 7) Apache Spark support. While arguably this goes under #3 I think it has
> to be called out as another (better?) option for MapReduce. It is really
> easy to get Spark to use AccumuloInputFormat which is wonderful and a
> fantastic opportunity for making Accumulo shine with Spark. A few samples
> people can run with Spark and Accumulo together that do something more than
> word count will go a long way to attracting an audience too.

I lack experience here as well but again know that others have 
experience here. Spark users -- give us some more direction :)

> 8) More ways to highlight the work loads that Accumulo was built for and
> what it does now and how it is not about website or social or ads is
> important to organizations in verticals that care differently about their
> data.

That's a good point. I know that many of our people have put a lot of 
thought into these sorts of verticals in the past, but they haven't made 
it into "official" write-ups. This would be a good area we can improve 
through our own "marketing".

> 9) Better call out features and highlight them with more examples
> explicitly. I might be repeating myself at this point but wanted to bring
> up "Tracing" as another good example of a REALLY cool feature that folks
> when they see it don't entirely understand what/how to do with it. Google
> for "accumulo trace" or even going through the documentation it is
> impossible to figure out how to use it and make it work without late nights
> and tender loving care.

Good point. Examples + documentation + blog posts would help here. 
Perhaps focused usages of the novel features are a better way to go 
about this? A concrete implementation is a better read than an abstract 
concept and lends itself well to avoiding "so what?" questions.

> None of these things are easy and are very demanding for open source
> projects and communities. I think this is a great discussion and hope to
> continue to contribute moving forward.

Thanks so much, again, for taking the time to write this down!

> /*******************************************
>   Joe Stein
>   Founder, Principal Consultant
>   Big Data Open Source Security LLC
>   Twitter: @allthingshadoop
> ********************************************/
> On Tue, Jan 13, 2015 at 4:37 PM, Keith Turner wrote:
>> I think a minimal getting started guide is needed on the web site.
>> Something that will take a user from download to running on a cluster in as
>> few steps as possible.  This info is buried in the README, but there is too
>> much other stuff in the readme.
>> On Tue, Jan 13, 2015 at 4:09 PM, Josh Elser wrote:
>>> I meant to send this out closer to the new year (to ride on the new year
>>> resolution stereotype), but I slacked. Forgive me.
>>> As those paying attention should be aware, we have had very little
>>> growth within the project over the past 6-9 months. We've had our normal
>>> spattering of contributions, a few from some repeat people, but I don't
>>> think we've grown as much as we could.
>>> I wanted to see if anyone has any suggestions on what we could try to do
>>> better in the coming year to help more people get involved with the
>>> project. I don't want this to turn into a "we do X wrong" discussion, so
>>> please try to stay positive and include suggestion(s) for every problem
>>> presented when possible.
>>> Also, everyone should feel welcome to participate in the discussion here.
>>> If you fall into the "bucket" described, I'd love to hear from you. If
>>> anyone doesn't want to publicly respond, please feel free to email me
>>> privately and I'll anonymously post to the list on your behalf.
>>> Some ideas to start off discussion:
>>> * Help reduce barrier to entry for new developers
>>>    - Ensure simple/easy-to-process instructions for getting and building
>>> code in common environments
>>>    - Instructions on running tests and reporting issues
>>> * More high-level examples
>>>    - Maybe we start too deep in distributed-systems land and we scare away
>>> devs who think they "don't know enough to help"
>>>    - Recording "newbie" tickets and providing adequate information for
>>> anyone to come along and try to take it on
>>>    - Encourage/help/promote "concrete" ideas/code in the project. Something
>>> that is more tangible for devs to wrap their head around (also can help
>>> with adoption from new users)
>>> * Better documentation and "marketing"
>>>    - We do "ok" with the occasional blog post, and the user manual is
>>> usually thorough, but we can obviously do better.
>>>    - Can we create more "literature" to encourage more users and devs to
>>> get involved, trying to lower the barrier to entry?
>>> Thanks all.
