accumulo-dev mailing list archives

From Sean Busbey <>
Subject Re: better presplitting
Date Sat, 21 Jun 2014 18:38:26 GMT
On Sat, Jun 21, 2014 at 10:59 AM, Jeremy Kepner <> wrote:

> I would encourage the community to figure this out for the following
> reason.
> As other databases adopt Accumulo's security features, Accumulo's
> primary feature is performance.
> Other NoSQL databases have let performance slide in favor of adding more
> features.
> The gap between Accumulo performance and other NoSQL databases is growing.
> There are many applications where Accumulo can do on one node what it would
> take 20 or more nodes to do using another technology.
> That said, the SQL and NewSQL communities have not been idle and
> there are some fairly high-performance competitors out there.
> In the future, I believe Accumulo's primary performance competition
> will come from the SQL and NewSQL communities.

At the risk of derailing my own thread, while I agree that we have a
community problem with our story of "why Accumulo," I don't agree that we
should necessarily chase after performance as that story.

First, there's still a dearth of comparable-across-system published
benchmarks that we could use to support the notion that we're more
performant currently.

Second, while I agree that our cell visibility tagging is a poor
differentiator (regardless of things like the addition of tags in HBase
proper in 0.98), our implementation is substantially more mature than other
options for now. We could instead leverage that maturity and add in other
data governance enablers, e.g. making it easier for applications to do data
provenance tracking.

Finally, if we do push for performance we'll have to get better at
quantifying it. That means getting benchmarks we can compare on other
systems that others will agree are legitimate (likely YCSB or its ilk).
More importantly, it means defining the boundary of performance we care
about. There's a big difference between trying to be the most performant
single-node key/value store and trying to be the one that can dominate at
the 5k or 10k+ node level.

