accumulo-user mailing list archives

From Billie Rinaldi <>
Subject Re: Accumulo design questions
Date Tue, 06 Nov 2012 20:30:25 GMT
On Tue, Nov 6, 2012 at 11:01 AM, Sukant Hajra <> wrote:

> I've been trying to understand Accumulo more deeply as we use it more. To
> supplement the online documentation and source, I've been referencing some
> blog articles on HBase (Lars George has several), the HBase docs, and the
> BigTable paper.
>
> But I'm curious about some of the ways Accumulo deviates from BigTable and
> HBase.
> The questions I have right now are:
>     1. Is the format of an RFile close to HFile version 1 or HFile version 2,
>        or at this point is the format really its own thing? I found good
>        documentation on the HFile, but I haven't yet found similar
>        documentation on RFiles. There's the source code, but I haven't dug
>        into that yet.

I think there is a different HFile for each column family, isn't there? An
RFile stores all columns, across all locality groups, in a single file,
which is another reason you don't pay the same performance penalty for
having lots of column families in Accumulo.

>     2. I understand that HBase doesn't do well with too many column
>        families. However, creating too many column families in HBase isn't
>        likely anyway, because you can't (I believe) create them dynamically.
>        Accumulo allows you to create column families dynamically, but I
>        wonder if this comes at a cost. Is there a benefit to using column
>        families sparingly in Accumulo? Or is the cost of using column
>        families more or less the same as using column qualifiers?
>     3. I guess one way families might differ from qualifiers relates to
>        HBase's recommendation to keep column family names short to avoid
>        needless storage waste. That should apply to Accumulo as well, right?
>     4. In supporting dynamic column families, was there a design trade-off
>        with respect to the original BigTable or current HBase design? What
>        might be a benefit of doing it the other way?

The main thing Accumulo had to do differently from BigTable to allow
dynamic creation of column families was to create a default locality
group.  That's the locality group that stores column families that aren't
specified for any other locality group.  I recall Keith saying it was kind
of a pain to implement, but I don't see any obvious negative tradeoffs of
the design.
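
To make the default locality group idea concrete, here is a toy model in
Java. It is only a sketch under stated assumptions, not Accumulo's RFile or
tablet server code: the class name LocalityGroupSketch, the partition method,
and the "(default)" label are all hypothetical. It shows the assignment rule
described above: families listed in a configured group go to that group, and
any other family, including one created on the fly by a write, falls into the
default group, so every cell still has a home in the single file.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class LocalityGroupSketch {
    // Hypothetical label for the catch-all group (not Accumulo's internal name).
    static final String DEFAULT_GROUP = "(default)";

    // Column family -> configured locality group name.
    private final Map<String, String> familyToGroup = new HashMap<>();

    public LocalityGroupSketch(Map<String, Set<String>> configuredGroups) {
        for (Map.Entry<String, Set<String>> group : configuredGroups.entrySet()) {
            for (String family : group.getValue()) {
                familyToGroup.put(family, group.getKey());
            }
        }
    }

    // Families never assigned to a group land in the default group.
    public String groupFor(String family) {
        return familyToGroup.getOrDefault(family, DEFAULT_GROUP);
    }

    // Splits cells, each given as {row, family, qualifier}, into one sorted
    // section per locality group, mimicking one file with several groups.
    public Map<String, TreeSet<String>> partition(List<String[]> cells) {
        Map<String, TreeSet<String>> sections = new TreeMap<>();
        for (String[] cell : cells) {
            String group = groupFor(cell[1]);
            sections.computeIfAbsent(group, g -> new TreeSet<>())
                    .add(cell[0] + ":" + cell[1] + ":" + cell[2]);
        }
        return sections;
    }

    public static void main(String[] args) {
        LocalityGroupSketch lg = new LocalityGroupSketch(
                Map.of("meta", Set.of("title", "author")));
        List<String[]> cells = List.of(
                new String[] {"doc1", "title", ""},
                new String[] {"doc1", "body", ""},
                // "tag_2012" was never configured, as if created dynamically.
                new String[] {"doc1", "tag_2012", "x"});
        System.out.println(lg.partition(cells));
    }
}
```

In a real Accumulo table the groups themselves are configured through
TableOperations.setLocalityGroups; this sketch only mirrors the rule that
unassigned families need somewhere to go, which is what the default group
provides. A scan that fetches only the "meta" families could, in principle,
skip the default section entirely.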


> Thanks,
> Sukant
