accumulo-user mailing list archives

From "Sukant Hajra" <>
Subject Accumulo design questions
Date Tue, 06 Nov 2012 19:01:57 GMT
I've been trying to understand Accumulo more deeply as we use it more.  To
supplement the on-line documentation and source, I've been referencing some
blog articles on HBase (Lars George has several), HBase docs, and the
BigTable paper.

But I'm curious about some of the ways Accumulo deviates from BigTable and
HBase.

The questions I have right now are:

    1. Is the format of an RFile close to HFile version 1, HFile version 2, or
    at this point is the format really its own thing?  I found good
    documentation on the HFile, but I haven't yet found similar documentation
    on RFiles.  There's the source code, but I haven't dug into that yet.

    2. I understand that HBase doesn't do well with too many column families.
    However, creating too many column families in HBase isn't likely anyway
    because you can't (I believe) create them dynamically.  Accumulo allows you
    to create column families dynamically.  But I wonder if this can come at a
    cost.  Is there a benefit to using column families less frequently if
    possible in Accumulo?  Or is the cost of using column families more or less
    the same as using column qualifiers?

    3. I guess one way families might be different from qualifiers relates to
    HBase's recommendation to keep column family names short to avoid needless
    storage waste.  That should apply to Accumulo as well, right?

    4. In supporting dynamic column families, was there a design trade-off with
    respect to the original BigTable or current HBase design?  What might be a
    benefit of doing it the other way?
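
To make question 3 concrete, here is a rough Python sketch of the storage
concern.  It assumes that every stored entry repeats the full column family
name in its key, and it ignores timestamps, visibility labels, compression,
and any relative-key encoding the file format may apply, so the real on-disk
difference could be much smaller.  The row names, family names, and the
raw_key_bytes helper are all made up for illustration.

```python
def raw_key_bytes(rows, family, qualifiers):
    """Sum the uncompressed bytes of row + family + qualifier over all entries."""
    family_len = len(family.encode("utf-8"))
    return sum(
        len(row.encode("utf-8")) + family_len + len(qualifier.encode("utf-8"))
        for row in rows
        for qualifier in qualifiers
    )


# Hypothetical table: 1000 rows, two qualifiers per row, one column family.
rows = [f"row{i:04d}" for i in range(1000)]
qualifiers = ["name", "email"]

long_total = raw_key_bytes(rows, "customer_contact_details", qualifiers)
short_total = raw_key_bytes(rows, "c", qualifiers)

# Extra raw key bytes attributable only to the longer family name.
print("long family: ", long_total)
print("short family:", short_total)
print("difference:  ", long_total - short_total)
```

Under those assumptions the overhead grows with the number of entries, not
the number of distinct families, which is why I suspect the HBase advice
carries over.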

