accumulo-user mailing list archives

From Eric Newton
Subject Re: Table design
Date Thu, 07 Jun 2012 13:05:01 GMT
Some thoughts:

Accumulo will accommodate very large keys (on the order of 100K), but I
don't recommend it: large keys make the indexes big and slow down just
about every operation.  A row id or column qualifier that is 200 bytes long
is not extreme, though.  Remember that compression will reduce the storage
requirements, especially since the sort creates natural redundancy in the
row id.
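
A rough way to see the redundancy point, as a sketch only: the snippet below
uses Python's zlib as a stand-in for whatever block compression a table is
configured with (the real file format and codec differ), and the titles and
column names are made up for illustration.

```python
import zlib

# Hypothetical titles and column qualifiers, for illustration only.
titles = [
    "Three little pigs",
    "Three men and a baby",
    "The good, the bad and the ugly",
]
columns = ["author", "genre", "length", "rating", "year"]

# In sorted order each row id repeats once per column, back to back.
keys = sorted(f"{title}\x00{col}" for title in titles for col in columns)
raw = "\n".join(keys).encode("utf-8")

# zlib is only a stand-in for the codec a table would actually use.
packed = zlib.compress(raw)

print(len(raw), "bytes raw,", len(packed), "bytes compressed")
```

The exact ratio depends on the data and the codec, so treat the printed
numbers as an illustration rather than a measurement of Accumulo itself.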

Is it important to find "Three men and a baby" just after "Three little
pigs"?  If not, hash the title and look up the hash.  That will give you a
nice small key.  This also avoids hot-spots, like all the titles that start
with "The" or a common letter, like "S".  But you may need to deal with hash
collisions.
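
A minimal sketch of the hashed row id idea, assuming SHA-1 as the hash and
hex as the encoding (both are my choices for illustration, not something
Accumulo requires).  The helper names are hypothetical.  Keeping the full
title in the column qualifier is one assumed way to tell apart two titles
that happen to share a hash.

```python
import hashlib


def title_row_id(title):
    """Return a fixed-length row id derived from the title (hypothetical helper)."""
    return hashlib.sha1(title.encode("utf-8")).hexdigest()


def lookup_key(title):
    """Return (row id, column qualifier) for a title.

    The row id is the hash.  The full title goes in the column qualifier
    so that two titles with the same hash stay distinct entries in the row.
    """
    return (title_row_id(title), title)


row, qualifier = lookup_key("Three men and a baby")
print(row, qualifier)
```

The row id is always 40 hex characters, no matter how long the title is.
The cost is that titles no longer sort next to each other, so this only
fits if you look titles up exactly rather than scanning a range of them.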

Counters can give you "append" hot-spots.  As you ingest, the most active
tablet will always be the newest one.
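
To make the "append" hot-spot concrete, here is a small simulation, not
Accumulo code.  It assumes a table whose split points are the inclusive end
rows of its tablets, zero-padded decimal counters as row ids, and SHA-1 hex
digests as the hashed alternative.  The split points and id ranges are
invented for the example.

```python
import bisect
import hashlib
from collections import Counter


def tablet_for(row_id, splits):
    """Index of the tablet holding row_id; each split is a tablet's inclusive end row."""
    return bisect.bisect_left(splits, row_id)


# Hypothetical table that already holds counters 0..999 and has split three times.
counter_splits = ["00000250", "00000500", "00000750"]
# Hypothetical split points for hex-encoded hash row ids.
hash_splits = ["4", "8", "c"]

# The next 100 ids handed out by the counter.
new_ids = range(1000, 1100)

by_counter = Counter(tablet_for(f"{n:08d}", counter_splits) for n in new_ids)
by_hash = Counter(
    tablet_for(hashlib.sha1(str(n).encode("utf-8")).hexdigest(), hash_splits)
    for n in new_ids
)

print("counter row ids:", dict(by_counter))
print("hashed row ids: ", dict(by_hash))
```

Every new counter id sorts after the last split point, so all 100 writes go
to the last tablet, while the hashed ids are spread over several tablets.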

If you just want a unique identifier associated with a title, a random UUID
is useful, but large.
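
For a sense of what "large" means here, a quick sketch using Python's
standard uuid module (my choice for illustration; any UUID library gives the
same sizes):

```python
import uuid

# A random (version 4) UUID for a hypothetical title.
title_id = uuid.uuid4()

as_text = str(title_id)    # hex digits plus hyphens
as_bytes = title_id.bytes  # raw 128-bit value

print(len(as_text), "characters as text,", len(as_bytes), "bytes raw")
```

Storing the raw bytes instead of the text form cuts the identifier from 36
characters to 16 bytes, at the price of keys that are harder to read.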

Accumulo performance should not change if you have 1 table or 100.  But
tables are a convenient unit for management.  You can offline, compact and
delete a table.  You can configure many table-specific properties which can
give you performance benefits.


On Wed, Jun 6, 2012 at 4:46 PM, Perko, Ralph J wrote:

> Hi, I am in the process of designing some Accumulo tables for an app and
> have some questions:
>
> Lookup Table:
> The data's natural qualifier is a title.  This title can be any length.
> Some are as long as 200 characters.
> I am using this title as a row id and also as a column qualifier in other
> places.
> Is it considered good practice to have a lookup table for these titles
> (like RDBMS), replacing them with some incremented integer value, or should
> I just continue to use these long titles as row ids?
>
> Multiple Tables:
> What are the best practices around when to create a new table?  I have
> been breaking up my tables based on row id semantics.  For example, title
> row ids are in a different table than row ids based on some analysis count.
> Does breaking up data into multiple tables help, hurt, or do nothing for
> Accumulo performance?
>
> Thanks,
> Ralph
> __________________________________________________
> Ralph Perko
> Pacific Northwest National Laboratory
