accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: Table design
Date Wed, 21 Mar 2012 18:16:09 GMT
In Accumulo, there are no limits on the number or size of column families.
However, if you do want to group them into separate locality groups, you
need to list the column families for each group.  This list has to be
storable in ZooKeeper, so groups should be limited to "dozens" of column
families.  Reading from different groups at the same time will use more
resources, so, as with HBase, you should limit the number of groups you have.
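
To make the cost of reading across groups concrete, here is a minimal conceptual sketch in Python. It does not use the Accumulo API. The group names, the column family names, the `groups_touched` helper, and the "(default)" label for families that are not listed in any group are all hypothetical, chosen only for illustration. The sketch also assumes that a column family belongs to at most one group.

```python
# Conceptual model only: which locality groups does a scan have to read?
DEFAULT_GROUP = "(default)"  # hypothetical label for families not listed in any group


def groups_touched(locality_groups, scanned_families):
    """Return the set of groups a scan over `scanned_families` must read."""
    family_to_group = {}
    for group, families in locality_groups.items():
        for family in families:
            # Assumption of this sketch: one family lives in exactly one group.
            if family in family_to_group:
                raise ValueError(
                    f"column family {family!r} is listed in both "
                    f"{family_to_group[family]!r} and {group!r}"
                )
            family_to_group[family] = group
    # Unlisted families fall into the default group.
    return {family_to_group.get(f, DEFAULT_GROUP) for f in scanned_families}


# Hypothetical table layout: two small groups, each listing a few families.
locality_groups = {
    "meta": {"title", "author"},
    "content": {"body"},
}

narrow = groups_touched(locality_groups, ["title", "author"])
wide = groups_touched(locality_groups, ["title", "body", "tags"])
```

In this model the first scan stays inside one group, while the second has to read three separate groups. That is the reason to keep the number of groups small and to group families that are usually read together.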

The RFile format takes advantage of the similarity of data between keys,
and does not repeat elements of the key that are identical from key to key.
If everything has the same visibility, it will only be listed once.
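
As a rough illustration of that idea, here is a simplified Python sketch. It is not the actual RFile encoding, and the function names and sample keys are made up. It only shows the principle: in a sorted run of keys, a field that matches the previous key does not need to be stored again.

```python
# Simplified illustration, NOT the real RFile on-disk format.
# A key here is (row, column family, column qualifier, visibility).


def relative_encode(sorted_keys):
    """Store None for every field that is identical to the previous key's field."""
    encoded = []
    prev = None
    for key in sorted_keys:
        if prev is None:
            encoded.append(tuple(key))  # the first key is stored in full
        else:
            encoded.append(
                tuple(None if cur == old else cur for cur, old in zip(key, prev))
            )
        prev = key
    return encoded


def relative_decode(encoded):
    """Rebuild the full keys by copying repeated fields from the previous key."""
    decoded = []
    prev = None
    for entry in encoded:
        if prev is None:
            key = tuple(entry)
        else:
            key = tuple(old if cur is None else cur for cur, old in zip(entry, prev))
        decoded.append(key)
        prev = key
    return decoded


# Hypothetical sample data: every entry carries the same visibility label.
keys = [
    ("row1", "meta", "author", "public&analyst"),
    ("row1", "meta", "title", "public&analyst"),
    ("row2", "meta", "author", "public&analyst"),
]

encoded = relative_encode(keys)
stored_visibilities = sum(1 for entry in encoded if entry[3] is not None)
```

In this toy run the visibility string and the column family are each stored once for all three keys, and decoding recovers the original keys exactly. Long labels that repeat across neighbouring keys therefore cost much less than their length times the number of cells.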

And, when I say there is "no limit"... there is no predefined limit, but
rows, cf, cq, visibilities and values all need to comfortably fit in the
physical RAM available, perhaps multiple times, as they are serialized and
deserialized in the various services.

As for table design... it depends a great deal on what you want to do.

Here is a short description of a complex indexing scheme that makes it
efficient to do distributed conjunctive queries on documents:

http://incubator.apache.org/accumulo/example/wikisearch.html

It makes it possible to do fast searches for queries like `TITLE matches
"f.*bar" and contains the words "catch" and "22" '.

-Eric


On Wed, Mar 21, 2012 at 12:43 PM, Cardon, Tejay E
<tejay.e.cardon@lmco.com> wrote:

> Thank you ahead of time for the input.
>
> When designing tables in HBase, one is encouraged to use single-letter
> names for column families, and to only have 2 or 3 families.  The
> documentation states that this has to do with the underlying way that the
> data is stored on disk.  I’m curious if similar considerations need to be
> made with Accumulo.
>
> Furthermore, and more specific to Accumulo, what considerations should be
> made for visibility labels?  If the visibility string for each cell is
> stored on disk along with the data in the cell, I could see where both long
> role names and large combinations of roles could have a major impact on
> disk utilization.
>
> Finally, can anyone recommend a good resource for Accumulo table design
> (or for key/value store design in general)?
>
> Thanks
>
> Tejay Cardon
