accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: EXTERNAL: Re: Table design
Date Wed, 21 Mar 2012 18:47:41 GMT
Yes, that is exactly what I'm trying to say.

On Wed, Mar 21, 2012 at 2:35 PM, Cardon, Tejay E <tejay.e.cardon@lmco.com> wrote:

> Thanks Eric.  Just to make sure I understood correctly:
>
> If I have many (say 5+) locality groups, that would be bad for
> performance, but if I have 2 locality groups with 10+ column families each,
> that would not be a major issue?
>
> Thanks,
>
> Tejay
>
> *From:* Eric Newton [mailto:eric.newton@gmail.com]
> *Sent:* Wednesday, March 21, 2012 12:16 PM
> *To:* accumulo-user@incubator.apache.org
> *Subject:* EXTERNAL: Re: Table design
>
> In Accumulo, there are no limits on the number/size of column families.
> However, if you do want to group them into separate locality groups, you
> need to list the column families for the group.  This has to be storable in
> ZooKeeper, so groups should be limited to "dozens" of column families.
> Reading from different groups at the same time will use more resources,
> so, like HBase, you should limit the number of groups you have.
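
To make the trade-off concrete, here is a rough Python sketch. It is a toy
model, not Accumulo code: it assumes each locality group is stored
separately, so a scan has to read one group per distinct group its column
families belong to, and that families not listed in any group fall into a
default group. The group names, family names, and the groups_touched helper
are all made up for illustration.

```python
def groups_touched(locality_groups, requested_families):
    """Return the names of the locality groups a scan would have to read."""
    # Invert the group -> families mapping into family -> group.
    family_to_group = {
        family: group
        for group, families in locality_groups.items()
        for family in families
    }
    # Families not assigned to any group land in a default group.
    return {family_to_group.get(f, "default") for f in requested_families}


# Two groups with several column families each (hypothetical names).
few_groups = {
    "metadata": {"title", "author", "date", "lang"},
    "content": {"body", "summary", "links"},
}

# The same families, but one locality group per family.
many_groups = {
    name: {name}
    for name in ["title", "author", "date", "lang", "body", "summary", "links"]
}

scan = ["title", "author", "date", "body"]
print(len(groups_touched(few_groups, scan)))   # 2
print(len(groups_touched(many_groups, scan)))  # 4
```

The same four-family scan reads two groups in the first layout and four in
the second, which is the cost that grows with the number of groups rather
than with the number of families per group.
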
>
> The RFile format takes advantage of the similarity of data between keys,
> and does not repeat elements of the key that are identical from key to key.
> If everything has the same visibility, it will only be listed once.
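
The idea of not repeating identical key elements can be sketched in a few
lines of Python. This is only an illustration of the principle, not the
actual RFile on-disk encoding: keys are modelled as (row, cf, cq, visibility)
tuples of strings, and None is an assumed marker meaning "same as the
previous key". The function names are hypothetical.

```python
def encode_relative(keys):
    """Replace each field equal to the previous key's field with None."""
    encoded, prev = [], None
    for key in keys:
        if prev is None:
            encoded.append(tuple(key))  # first key is stored in full
        else:
            encoded.append(tuple(
                None if cur == old else cur
                for cur, old in zip(key, prev)
            ))
        prev = key
    return encoded


def decode_relative(encoded):
    """Rebuild the full keys by copying elided fields from the previous key."""
    keys, prev = [], None
    for entry in encoded:
        if prev is None:
            key = tuple(entry)
        else:
            key = tuple(
                old if cur is None else cur
                for cur, old in zip(entry, prev)
            )
        keys.append(key)
        prev = key
    return keys


keys = [
    ("row1", "attr", "name", "public"),
    ("row1", "attr", "size", "public"),
    ("row2", "attr", "name", "public"),
]
encoded = encode_relative(keys)
print(encoded[1])  # (None, None, 'size', None)
print(sum(entry[3] is not None for entry in encoded))  # 1
```

In this toy run the visibility "public" is stored once for three keys, so a
long visibility string costs little as long as neighbouring keys share it.
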
>
> And, when I say there is "no limit"... there is no predefined limit, but
> rows, cf, cq, visibilities and values all need to comfortably fit in the
> physical RAM available, perhaps multiple times, as they are serialized and
> deserialized in the various services.
>
> As for table design... it depends a great deal on what you want to do.
>
> Here is a short description of a complex indexing scheme that makes it
> efficient to do distributed conjunctive queries on documents:
>
> http://incubator.apache.org/accumulo/example/wikisearch.html
>
> It makes it possible to do fast searches for queries like 'TITLE matches
> "f.*bar" and contains the words "catch" and "22"'.
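
For readers unfamiliar with conjunctive queries, here is a minimal
single-machine Python sketch of what such a query computes. It does not
reproduce the wikisearch table layout or its distributed evaluation; the
sample documents, the in-memory inverted index, and the build_index and
search helpers are assumptions made purely for illustration.

```python
import re

# Hypothetical documents: id -> (title, text).
docs = {
    1: ("foobar", "catch 22 is a novel"),
    2: ("foo bar baz", "nothing to catch here"),
    3: ("other", "catch 22 again"),
}


def build_index(docs):
    """Map each word to the set of document ids whose text contains it."""
    index = {}
    for doc_id, (_title, text) in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def search(docs, index, title_pattern, words):
    """Return ids of docs containing every word whose title matches fully."""
    # Intersect the posting sets, smallest first, to get candidate documents.
    postings = sorted((index.get(w, set()) for w in words), key=len)
    candidates = set.intersection(*postings) if postings else set(docs)
    pattern = re.compile(title_pattern)
    return sorted(d for d in candidates if pattern.fullmatch(docs[d][0]))


index = build_index(docs)
print(search(docs, index, "f.*bar", ["catch", "22"]))  # [1]
```

Only document 1 satisfies all three conditions: document 2 lacks "22" and
document 3 has a title that does not match the pattern.
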
>
> -Eric
>
> On Wed, Mar 21, 2012 at 12:43 PM, Cardon, Tejay E <tejay.e.cardon@lmco.com>
> wrote:
>
> Thank you ahead of time for the input.
>
> When designing tables in HBase, one is encouraged to use single letter
> names for column families, and to only have 2 or 3 families.  The
> documentation states that this has to do with the underlying way that the
> data is stored on disk.  I’m curious if similar considerations need to be
> made with Accumulo.
>
> Furthermore, and more specific to Accumulo, what considerations should be
> made for visibility labels?  If the visibility string for each cell is
> stored on disk along with the data in the cell, I could see where both long
> role names and large combinations of roles could have a major impact on
> disk utilization.
>
> Finally, can anyone recommend a good resource for Accumulo table design
> (or for key/value store design in general)?
>
> Thanks
>
> Tejay Cardon
