accumulo-user mailing list archives

From Mohammad Kargar <mkar...@phemi.com>
Subject Re: Accumulo as a Column Storage
Date Thu, 19 Oct 2017 22:47:47 GMT
That makes sense. So this means there is no limit on, or concern about,
having a potentially large number of column families (each holding only one
column qualifier), right?

On Thu, Oct 19, 2017 at 3:06 PM, Josh Elser <elserj@apache.org> wrote:

> Yup, that's the intended use case. You have the flexibility to determine
> what column families make sense to group together. Your only "cost" in
> changing your mind is the speed at which you can re-compact your data.
>
> There is one concern which comes to mind. Though making many locality
> groups does increase the speed at which you can read from specific columns,
> it decreases the speed at which you can read from _all_ columns. So, you
> can do this trick to make Accumulo act more like a columnar database, but
> beware that you're going to have an impact if you still have a use-case
> where you read more than just one or two columns at a time.
>
> Does that make sense?
>
>
> On 10/19/17 5:50 PM, Mohammad Kargar wrote:
>
>> AFAIK in Accumulo we can use "locality groups" to group sets of columns
>> together on disk, which would make it more like a column-oriented database.
>> Considering that "locality groups" are defined per column family, I was
>> wondering what if we treat column families like column qualifiers (creating
>> one column family per qualifier) and assign each to a different locality
>> group. This way all the data in a given column will be stored contiguously
>> on disk, which makes it easier for analytical applications to query the
>> data.
>>
>> Any thoughts?
>>
>> Thanks,
>> Mohammad
>>
>>
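
The layout discussed in this thread (one column family per logical column, each in its own locality group) can be sketched as follows. This is a minimal illustration, not code from the thread: the class name, the method name, the "lg_" naming prefix, and the example families are all hypothetical. In the Accumulo 1.x client API, a map of this shape is what TableOperations.setLocalityGroups(tableName, groups) expects, except that the set elements there are Hadoop Text objects rather than String; plain strings are used here so the sketch runs without Accumulo on the classpath.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Accumulo.
final class LocalityGroupPlan {
    private LocalityGroupPlan() {}

    /**
     * Builds a locality-group map that places every column family in a
     * group of its own, named "lg_" + family (the prefix is an assumption).
     */
    static Map<String, Set<String>> onePerFamily(List<String> families) {
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (String family : families) {
            // put() returns the previous value, so non-null means a duplicate.
            if (groups.put("lg_" + family, Collections.singleton(family)) != null) {
                throw new IllegalArgumentException("duplicate column family: " + family);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Example families are made up for illustration.
        Map<String, Set<String>> groups =
                onePerFamily(Arrays.asList("age", "city", "salary"));
        groups.forEach((name, fams) -> System.out.println(name + " -> " + fams));
    }
}
```

With a live connection, each Set<String> would need converting to Set<Text> before being passed to setLocalityGroups. As noted in the reply above, data already on disk only picks up a new grouping once it has been re-compacted, and a full-row read then has to pull from every group.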
