hbase-user mailing list archives

From Kireet <kir...@feedly.com>
Subject Re: HBase load problems
Date Thu, 17 Oct 2013 18:21:06 GMT
Perhaps this is the problem: our salt is 4 bytes, which completely 
randomizes things. We create about 35m rows/day. What we are thinking is 
that if older data isn't changed, then splitting recent data into its 
own table would make compactions much cheaper, because the older data 
wouldn't need to be rewritten.
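
To make the salt-cardinality point concrete, here is a minimal sketch in Python. It assumes a hypothetical key layout of salt + big-endian timestamp + row id, with the salt taken from an MD5 hash of the row id; the function names and the hash choice are illustrative, not our actual schema. It compares a 4-byte salt with a 1-byte salt limited to 256 buckets.

```python
import hashlib
import struct


def salted_key(row_id, ts, buckets=None):
    """Build salt + big-endian timestamp + row_id (hypothetical layout).

    buckets=None  -> 4-byte salt taken from the hash (about 4.3 billion prefixes)
    buckets=1..256 -> 1-byte salt, hash folded into that many buckets
    """
    digest = hashlib.md5(row_id).digest()
    if buckets is None:
        salt = digest[:4]
    else:
        salt = bytes([digest[0] % buckets])
    return salt + struct.pack(">Q", ts) + row_id


def distinct_prefixes(keys, salt_len):
    """Count how many different salt prefixes appear in a set of keys."""
    return len({k[:salt_len] for k in keys})


def make_keys(n, buckets=None):
    """Generate n keys with increasing timestamps (made-up sample data)."""
    return [salted_key(b"row-%d" % i, 1382000000 + i, buckets) for i in range(n)]


if __name__ == "__main__":
    wide = make_keys(10000)                 # 4-byte salt
    narrow = make_keys(10000, buckets=256)  # 1-byte salt, 256 buckets
    print("4-byte salt prefixes:", distinct_prefixes(wide, 4))
    print("256-bucket prefixes: ", distinct_prefixes(narrow, 1))
```

With the 4-byte salt nearly every row gets its own prefix, so the timestamp that follows does almost nothing for sort order. With 256 buckets, each prefix holds many rows, and within a prefix the keys sort by timestamp, so rows written around the same time end up next to each other.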

On 10/17/13 1:46 PM, Vladimir Rodionov wrote:
> afaik, your keys are already:
> salted_prefix + timestamp
> If the cardinality of salted_prefix is not that big, and it should not be (we use 256 in
> our system), your rows will already be grouped by time in HFile blocks, so I think you
> should have pretty good locality of data in the block cache.
> What is the cardinality of salted_prefix? And how many rows do you ingest per day?
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
> ________________________________________
> From: Kireet [kireet@feedly.com]
> Sent: Thursday, October 17, 2013 5:56 AM
> To: user@hbase.apache.org
> Subject: Re: HBase load problems
> Sorry for the confusion. I was trying to get at whether it would be a large
> operation, or whether there would be a large disruption, right after increasing
> the max region size. It sounds like it wouldn't.
> Getting back to the original question, if we do 90%+ of our reads/writes
> to very recent data, would it make sense to keep that in a separate
> table? It seems like that may keep things more optimized, the block
> cache would be more efficient, compactions would run quicker with much
> less data on the 'recent data' table and perhaps less often on the
> 'older data' table(s), etc.
> On 10/17/13 12:52 AM, Stack wrote:
>> On Wed, Oct 16, 2013 at 6:26 PM, Kireet <kireet@feedly.com> wrote:
>>> is there a downside to going to larger regions?
>> Generally we see pluses (See Bigger Regions in
>> http://hbase.apache.org/book/important_configurations.html for the latest
>> scripture on the topic).  Downsides would be something like compactions running
>> longer (coarsely, same overall work, it just takes longer to complete a region).
>>> It looks like merge is a larger operation, so changing the config setting
>>> would simply cause existing regions to grow to the larger value right?
>>> Would the way to confirm this be to check compaction rates and also maybe
>>> the size of store files in hdfs?
>> I'm not sure I understand the question above, Kireet.  Are you doing merges?
>> Yes, you can just change the configs on the cluster and regions will just
>> grow (you will need a rolling restart of the cluster to pick up the config).
>> Please ask again what you would like to confirm.
>> Thanks,
>> St.Ack
>> P.S. Feedly-fan here.
