hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Common advices for hosting a huge table
Date Tue, 15 Dec 2015 21:12:08 GMT
Regarding option one, also consider bulk loading:
http://hbase.apache.org/book.html#arch.bulk.load

FYI

On Tue, Dec 15, 2015 at 12:17 PM, Frank Luo <jluo@merkleinc.com> wrote:

> I am in a very similar situation.
>
> You could try one of the following options.
>
> Option one: avoid online inserts by preparing the data offline, for example
> with ImportTsv: http://hbase.apache.org/0.94/book/ops_mgt.html#importtsv
>
> Option two: if the first option doesn’t work for you, reduce your region
> size and increase the read/write timeouts. Compactions will still run while
> you insert data, but because the regions are smaller, each compaction or
> split takes less time. With this option the table stays available 24/7, but
> overall performance tends to drop dramatically once some regions start
> compacting.
>
> Option three: if you can afford some downtime, e.g. two hours every day,
> you can manage compactions and splits during that window. What I usually do
> is run a major compaction against all tables, then split the regions that
> are large so that they have enough room to grow for the next day’s inserts.
>
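The selection step in option three can be sketched as below. This is only an illustration: the function name, the region names, and the single growth estimate are hypothetical, and how you collect the region sizes is up to you.

```python
GB = 1024 ** 3


def regions_to_split(region_sizes, expected_daily_growth, max_region_size):
    """Return the regions that would outgrow max_region_size before the next window.

    region_sizes: dict mapping a region name to its size in bytes, measured
        after the major compaction has finished (hypothetical input).
    expected_daily_growth: bytes each region is expected to gain before the
        next maintenance window (an assumption, one number for all regions).
    max_region_size: the size in bytes you do not want a region to exceed.
    """
    return sorted(
        name
        for name, size in region_sizes.items()
        if size + expected_daily_growth > max_region_size
    )


if __name__ == "__main__":
    # Made-up sizes, only to show the call.
    sizes = {"region-a": 3 * GB, "region-b": 9 * GB, "region-c": 7 * GB}
    print(regions_to_split(sizes, 2 * GB, 10 * GB))
```

Splitting only the regions on that list keeps the region count from growing faster than the data does.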
> I hope it helps.
>
> From: 林豪 [mailto:linhao@qiyi.com]
> Sent: Monday, December 14, 2015 11:51 PM
> To: user@hbase.apache.org
> Subject: Common advices for hosting a huge table
>
> Hi, all:
>
> We have an HBase cluster with several hundred region servers, and each RS
> hosts nearly 300 regions. Currently one of our tables has grown to 16 TB
> and some regions exceed 10 GB. Major compaction on these regions is painful
> as it produces a lot of disk I/O and affects the performance of the RS. The
> auto-split size of IncreasingToUpperBoundRegionSplitPolicy has increased to
> 16 GB or more for this huge table. My solution is to set the MAX_FILESIZE
> attribute on this table so that ConstantSizeRegionSplitPolicy auto-splitting
> will work again.
>
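For reference, a rough model of how the two policies named above pick a split threshold. It is a sketch based on my reading of the HBase 1.x policy source, not the actual implementation: the function names are made up, the initial size is assumed to be twice the memstore flush size, and the jitter some versions apply to the maximum file size is ignored. Older releases such as 0.94 grow the threshold differently.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def increasing_split_size(regions_on_server, flush_size, max_filesize):
    """Approximate threshold under IncreasingToUpperBoundRegionSplitPolicy.

    regions_on_server: regions of this table hosted on the same region server.
    flush_size: memstore flush size in bytes.
    max_filesize: the table's MAX_FILESIZE (or the cluster default) in bytes.

    Assumption: the threshold starts at 2 * flush_size and grows with the cube
    of regions_on_server until it reaches max_filesize.
    """
    if regions_on_server == 0 or regions_on_server > 100:
        return max_filesize
    return min(max_filesize, 2 * flush_size * regions_on_server ** 3)


def constant_split_size(max_filesize):
    """Under ConstantSizeRegionSplitPolicy the threshold is simply the cap."""
    return max_filesize


if __name__ == "__main__":
    flush = 128 * MB
    cap = 16 * GB
    for n in range(1, 6):
        size = increasing_split_size(n, flush, cap)
        print(f"{n} region(s) on the RS: split at {size / GB:.2f} GB")
```

Under these assumptions the threshold reaches the cap after only a handful of regions per server, so for a table this large lowering MAX_FILESIZE is what actually bounds the region size.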
> My question is: what is the common advice, or which configuration options
> are recommended, for hosting such a huge table? If we decide to limit the
> region size, how do we determine the optimal region size? If regions are
> too large, major compaction is painful; if they are too small, we end up
> with a lot of small regions, which will overwhelm the RS.
>
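The trade-off in the question can be put into numbers with a back-of-the-envelope calculation like the one below. The 16 TB table size comes from the message above; the candidate region sizes and the server count of 300 are assumptions chosen only for illustration.

```python
GB = 1024 ** 3
TB = 1024 ** 4


def regions_for_table(table_bytes, region_bytes):
    """Number of regions needed if every region holds at most region_bytes."""
    # Ceiling division with integers, so no float rounding is involved.
    return -(-table_bytes // region_bytes)


def regions_per_server(table_bytes, region_bytes, num_servers):
    """Average number of this table's regions on each region server."""
    return regions_for_table(table_bytes, region_bytes) / num_servers


if __name__ == "__main__":
    table = 16 * TB
    servers = 300  # hypothetical; the message only says "several hundred"
    for region_gb in (4, 8, 10, 16):
        total = regions_for_table(table, region_gb * GB)
        per_rs = regions_per_server(table, region_gb * GB, servers)
        print(f"{region_gb} GB regions: {total} regions, about {per_rs:.1f} per RS")
```

Halving the region size doubles the region count for this table, which has to be weighed against the roughly 300 regions each RS already hosts.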
> 林豪
> Cloud Platform R&D Engineer
>
> iQIYI
> QIYI.com, Inc.
> Address: 6th Floor, Minrun Building, 1388 Yishan Road, Xuhui District, Shanghai
> Postal code: 201103
> Mobile: +86 136 1180 1618
> Phone: +86 21 5451 9520 8393
> Fax: +86 21 5451 9529
> Email: linhao@qiyi.com<mailto:zhouxiqiao@qiyi.com>
> Website: www.iQIYI.com<http://www.iqiyi.com/>
>
