hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: BucketCache Configuration Problem
Date Wed, 04 Mar 2015 05:17:48 GMT
On Tue, Mar 3, 2015 at 6:26 PM, donhoff_h <165612158@qq.com> wrote:

> Hi, Stack
>
> Yes, what I mean is that the working set will not fit in RAM and so we
> consider using SSD. As to the access pattern, no, they are not accessed
> randomly. Usually in a loan business, after the pictures are loaded, some
> users will first read these pictures to check the correctness of the
> information and whether they fulfill the regulatory requirements. And
> then, after that step, some other users will read the same pictures to
> verify the risk. So they are not accessed randomly.
>
>
Ok. Sounds like cache can help.  You can cache more if an SSD is in the mix.

Yours,
St.Ack
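
For concreteness, a file-backed BucketCache pointed at an SSD can be sketched in hbase-site.xml like this (the path and the size in MB are illustrative assumptions, not values from this thread):

```xml
<!-- Hypothetical fragment: file-backed BucketCache on an SSD mount.
     The path and size (in MB) are example values only. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>file:/mnt/ssd/hbase-bucketcache.data</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>20480</value>
</property>
```

With the file: engine the cache contents can also survive a restart, which is the "warm on startup" property mentioned later in the thread.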



> And meanwhile, thanks very much for submitting the port problem for HBase1.0.
>
>
>
>
> ------------------ Original Message ------------------
> From: "Stack";<stack@duboce.net>;
> Sent: Wednesday, March 4, 2015, 6:48 AM
> To: "Hbase-User"<user@hbase.apache.org>;
>
> Subject: Re: BucketCache Configuration Problem
>
>
>
> On Tue, Mar 3, 2015 at 1:02 AM, donhoff_h <165612158@qq.com> wrote:
>
> > Hi, Stack
> >
> > Still thanks much for your quick reply.
> >
> > The reason that we don't shrink the heap and allocate the savings to the
> > offheap is that we want to cache as many datablocks as possible. The
> > memory size is limited; no matter how much we shrink the heap, it cannot
> > hold that many datablocks. So we want to try "FILE" instead of
> > "offheap". And yes, in this situation we are considering using SSD.
> >
> >
> You are saying that your working set will not fit in RAM, or a good
> portion of your working set?  Are the accesses totally random?  If so,
> would it be better not to cache at all (don't pay the price of cache
> churn if it is not doing your serving any good)?
>
>
>
> > As to my configuration, I have attached it in this mail.  Thanks very
> > much for trying my config.
> >
> >
> Ok. Will try.
>
>
>
> > I did not read the post you recommended. I will read it carefully.
> > Perhaps I can make my decision according to this post.
> >
> >
> It tries various configs and tries to do +/-. Hopefully it will help (it
> references another similar study that may help also).
>
>
>
> > By the way, I also asked my colleagues to try HBase1.0. But we found
> > that we could not start the master and regionserver on the same node.
> > (Because we are making tests, we deploy HBASE on a very small cluster.
> > We hope the nodes that run the master and backup-master also run the
> > regionserver.)  I read the ref-guide again, and it seems that the
> > master and regionserver use the same port 16020. Does that mean there
> > is no way I can deploy the master and regionserver on the same node and
> > start them with a simple "start-hbase.sh" command?
> >
> >
> I see.  It looks like the explicit master port setting has been purged.
> I think that is a bug (HBASE-13148). You can set the http/info port, but
> where the master listens for requests is hbase.regionserver.port.  You
> probably have to start the master first, passing a system property for
> the port to listen on -- -Dhbase.regionserver.port=XXX should work -- and
> then start the regionservers.
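
The workaround above might be scripted as follows. This is only a sketch: port 16000 and the use of HBASE_MASTER_OPTS to carry the system property are assumptions, not something confirmed in this thread.

```shell
# Hypothetical workaround for HBASE-13148: give the master its own RPC port
# via a system property so it does not collide with the regionserver's
# default 16020, then start the daemons in order. Port 16000 is an example.
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Dhbase.regionserver.port=16000"
./bin/hbase-daemon.sh start master        # master listens on 16000
./bin/hbase-daemon.sh start regionserver  # regionserver keeps default 16020
```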
>
> St.Ack
>
>
>
>
> > Thanks!
> >
> > ------------------ Original Message ------------------
> > *From:* "Stack";<stack@duboce.net>;
> > *Sent:* Tuesday, March 3, 2015, 1:30 PM
> > *To:* "Hbase-User"<user@hbase.apache.org>;
> > *Subject:* Re: BucketCache Configuration Problem
> >
> > On Mon, Mar 2, 2015 at 6:54 PM, donhoff_h <165612158@qq.com> wrote:
> >
> > > Hi, Stack
> > >
> > > Thanks very much for your quick reply.
> > >
> > > It's OK to tell you my app scenario. I am not sure if it is very
> > > extreme. :)
> > >
> > > Recently we have been considering using HBase to store all the
> > > pictures of our bank. For example, the pictures from loan
> > > applications & contracts, credit card applications, drafts, etc.
> > > Usually the total count of pictures that the business users will
> > > access daily is not very high. But since each picture takes a lot of
> > > space, the total amount of picture data that business users will
> > > access daily is very high. So we are considering using cache to
> > > improve the performance.
> > >
> > >
> > Pictures are compressed, right?
> >
> >
> >
> > > Since the total space daily accessed is very large and the memory may
> > > not contain so much space, we consider using "FILE" instead of
> > > "offheap" for the BucketCache.
> >
> >
> > Ok. Is your FILE located on an SSD? If not, FILE is probably a
> > suboptimal option.  The only reason I could see you favoring FILE is if
> > you want your cache to be 'warm' on startup (the FILE can be persisted
> > and reopened across restarts).
> >
> >
> >
> > > In this situation, if we use the CBC policy, the memory will only
> > > cache the meta blocks and leave the datablocks stored in "FILE". And
> > > it seems the memory is not taking full usage, because the total count
> > > of pictures daily accessed is not very high and we may not need to
> > > cache many meta blocks.
> >
> >
> > Right. Onheap, we'll only have the indices and blooms. Offheap or in
> > FILE, we'll have the data blocks.
> >
> > The issue is that you believe your java heap is going to waste? If so,
> > shrink the JVM heap -- less memory to GC -- and allocate the savings to
> > the offheap (or to the os and let it worry about caching).
> >
> >
> >
> > > So we want to know if the RAW L1+L2 is better. Maybe it can take full
> > > use of memory and meanwhile cache a lot of datablocks. That's the
> > > reason why I tried to set up a RAW L1+L2 configuration.
> > >
> > >
> > You've seen this post I presume:
> > https://blogs.apache.org/hbase/entry/comparing_blockcache_deploys
> >
> > RAW L1+L2 is not tested. Blocks are first cached in L1 and if evicted,
> > they go to L2. Going first into the java heap (L1) and then out to L2
> > could make for some ugly churn if blocks are being flipped frequently
> > between the two tiers. We used to have a "SlabCache" option and it had
> > a similar policy; all it seemed to do in testing was run slow and
> > generate GC garbage, so we removed it (and passed on testing L1+L2
> > RAW).
> >
> > High-level, it sounds like you cannot cache the dataset in memory and
> > that you will have some cache churn over the day; in this case, CBC and
> > a shrunken java heap, with the savings given over to offheap or the os
> > cache, would seem to be the way to go?  Your fetches will be slower
> > than if you could cache it all onheap, but you should have a nice GC
> > profile and fairly predictable latency.
> >
> >
> >
> > > By the way, I tried your advice to set
> > > hbase.bucketcache.combinedcache.enabled=false. But the WebUI of the
> > > regionserver said that I did not deploy an L2 cache. You can see it
> > > in the attachment. Is that still caused by my HBase version?  Is RAW
> > > L1+L2 only applicable in HBase1.0?
> > >
> > Send me your config and I'll try it here.
> >
> >
> >
> > > At last, you said that the refguide still needs to be maintained. I
> > > totally understand.  :) It is the same in my bank. We also have a lot
> > > of docs that need to be updated. And I shall be very glad if my
> > > questions can help you to find those locations and can help others.
> > >
> > >
> > Smile. Thanks for being understanding.
> > St.Ack
> >
> >
> >
> > > Thanks again!
> > >
> > >
> > > ------------------ Original Message ------------------
> > > *From:* "Stack";<stack@duboce.net>;
> > > *Sent:* Tuesday, March 3, 2015, 12:13 AM
> > > *To:* "Hbase-User"<user@hbase.apache.org>;
> > > *Subject:* Re: BucketCache Configuration Problem
> >
> > >
> > > On Mon, Mar 2, 2015 at 6:19 AM, donhoff_h <165612158@qq.com> wrote:
> > >
> > > > Hi, Stack
> > > >
> > > > Thanks for your reply and also thanks very much for your reply to my
> > > > previous mail "Questions about BucketCache".
> > > >
> > > > The hbase.bucketcache.bucket.sizes property takes effect. I did not
> > > > notice that in your first mail you had told me the name should be
> > > > hbase.bucketcache.bucket.sizes, instead of hbase.bucketcache.sizes.
> > > > I did not notice this point until your last mail. It's my fault.
> > > > Thanks for your patience.
> > > >
> > > >
> > > No harm. Thanks for your patience and writing the list.
> > >
> > >
> > >
> > > > As to the relationship between HBASE_OFFHEAPSIZE and
> > > > -XX:MaxDirectMemorySize, I followed your advice to look in
> > > > bin/hbase for the statements that contain HBASE_OFFHEAPSIZE, but I
> > > > found that there isn't any statement which contains
> > > > HBASE_OFFHEAPSIZE. I also tried "bin/hbase-daemon.sh" and
> > > > "bin/hbase-config.sh"; they don't contain HBASE_OFFHEAPSIZE either.
> > > > So I still don't know their relationship. My HBase version is
> > > > 0.98.10. Is HBASE_OFFHEAPSIZE not used in this version?
> > > >
> > > >
> > > This is my fault. The above applies to versions beyond 0.98, not your
> > > version. Please pass MaxDirectMemorySize inside HBASE_OPTS.
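
On 0.98.x that would look something like the following in conf/hbase-env.sh (a sketch; the 4g figure is an example value -- size it to cover hbase.bucketcache.size plus any other direct-memory users):

```shell
# conf/hbase-env.sh on 0.98.x: HBASE_OFFHEAPSIZE is not interpolated in this
# version, so reserve direct memory for the offheap BucketCache by appending
# -XX:MaxDirectMemorySize to HBASE_OPTS. 4g is an example value.
export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize=4g"
```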
> > >
> > >
> > >
> > >
> > > > As to "the pure secondary cache" or "bypass CBC", I mean use
> > BucketCache
> > > > as a strict L2 cache to the L1 LruBlockCache, ie the Raw L1+L2. The
> > point
> > > > comes from the reference guide of Apache HBase, which says : "It is
> > > > possible to deploy an L1+L2 setup where we bypass the
> > CombinedBlockCache
> > > > policy and have BucketCache working as a strict L2 cache to the L1
> > > > LruBlockCache. For such a setup, set
> > > CacheConfig.BUCKET_CACHE_COMBINED_KEY
> > > > to false. In this mode, on eviction from L1, blocks go to L2. When a
> > > block
> > > > is cached, it is cached first in L1. When we go to look for a cached
> > > block,
> > > > we look first in L1 and if none found, then search L2. Let us call
> this
> > > > deploy format, Raw L1+L2."
> > > > I want to configure this kind of cache not because the CBC policy
> > > > is not good, but because I am a tech-leader in a bank. I need to
> > > > compare these two kinds of cache to make a decision for our
> > > > different kinds of apps. The reference guide said I can configure
> > > > it in "CacheConfig.BUCKET_CACHE_COMBINED_KEY", but is there any way
> > > > I can configure it in hbase-site.xml?
> > > >
> > > >
> > > I see.
> > >
> > > BUCKET_CACHE_COMBINED_KEY == hbase.bucketcache.combinedcache.enabled.
> > > Set it to false in your hbase-site.xml.
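
Put together, the Raw L1+L2 deploy described in this thread can be sketched in hbase-site.xml as follows (the offheap size of 4096 MB is an example value, not from the thread):

```xml
<!-- Hypothetical Raw L1+L2 fragment: BucketCache as a strict L2 behind the
     L1 LruBlockCache, bypassing the CombinedBlockCache policy. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>4096</value>
</property>
<property>
  <name>hbase.bucketcache.combinedcache.enabled</name>
  <value>false</value>
</property>
```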
> > >
> > > But let's back up and let me save you some work. Such an 'option'
> > > should be struck from the refguide as an exotic permutation that only
> > > in the most extreme of cases would perform better than CBC. Do you
> > > have a loading where you think this combination would be better
> > > suited? If so, would you mind telling us of it?
> > >
> > > Meantime, we need to work on posting different versions of our doc so
> > > others don't have the difficult time you have had above. We've been
> > > lazy up to this point because the doc was generally applicable, but
> > > bucketcache is one of the locations where version matters.
> > >
> > > Yours,
> > > St.Ack
> > >
> > >
> > >
> > >
> > >
> > >
> > > > Many Thanks!
> > > >
> > > >
> > > >
> > > >
> > > > ------------------ Original Message ------------------
> > > > From: "Stack";<stack@duboce.net>;
> > > > Sent: Monday, March 2, 2015, 2:30 PM
> > > > To: "Hbase-User"<user@hbase.apache.org>;
> > > >
> > > > Subject: Re: BucketCache Configuration Problem
> > > >
> > > >
> > > >
> > > > On Sun, Mar 1, 2015 at 12:57 AM, donhoff_h <165612158@qq.com> wrote:
> > > >
> > > > > Hi, experts
> > > > >
> > > > > I am using HBase0.98.10 and have 3 problems about BucketCache
> > > > > configuration.
> > > > >
> > > > > First:
> > > > > I read the reference guide of Apache HBase to learn how to config
> > > > > BucketCache. I find that when using the offheap BucketCache, the
> > > > > reference guide says that I should config HBASE_OFFHEAPSIZE; it
> > > > > also says that I should config -XX:MaxDirectMemorySize. Since
> > > > > these two parameters are both related to the DirectMemory, I am
> > > > > confused about which one I should configure.
> > > > >
> > > > >
> > > > See bin/hbase, where HBASE_OFFHEAPSIZE gets interpolated as the
> > > > value of the -XX:MaxDirectMemorySize passed to java (so set
> > > > HBASE_OFFHEAPSIZE). (Will fix the doc so this is clearer.)
> > > >
> > > >
> > > >
> > > > > Second:
> > > > > I want to know how to configure the BucketCache as a pure
> > > > > secondary cache, by which I mean bypassing the CombinedBlockCache
> > > > > policy. I tried the following configuration, but when I go to the
> > > > > regionserver's webUI, I found it says "No L2 deployed":
> > > > >
> > > > > hbase.bucketcache.ioengine=offheap
> > > > > hbase.bucketcache.size=200
> > > > > hbase.bucketcache.combinedcache.enabled=false
> > > > >
> > > > >
> > > > What do you mean by pure secondary cache? Which block types do you
> > > > want in the bucketcache?
> > > >
> > > > Why bypass CBC? We've been trying to simplify bucketcache deploy.
> > > > Part of this streamlining has been removing the myriad options
> > > > because they tend to confuse, and giving the user a few simple
> > > > choices instead. Do the options not work for you?
> > > >
> > > >
> > > >
> > > > > Third:
> > > > > I made the following configuration to set the bucket sizes. But
> > > > > from the regionserver's WebUI, I found that the (4+1)K and (8+1)K
> > > > > sizes are used, while the (64+1)K size is not used. What's wrong
> > > > > with my configuration?
> > > > >
> > > > > hbase.bucketcache.ioengine=offheap
> > > > > hbase.bucketcache.size=200
> > > > > hbase.bucketcache.combinedcache.enabled=true
> > > > > hbase.bucketcache.sizes=65536
> > > > > hfile.block.cache.sizes=65536
> > > > >
> > > > > I configured both of these because I don't know which one is in
> > > > > use now.
> > > > >
> > > > >
> > > > As per previous mail (and HBASE-13125), hfile.block.cache.sizes has
> > > > no effect in 0.98.x.  Also per our previous mail "Questions about
> > > > BucketCache", isn't it hbase.bucketcache.bucket.sizes that you want?
> > > >
> > > > Have you tried the defaults, and do they not fit your access
> > > > pattern?
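
For reference, here is a sketch of the correctly named property in hbase-site.xml. The size list is illustrative only; it follows the (block size + 1K) pattern seen in the WebUI above, so for example a 64K block needs a 65K = 66560-byte bucket:

```xml
<!-- Hypothetical 0.98.x fragment. The property is
     hbase.bucketcache.bucket.sizes (not hbase.bucketcache.sizes); values
     are bytes, block size plus roughly 1K of overhead per bucket slot. -->
<property>
  <name>hbase.bucketcache.bucket.sizes</name>
  <value>5120,9216,17408,33792,66560</value>
</property>
```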
> > > >
> > > > Yours,
> > > > St.Ack
> > > >
> > > >
> > > > > Many Thanks!
> > > >
> > >
> > >
> >
> >
>
