hbase-user mailing list archives

From Jack Levin <magn...@gmail.com>
Subject Re: all regions unregistered over time.
Date Wed, 22 Sep 2010 18:09:18 GMT
LZO on image data that is already JPEG?  Probably not a great idea, yes?

-Jack
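
(For reference, a minimal sketch of what enabling LZO on a column family
looks like from java, assuming the 0.89/0.90-era client API; the table
and family names here are made up, and LZO must already be installed on
the cluster.  As Jack notes above, already-compressed JPEG bytes would
gain little from it:)

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.hfile.Compression;

    public class CreateLzoTable {
      public static void main(String[] args) throws Exception {
        // Hypothetical table "photos" with one family "img", stored LZO-compressed.
        HTableDescriptor desc = new HTableDescriptor("photos");
        HColumnDescriptor family = new HColumnDescriptor("img");
        family.setCompressionType(Compression.Algorithm.LZO);
        desc.addFamily(family);
        new HBaseAdmin(new HBaseConfiguration()).createTable(desc);
      }
    }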

On Wed, Sep 22, 2010 at 11:06 AM, Stack <stack@duboce.net> wrote:
> Are you lzo'ing Jack?  If not, you probably should.
> St.Ack
>
> On Wed, Sep 22, 2010 at 3:17 AM, Jack Levin <magnito@gmail.com> wrote:
>> So our cell sizes will be 350 KB on average, with 5-10 terabytes per
>> server; I just want to keep the count of regions under 1,000 per server.
>>
>> -Jack
>>
>>
>> On Sep 22, 2010, at 2:44 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>
>>> Region size is one of those tricky things; there are a few factors to consider:
>>>
>>> - regions are the basic element of availability and distribution.
>>> - HBase scales by having regions across many servers.  Thus if you
>>> have 2 regions for 16 GB of data on a 20-node cluster, you are at a
>>> net loss there.
>>> - A high region count has been known to make things slow; this is
>>> getting better, but it is still probably better to have 700 regions
>>> than 3,000 for the same amount of data.
>>> - A low region count prevents parallel scalability, as per the second
>>> point above.  This really can't be stressed enough, since a common
>>> problem is loading 200 MB of data into HBase and then wondering why
>>> your awesome 10-node cluster is mostly idle.
>>> - There is not much memory-footprint difference between 1 region and
>>> 10 in terms of indexes, etc., held by the regionserver.
>>>
>>> Generally speaking, I stick to the default; I go smaller for hot
>>> tables, or manually split them, and I go with a 1 GB region size on
>>> our largest table, which is 900 GB.
>>>
>>> -ryan
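
(For reference, the region-size ceiling Ryan describes is the
hbase.hregion.max.filesize setting; a minimal sketch of setting it per
table at creation time, assuming the 0.89/0.90-era HTableDescriptor API
and a made-up table name:)

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class OneGigRegions {
      public static void main(String[] args) throws Exception {
        // Hypothetical table whose regions split at ~1 GB, as Ryan describes,
        // overriding the cluster-wide hbase.hregion.max.filesize default.
        HTableDescriptor desc = new HTableDescriptor("biglogs");
        desc.addFamily(new HColumnDescriptor("cf"));
        desc.setMaxFileSize(1024L * 1024 * 1024);  // bytes
        new HBaseAdmin(new HBaseConfiguration()).createTable(desc);
      }
    }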
>>>
>>> On Wed, Sep 22, 2010 at 12:01 AM, Jack Levin <magnito@gmail.com> wrote:
>>>> Yes, I am thinking of putting 10 to 15 million files on each
>>>> regionserver (well, not literally stored on it, but controlled by
>>>> the regionserver).  That's close to 4 TB worth of regions, which is
>>>> about 4 GB per region if we target 1,000 regions per server.  Note
>>>> that not all files are 'hot': I expect only about 1% to be super hot
>>>> and 5% relatively hot; the rest are cold.  So in terms of keeping
>>>> HBase blocks in RAM, that should be adequate, and for the rest we
>>>> can afford a trip into HDFS.
>>>>
>>>> If servers are running 8 GB of RAM and are shared between
>>>> regionservers and datanodes, how much heap should I allocate to
>>>> each?  6 GB for the RS and 1 GB for the DN?
>>>>
>>>> Also, on the question of whether an 8-core, 16 GB RAM master server
>>>> brings up the cluster faster, the answer is definitely yes.  It took
>>>> only 90 seconds to load 5,000 regions across 13 servers, where the
>>>> same task on a dual-core, 8 GB RAM machine took nearly 10 minutes.
>>>>
>>>> -Jack
>>>>
>>>>
>>>>
>>>> On Tue, Sep 21, 2010 at 11:38 PM, Stack <stack@duboce.net> wrote:
>>>>> On Tue, Sep 21, 2010 at 11:11 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>> It's definitely binary, and I can even load it in my browser by
>>>>>> setting the appropriate headers.  So I guess for PUT and GET via
>>>>>> Accept: application/octet-stream there is no base64 encoding at all.
>>>>>>
>>>>>
>>>>> OK.  Good.  If it were base64'd, you'd see it.
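
(A minimal sketch of the kind of raw-bytes GET Jack describes against
the REST gateway; the host, port, table, row, and column here are all
made up:)

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class RawGet {
      public static void main(String[] args) throws Exception {
        // URL layout on the REST gateway is /table/row/family:qualifier.
        URL url = new URL("http://rest-host.example.com:8080/photos/row1/img:data");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Ask for raw bytes rather than the base64-wrapped XML/JSON encodings.
        conn.setRequestProperty("Accept", "application/octet-stream");
        InputStream in = conn.getInputStream();  // cell value, e.g. raw JPEG bytes
        in.close();
      }
    }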
>>>>>
>>>>>> Btw, out of curiosity: I have the region max file size set to 1 GB
>>>>>> now, but what if I set it to, say, 10 GB or 50 GB?  Is there
>>>>>> significant overhead in address seeking via HDFS?
>>>>>>
>>>>>
>>>>> You could do that.  We don't have much experience running regions
>>>>> of that size.  You should for sure pre-split your table on creation
>>>>> if you go this route (see the HBaseAdmin API [1]; this method is
>>>>> not available in the shell, so you'd have to script it or write a
>>>>> little java to do it).
>>>>>
>>>>> St.Ack
>>>>>
>>>>> 1. http://hbase.apache.org/docs/r0.89.20100726/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor, byte[][])
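
(A minimal sketch of the 'little java' Stack mentions, using the
createTable overload from [1]; the table name, family, and split keys
here are made up:)

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
      public static void main(String[] args) throws Exception {
        HTableDescriptor desc = new HTableDescriptor("photos");  // hypothetical
        desc.addFamily(new HColumnDescriptor("img"));
        // Pre-split into four regions at creation so large regions start out
        // spread across the cluster instead of splitting their way there.
        byte[][] splitKeys = {
            Bytes.toBytes("4"), Bytes.toBytes("8"), Bytes.toBytes("c")
        };
        new HBaseAdmin(new HBaseConfiguration()).createTable(desc, splitKeys);
      }
    }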
>>>>>
>>>>
>>
>
