hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Limits on HBase
Date Wed, 08 Sep 2010 04:36:48 GMT
There are two definitions of random access:
1) reading at random offsets within a file (HDFS can be less than ideal for this)
2) fetching an entire file at random (not usually considered a random get)

For the latter, streaming an entire file from HDFS is actually pretty
good.  You can see throughput at a substantial percentage (think
80%+) of raw disk performance.  I benchmarked HDFS some time last year
and got 90MB/sec just writing raw files.
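
A rough way to get the "raw disk" baseline for that kind of comparison is to time one large sequential write. The sketch below is only an illustration, not the benchmark referred to above: it measures the local filesystem rather than HDFS, and the function name, file name, and sizes are all made up for the example.

```python
import os
import tempfile
import time


def sequential_write_mb_per_sec(path, total_mb=64, chunk_mb=4):
    """Write about total_mb of zero bytes to path sequentially; return MB/sec.

    Hypothetical helper for illustration; it times a local file, not HDFS.
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    chunks = total_mb // chunk_mb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(chunks):
            f.write(chunk)
        f.flush()
        # Force the data to disk so the page cache does not inflate the number.
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return (chunks * chunk_mb) / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        rate = sequential_write_mb_per_sec(os.path.join(d, "bench.bin"))
        print(f"{rate:.1f} MB/sec")
```

Running the same kind of write through an HDFS client and dividing the two rates would give the percentage-of-raw-disk figure; the file should be large enough that caching and startup costs do not dominate.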

-ryan


On Tue, Sep 7, 2010 at 9:07 PM, William Kang <weliam.cloud@gmail.com> wrote:
> Hi,
> What does performance look like if we put large cells in HDFS vs. the local file
> system? Random access to HDFS would be slow, right?
>
>
> William
>
> On Tue, Sep 7, 2010 at 11:30 PM, Jonathan Gray <jgray@facebook.com> wrote:
>
>> You can go way beyond the max region split / split size.  HBase will never
>> split the region once it is a single row, even if beyond the split size.
>>
>> Also, if you're using large values, you should have region sizes much
>> larger than the default.  It's common to run with 1-2GB regions in many
>> cases.
>>
>> What you may have seen are recommendations that if your cell values are
>> approaching the default block size on HDFS (64MB), you should consider
>> putting the data directly into HDFS rather than HBase.
>>
>> JG
>>
>> > -----Original Message-----
>> > From: William Kang [mailto:weliam.cloud@gmail.com]
>> > Sent: Tuesday, September 07, 2010 7:36 PM
>> > To: user@hbase.apache.org; apurtell@apache.org
>> > Subject: Re: Limits on HBase
>> >
>> > Hi,
>> > Thanks for your reply. How about the row size? I read that a row should not
>> > be larger than the HDFS file on the region server, which is 256MB by default.
>> > Is that right? Many thanks.
>> >
>> >
>> > William
>> >
>> > On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell <apurtell@apache.org> wrote:
>> >
>> > > In addition to what Jon said please be aware that if compression is
>> > > specified in the table schema, it happens at the store file level --
>> > > compression happens after write I/O, before read I/O, so if you transmit a
>> > > 100MB object that compresses to 30MB, the performance impact is that of
>> > > 100MB, not 30MB.
>> > >
>> > > I also try not to go above 50MB as largest cell size, for the same reason.
>> > > I have tried storing objects larger than 100MB but this can cause out of
>> > > memory issues on busy regionservers no matter the size of the heap. When/if
>> > > HBase RPC can send large objects in smaller chunks, this will be less of an
>> > > issue.
>> > >
>> > > Best regards,
>> > >
>> > >    - Andy
>> > >
>> > > Why is this email five sentences or less?
>> > > http://five.sentenc.es/
>> > >
>> > >
>> > > --- On Mon, 9/6/10, Jonathan Gray <jgray@facebook.com> wrote:
>> > >
>> > > > From: Jonathan Gray <jgray@facebook.com>
>> > > > Subject: RE: Limits on HBase
>> > > > To: "user@hbase.apache.org" <user@hbase.apache.org>
>> > > > Date: Monday, September 6, 2010, 4:10 PM
>> > > > I'm not sure what you mean by
>> > > > "optimized cell size" or whether you're just asking about
>> > > > practical limits?
>> > > >
>> > > > HBase is generally used with cells in the range of tens of
>> > > > bytes to hundreds of kilobytes.  However, I have used
>> > > > it with cells that are several megabytes, up to about
>> > > > 50MB.  Up at that level, I have seen some weird
>> > > > performance issues.
>> > > >
>> > > > The most important thing is to be sure to tweak all of your
>> > > > settings.  If you have 20MB cells, you need to be sure
>> > > > to increase the flush size beyond 64MB and the split size
>> > > > beyond 256MB.  You also need enough memory to support
>> > > > all this large object allocation.
>> > > >
>> > > > And of course, test test test.  That's the easiest way
>> > > > to see if what you want to do will work :)
>> > > >
>> > > > When you run into problems, e-mail the list.
>> > > >
>> > > > As far as row size is concerned, the only issue is that a
>> > > > row can never span multiple regions so a given row can only
>> > > > be in one region and thus be hosted on one server at a
>> > > > time.
>> > > >
>> > > > JG
>> > > >
>> > > > > -----Original Message-----
>> > > > > From: William Kang [mailto:weliam.cloud@gmail.com]
>> > > > > Sent: Monday, September 06, 2010 1:57 PM
>> > > > > To: hbase-user
>> > > > > Subject: Limits on HBase
>> > > > >
>> > > > > Hi folks,
>> > > > > I know this question may have been asked many times, but I am wondering if
>> > > > > there is any update on the optimized cell size (in megabytes) and row size
>> > > > > (in megabytes)? Many thanks.
>> > > > >
>> > > > >
>> > > > > William
>> > > >
>> > >
>> > >
>> > >
>> > >
>> > >
>>
>
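
The sizing rules of thumb quoted in this thread can be collected into a small checker. This is only a sketch under stated assumptions: the numbers are the 2010-era figures mentioned above (64MB flush size, 256MB split size, 64MB HDFS block size), the 20MB trigger is just the example cell size JG used rather than a documented threshold, and `sizing_notes` is a hypothetical helper, not part of HBase.

```python
MB = 1024 * 1024

# Figures quoted in the thread; verify them against your own HBase/HDFS version.
DEFAULT_FLUSH_SIZE = 64 * MB
DEFAULT_SPLIT_SIZE = 256 * MB
HDFS_BLOCK_SIZE = 64 * MB

# Assumption: 20MB is only the example size JG gave for "large" cells.
LARGE_CELL = 20 * MB
# Largest cell size the posters report being comfortable with.
COMFORTABLE_MAX_CELL = 50 * MB


def sizing_notes(cell_bytes):
    """Return the thread's rules of thumb that apply to a given cell size."""
    notes = []
    if cell_bytes >= LARGE_CELL:
        notes.append(
            "raise the flush size beyond %dMB and the split size beyond %dMB"
            % (DEFAULT_FLUSH_SIZE // MB, DEFAULT_SPLIT_SIZE // MB)
        )
    if cell_bytes > COMFORTABLE_MAX_CELL:
        notes.append(
            "above %dMB per cell; out-of-memory problems were reported past 100MB"
            % (COMFORTABLE_MAX_CELL // MB)
        )
    if cell_bytes >= HDFS_BLOCK_SIZE:
        notes.append("at or past the HDFS block size; consider storing the data in HDFS directly")
    return notes


if __name__ == "__main__":
    for size_mb in (1, 20, 64):
        print(size_mb, "MB:", sizing_notes(size_mb * MB) or "no special tuning suggested")
```

The thresholds are deliberately kept as named constants so they can be replaced with whatever your cluster's actual configuration and testing suggest.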
