hbase-user mailing list archives

From Sean Bigdatafun <sean.bigdata...@gmail.com>
Subject Re: Limits on HBase
Date Fri, 15 Oct 2010 00:23:22 GMT
Let me ask this question from another angle:

The first question is ---
if I have millions of columns in a column family in the same row, such that
the sum of the key-value pairs exceeds 256MB, what will happen?

example:
I have a column with a key of 256 bytes and a value of 2KB; let's assume
(256 + timestamp size + 2048) ~= 2.5KB per cell.
Then I understand I can store at most 256 * 1024 / 2.5 ~= 104,857 columns in
this column family at this row.

Does anyone have comments on the math I gave above?
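
To make the arithmetic concrete, here is a small sketch of the estimate
(the per-cell size is my rough assumption, not exact HBase KeyValue
accounting, and 256MB is what I understand the default
hbase.hregion.max.filesize to be):

public class RowSizeEstimate {
    public static void main(String[] args) {
        // Assumed per-cell footprint: 256-byte key/qualifier + timestamp
        // + 2KB value, rounded up to ~2.5KB to allow for overhead.
        double perCellKB = 2.5;

        // Assumed region split threshold: 256MB, expressed in KB.
        double regionKB = 256 * 1024;

        long maxCells = (long) (regionKB / perCellKB);
        // Prints roughly 104857 cells before this one row fills a region.
        System.out.println("Approx. columns per row at this size: " + maxCells);
    }
}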


The second question is --
By the way, if I do not turn on LZO, is my data still compressed (by the
system)? If so, the number above would increase a few times over, but there
would still be a limit on how many columns I can put in a row.
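
For context on what turning compression on looks like, here is a rough
sketch of creating a table with LZO on a column family through the Java
client, as I understand the 0.90-era API (the package names and the
table/family names 'mytable'/'cf' are my assumptions and may differ
between releases):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CreateLzoTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        HTableDescriptor table = new HTableDescriptor("mytable");
        HColumnDescriptor family = new HColumnDescriptor("cf");
        // Compression is configured per column family; nothing is
        // compressed unless the family asks for it here.
        family.setCompressionType(Compression.Algorithm.LZO);
        table.addFamily(family);

        new HBaseAdmin(conf).createTable(table);
    }
}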

The third question is --
If I do turn on LZO, does that mean the value gets compressed first, and then
the HBase mechanism further compresses the key-value pair?

Thanks,
Sean


On Tue, Sep 7, 2010 at 8:30 PM, Jonathan Gray <jgray@facebook.com> wrote:

> You can go way beyond the max region split / split size.  HBase will never
> split the region once it is a single row, even if beyond the split size.
>
> Also, if you're using large values, you should have region sizes much
> larger than the default.  It's common to run with 1-2GB regions in many
> cases.
>
> What you may have seen are recommendations that if your cell values are
> approaching the default block size on HDFS (64MB), you should consider
> putting the data directly into HDFS rather than HBase.
>
> JG
>
> > -----Original Message-----
> > From: William Kang [mailto:weliam.cloud@gmail.com]
> > Sent: Tuesday, September 07, 2010 7:36 PM
> > To: user@hbase.apache.org; apurtell@apache.org
> > Subject: Re: Limits on HBase
> >
> > Hi,
> > Thanks for your reply. How about the row size? I read that a row should
> > not be larger than the HDFS file on the region server, which is 256MB by
> > default. Is that right? Many thanks.
> >
> >
> > William
> >
> > On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell <apurtell@apache.org>
> > wrote:
> >
> > > In addition to what Jon said please be aware that if compression is
> > > specified in the table schema, it happens at the store file level --
> > > compression happens after write I/O, before read I/O, so if you
> > transmit a
> > > 100MB object that compresses to 30MB, the performance impact is that
> > of
> > > 100MB, not 30MB.
> > >
> > > I also try not to go above 50MB as largest cell size, for the same
> > reason.
> > > I have tried storing objects larger than 100MB but this can cause out
> > of
> > > memory issues on busy regionservers no matter the size of the heap.
> > When/if
> > > HBase RPC can send large objects in smaller chunks, this will be less
> > of an
> > > issue.
> > >
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Why is this email five sentences or less?
> > > http://five.sentenc.es/
> > >
> > >
> > > --- On Mon, 9/6/10, Jonathan Gray <jgray@facebook.com> wrote:
> > >
> > > > From: Jonathan Gray <jgray@facebook.com>
> > > > Subject: RE: Limits on HBase
> > > > To: "user@hbase.apache.org" <user@hbase.apache.org>
> > > > Date: Monday, September 6, 2010, 4:10 PM
> > > > I'm not sure what you mean by
> > > > "optimized cell size" or whether you're just asking about
> > > > practical limits?
> > > >
> > > > HBase is generally used with cells in the range of tens of
> > > > bytes to hundreds of kilobytes.  However, I have used
> > > > it with cells that are several megabytes, up to about
> > > > 50MB.  Up at that level, I have seen some weird
> > > > performance issues.
> > > >
> > > > The most important thing is to be sure to tweak all of your
> > > > settings.  If you have 20MB cells, you need to be sure
> > > > to increase the flush size beyond 64MB and the split size
> > > > beyond 256MB.  You also need enough memory to support
> > > > all this large object allocation.
> > > >
> > > > And of course, test test test.  That's the easiest way
> > > > to see if what you want to do will work :)
> > > >
> > > > When you run into problems, e-mail the list.
> > > >
> > > > As far as row size is concerned, the only issue is that a
> > > > row can never span multiple regions so a given row can only
> > > > be in one region and thus be hosted on one server at a
> > > > time.
> > > >
> > > > JG
> > > >
> > > > > -----Original Message-----
> > > > > From: William Kang [mailto:weliam.cloud@gmail.com]
> > > > > Sent: Monday, September 06, 2010 1:57 PM
> > > > > To: hbase-user
> > > > > Subject: Limits on HBase
> > > > >
> > > > > Hi folks,
> > > > > I know this question may have been asked many times,
> > > > but I am wondering
> > > > > if
> > > > > there is any update on the optimized cell size (in
> > > > megabytes) and row
> > > > > size
> > > > > (in megabytes)? Many thanks.
> > > > >
> > > > >
> > > > > William
> > > >
> > >
> > >
> > >
> > >
> > >
>
