hbase-user mailing list archives

From Vladimir Rodionov <vrodio...@carrieriq.com>
Subject RE: Is HBase is feasible for storing 4-5 MB of data as cell value
Date Tue, 25 Feb 2014 20:15:21 GMT
Usually, it is not advisable to store such large values in HBase (to avoid excessive I/O during
compaction).
Keep them in separate files in HDFS and store only references to them in HBase. To overcome the
NameNode's inherent limit on the number of files,
you can pack several values into a single file (you will need a separate process, e.g. an M/R job, to
garbage-collect expired or deleted items).
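
As a rough illustration of the packing idea, here is a minimal sketch. It assumes a local file as a stand-in for an HDFS container file, and the names BlobPack, BlobRef, append and read are hypothetical, not part of HBase or HDFS. Each value is appended to a shared container file, and the small (file, offset, length) reference is what you would store in the HBase cell.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: a local file stands in for an HDFS container file.
class BlobPack {

    // The small reference that would be stored in HBase instead of the value.
    static final class BlobRef {
        final String file;
        final long offset;
        final int length;

        BlobRef(String file, long offset, int length) {
            this.file = file;
            this.offset = offset;
            this.length = length;
        }

        // Simple text encoding, e.g. for use as a cell value.
        String encode() {
            return file + ":" + offset + ":" + length;
        }
    }

    // Appends one value to the container file and returns its reference.
    // Assumes a single writer per container file.
    static BlobRef append(Path container, byte[] value) throws IOException {
        try (FileChannel ch = FileChannel.open(container,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            long offset = ch.size(); // the new value starts at the current end
            ByteBuffer buf = ByteBuffer.wrap(value);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            return new BlobRef(container.toString(), offset, value.length);
        }
    }

    // Reads a value back using only its reference.
    static byte[] read(BlobRef ref) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get(ref.file),
                StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(ref.length);
            long pos = ref.offset;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos);
                if (n < 0) {
                    throw new IOException("unexpected end of container file");
                }
                pos += n;
            }
            return buf.array();
        }
    }
}
```

The sketch leaves out the garbage-collection pass: deleted or expired values stay in the container file until a separate job rewrites it and updates the references.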

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Ted Yu [yuzhihong@gmail.com]
Sent: Tuesday, February 25, 2014 12:02 PM
To: user@hbase.apache.org
Subject: Re: Is HBase is feasible for storing 4-5 MB of data as cell value

Minor:
Value 0 also means no cap - see HTable#validatePut()

    if (maxKeyValueSize > 0) {
...
          if (kv.getLength() > maxKeyValueSize) {
            throw new IllegalArgumentException("KeyValue size too large");
          }
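
To spell out the consequence of that check, here is a small self-contained sketch. It is not the HBase source: MaxSizeCheck, validate and accepts are hypothetical names, and a plain int stands in for kv.getLength(). The maxKeyValueSize parameter models the value of hbase.client.keyvalue.maxsize.

```java
// Hypothetical stand-in for the client-side check quoted above; it only
// applies the same comparison to a plain length.
class MaxSizeCheck {

    // maxKeyValueSize models the value of hbase.client.keyvalue.maxsize.
    static void validate(int kvLength, int maxKeyValueSize) {
        // The cap is enforced only when the setting is positive, so 0 and
        // negative values (such as -1) both disable it.
        if (maxKeyValueSize > 0 && kvLength > maxKeyValueSize) {
            throw new IllegalArgumentException("KeyValue size too large");
        }
    }

    // Convenience wrapper: true if a value of this length would be accepted.
    static boolean accepts(int kvLength, int maxKeyValueSize) {
        try {
            validate(kvLength, maxKeyValueSize);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

With a 10 MB cap, a 5 MB value passes and an 11 MB value is rejected; with the setting at 0 or -1, both pass.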


On Tue, Feb 25, 2014 at 11:52 AM, Ameya Kanitkar <ameya@groupon.com> wrote:

> The only other thing I'd add is that, by default, HBase caps the size of the data per
> cell (KeyValue) at 10 MB (I think). You can change that by changing this setting:
>
> hbase.client.keyvalue.maxsize
> in hbase-site.xml
>
> -1 means no cap. You can set another number as the appropriate cap for your use
> case.
>
> Ameya
>
>
> On Tue, Feb 25, 2014 at 12:12 AM, shashwat shriparv <dwivedishashwat@gmail.com> wrote:
>
> > Yes, for sure you can use HBase for this. You can:
> > 1. keep the different fields of a mail in different columns of a column family, with the
> > attachment as a binary array, also in a column.
> > 2. keep the whole message in columns in HBase and, if the attachments are
> > large enough, put them on HDFS with a reference to them in the HBase table.
> > 3. decide the schema yourself, based on how you want to store the
> > values.
> >
> >
> > Warm Regards,
> > Shashwat Shriparv
> >
> >
> >
> > On Tue, Feb 25, 2014 at 12:55 PM, Upendra Yadav <upendra1024@gmail.com> wrote:
> >
> > > I have to use HBase and have mixed types of data.
> > >
> > > Some of it is 1-4 KB in size (mail headers, ...) and the rest is
> > > larger than 5 MB (attachments, ...).
> > >
> > > We also need only random access to any of the data.
> > >
> > > Is HBase feasible for storing this type of data?
> > >
> > > What should my schema design be?
> > > Will I have to go with 2 different tables, the 1st for the 1-4 KB data and the 2nd for
> > > the big files
> > > (because a memstore flush would also flush the other CF, and because of the heavy random access)?
> > >
> > > Or is there another way?
> > >
> > > Thanks
> > >
> >
>

