hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HBase for Small Key Value Tables
Date Mon, 29 Aug 2016 14:39:18 GMT
Cycling old bits:
http://search-hadoop.com/m/YGbb3E2a71UVLBK&subj=Re+HBase+Count+Rows+in+Regions+and+Region+Servers

You can use /jmx to inspect regions and find the hotspot.
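
A rough sketch of what that inspection could look like, assuming the region server exposes per-region metrics in a bean named `Hadoop:service=HBase,name=RegionServer,sub=Regions` with keys shaped like `Namespace_<ns>_table_<table>_region_<encoded>_metric_readRequestCount`. The key layout, the info port 16030, and the helper names below are assumptions, so check them against the /jmx output of your own version before relying on this.

```python
import json
import re
from urllib.request import urlopen

# Assumed bean name and key layout; verify against your own /jmx output.
REGIONS_BEAN = "Hadoop:service=HBase,name=RegionServer,sub=Regions"
KEY_RE = re.compile(
    r"^Namespace_(?P<ns>.+?)_table_(?P<table>.+?)"
    r"_region_(?P<region>[0-9a-f]+)_metric_(?P<metric>\w+)$"
)


def fetch_jmx(host, port=16030):
    """Download the region bean from one region server (port is an assumption)."""
    with urlopen(f"http://{host}:{port}/jmx?qry={REGIONS_BEAN}") as resp:
        return json.load(resp)


def region_counts(payload, metric="readRequestCount"):
    """Map (table, encoded region name) to the value of the chosen metric."""
    counts = {}
    for bean in payload.get("beans", []):
        if bean.get("name") != REGIONS_BEAN:
            continue
        for key, value in bean.items():
            match = KEY_RE.match(key)
            if match and match.group("metric") == metric:
                counts[(match.group("table"), match.group("region"))] = value
    return counts


def hottest_regions(payload, metric="readRequestCount", top=5):
    """Return the regions with the highest metric value, hottest first."""
    counts = region_counts(payload, metric)
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:top]
```

Calling `hottest_regions(fetch_jmx("rs1.example.com"))` against each region server (the hostname is a placeholder) should then show which regions take most of the reads.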

On Mon, Aug 29, 2016 at 7:29 AM, Manish Maheshwari <myloginid@gmail.com>
wrote:

> Hi Dima,
>
> Thanks for the suggestion. We can load the data in heap, but HBase makes it
> easier for one process to write and another to read. With heap we need to
> build a process to handle both reads and writes and also write to a log so as
> not to lose the updates in case of process failure.
>
> Thanks
> Manish
>
> On Aug 29, 2016 2:18 PM, "Dima Spivak" <dspivak@cloudera.com> wrote:
>
> > (Though if it is only 7 GB, why not just store it in memory?)
> >
> > On Sunday, August 28, 2016, Dima Spivak <dspivak@cloudera.com> wrote:
> >
> > > If your data can all fit on one machine, HBase is not the best choice. I
> > > think you'd be better off using a simpler solution for small data and leave
> > > HBase for use cases that require proper clusters.
> > >
> > > On Sunday, August 28, 2016, Manish Maheshwari <myloginid@gmail.com> wrote:
> > >
> > >> We don't want to invest in another DB like Dynamo or Cassandra, as we
> > >> are already in the Hadoop stack. Managing another DB would be a pain. We
> > >> chose HBase over an RDBMS because we call HBase via Spark Streaming to
> > >> look up the keys.
> > >>
> > >> Manish
> > >>
> > >> On Mon, Aug 29, 2016 at 1:47 PM, Dima Spivak <dspivak@cloudera.com> wrote:
> > >>
> > >> > Hey Manish,
> > >> >
> > >> > Just to ask the naive question, why use HBase if the data fits into such
> > >> > a small table?
> > >> >
> > >> > On Sunday, August 28, 2016, Manish Maheshwari <myloginid@gmail.com> wrote:
> > >> >
> > >> > > Hi,
> > >> > >
> > >> > > We have a scenario where HBase is used like a Key Value Database to map
> > >> > > Keys to Regions. We have over 5 Million Keys, but the table size is less
> > >> > > than 7 GB. The read volume is pretty high - About 50x of the put/delete
> > >> > > volume. This causes hot spotting on the Data Node and the region is not
> > >> > > split. We cannot change the maxregionsize parameter as that will impact
> > >> > > other tables too.
> > >> > >
> > >> > > Our idea is to manually inspect the row key ranges and then split the
> > >> > > region manually and assign the resulting regions to different region
> > >> > > servers. We will then continue to monitor the rows in a region to see
> > >> > > if it needs to be split.
> > >> > >
> > >> > > Does anyone have experience doing this on HBase? Is this a recommended
> > >> > > approach?
> > >> > >
> > >> > > Thanks,
> > >> > > Manish
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > -Dima
> > >> >
> > >>
> > >
> > >
> > > --
> > > -Dima
> > >
> > >
> >
> > --
> > -Dima
> >
>
