hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: 1 table, 1 dense CF => N tables, 1 dense CF ?
Date Thu, 08 Jan 2015 01:39:18 GMT
w.r.t. one WAL per region server, see HBASE-5699 'Run with > 1 WAL in
HRegionServer', which is in the upcoming 1.0.0 release.
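
If it helps, my understanding is that the feature is switched on per region
server through the WAL provider setting (normally in hbase-site.xml). The
sketch below shows the same settings through the Configuration API; the
property names are my recollection from the multiwal work, so double-check
them against the 1.0.0 docs:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class MultiWalConfigSketch {
      // Sketch only: these settings normally live in hbase-site.xml on each
      // region server; shown via the Configuration API purely for illustration.
      public static Configuration multiWalConf() {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.wal.provider", "multiwal");            // run several WALs per RS
        conf.setInt("hbase.wal.regiongrouping.numgroups", 2);  // how many WAL groups
        return conf;
      }
    }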

Cheers

On Wed, Jan 7, 2015 at 5:21 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:

> Not to dig too deep into ancient history, but Tsuna's comments are mostly
> still relevant today, except for...
>
> > You also generally end up with fewer, bigger regions, which is almost
> > always better.  This entails that your RS are writing more data to fewer
> > WALs, which leads to more sequential writes across the board.  You'll end
> > up with fewer HLogs, which is also a good thing.
>
>
> HBase uses one WAL per region server, and has for as long as I've paid
> attention. Unless I've missed something, the number of tables doesn't change
> this fixed number.
>
> > If you use HBase's client (which is most likely the case as the only other
> > alternative is asynchbase), beware that you need to create one HTable
> > instance per table per thread in your application code.
>
>
> You can still write your client application this way, but the preferred
> idiom is to use a single Connection instance, whose underlying resources are
> shared across the HTable instances created from it. This pattern is
> reinforced in the new client API introduced in 1.0.
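>
> A minimal sketch of that idiom against the 1.0-style API (the table name and
> row key below are made up for illustration):
>
>     import java.io.IOException;
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.TableName;
>     import org.apache.hadoop.hbase.client.Connection;
>     import org.apache.hadoop.hbase.client.ConnectionFactory;
>     import org.apache.hadoop.hbase.client.Get;
>     import org.apache.hadoop.hbase.client.Result;
>     import org.apache.hadoop.hbase.client.Table;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     public class SharedConnectionSketch {
>       public static void main(String[] args) throws IOException {
>         Configuration conf = HBaseConfiguration.create();
>         // One heavyweight, thread-safe Connection per application.
>         try (Connection connection = ConnectionFactory.createConnection(conf)) {
>           // Table instances are lightweight: create one per thread/operation,
>           // use it, close it. They all share the Connection's resources.
>           try (Table table = connection.getTable(TableName.valueOf("metrics"))) {
>             Result r = table.get(new Get(Bytes.toBytes("some-row-key")));
>             System.out.println(r.isEmpty() ? "miss" : "hit");
>           }
>         }
>       }
>     }
>
> Table itself is still not thread-safe, so the per-thread advice survives at
> that level; it's only the shared, expensive state that now lives in the one
> Connection.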
>
> FYI, I think you can write a Compaction coprocessor that implements your
> data expiration policy through normal compaction operations, thereby
> removing the necessity of the (expensive?) scan + write delete pattern
> entirely.
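>
> A bare-bones sketch of that idea, written against the 0.98/1.0-era
> RegionObserver API (the class name, the fixed 30-day cutoff and the exact
> hook signatures are illustrative assumptions - a real per-user policy would
> look the retention up instead of using a constant, and the coprocessor
> interfaces have shifted between versions, so treat this as a shape rather
> than a drop-in):
>
>     import java.io.IOException;
>     import java.util.Iterator;
>     import java.util.List;
>
>     import org.apache.hadoop.hbase.Cell;
>     import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
>     import org.apache.hadoop.hbase.coprocessor.ObserverContext;
>     import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
>     import org.apache.hadoop.hbase.regionserver.InternalScanner;
>     import org.apache.hadoop.hbase.regionserver.ScanType;
>     import org.apache.hadoop.hbase.regionserver.Store;
>
>     // Cells older than the cutoff simply don't survive a compaction, so no
>     // explicit Deletes ever have to be written.
>     public class ExpireOnCompact extends BaseRegionObserver {
>
>       private static final long MAX_AGE_MS = 30L * 24 * 60 * 60 * 1000;
>
>       @Override
>       public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> c,
>           Store store, final InternalScanner scanner, ScanType scanType) throws IOException {
>         final long cutoff = System.currentTimeMillis() - MAX_AGE_MS;
>         // Wrap the compaction scanner and drop expired cells as they stream past.
>         return new InternalScanner() {
>           @Override
>           public boolean next(List<Cell> results) throws IOException {
>             boolean more = scanner.next(results);
>             dropExpired(results, cutoff);
>             return more;
>           }
>
>           @Override
>           public boolean next(List<Cell> results, int limit) throws IOException {
>             boolean more = scanner.next(results, limit);
>             dropExpired(results, cutoff);
>             return more;
>           }
>
>           @Override
>           public void close() throws IOException {
>             scanner.close();
>           }
>         };
>       }
>
>       private static void dropExpired(List<Cell> cells, long cutoff) {
>         for (Iterator<Cell> it = cells.iterator(); it.hasNext(); ) {
>           if (it.next().getTimestamp() < cutoff) {
>             it.remove();
>           }
>         }
>       }
>     }
>
> The same hook fires for minor as well as major compactions, so expired data
> can disappear gradually without any extra scan traffic.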
>
> -n
>
> On Wed, Jan 7, 2015 at 9:27 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com>
> wrote:
>
> > Hi,
> >
> > It's been asked before, but I didn't find any *definite* answers, and a
> > lot of the answers I found are from a whiiiile back.
> >
> > e.g. Tsuna provided pretty convincing info here:
> >
> > http://search-hadoop.com/m/xAiiO8ttU2/%2522%2522I+generally+recommend+to+stick+to+a+single+table%2522&subj=Re+One+table+or+multiple+tables+
> >
> > ... but that is from 3 years ago.  Maybe things changed?
> >
> > Here's our use case:
> >
> > Data/table layout:
> > * HBase is used for storing metrics at different granularities (1 min, 5
> > min, ... - a total of 6 different granularities)
> > * It's a multi-tenant system
> > * Keys are carefully crafted and include userId + number, where this
> > number contains the time and the granularity (see the sketch after this list)
> > * Everything's in 1 table and 1 CF
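> >
> > Roughly this shape, purely to make the layout concrete (the real encoding
> > is more involved):
> >
> >     import org.apache.hadoop.hbase.util.Bytes;
> >
> >     // Illustrative only: [ userId ][ granularity ][ time bucket ]
> >     public class RowKeySketch {
> >       public static byte[] rowKey(int userId, byte granularity, long timeBucket) {
> >         return Bytes.add(Bytes.toBytes(userId),
> >             new byte[] { granularity },
> >             Bytes.toBytes(timeBucket));
> >       }
> >     }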
> >
> > Access:
> > * We only access 1 system at a time, for a specific time range, and
> > specific granularity
> > * We periodically scan ALL data and delete data older than N days, where
> > N varies from user to user (see the sketch after this list)
> > * We periodically scan ALL data and merge multiple rows (of the same
> > granularity) into 1
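> >
> > For the delete pass, something like this per user (the key prefix, cutoff
> > and batch size are simplified for illustration; the real retention lookup
> > is per user):
> >
> >     import java.io.IOException;
> >     import java.util.ArrayList;
> >     import java.util.List;
> >
> >     import org.apache.hadoop.conf.Configuration;
> >     import org.apache.hadoop.hbase.HBaseConfiguration;
> >     import org.apache.hadoop.hbase.TableName;
> >     import org.apache.hadoop.hbase.client.Connection;
> >     import org.apache.hadoop.hbase.client.ConnectionFactory;
> >     import org.apache.hadoop.hbase.client.Delete;
> >     import org.apache.hadoop.hbase.client.Result;
> >     import org.apache.hadoop.hbase.client.ResultScanner;
> >     import org.apache.hadoop.hbase.client.Scan;
> >     import org.apache.hadoop.hbase.client.Table;
> >     import org.apache.hadoop.hbase.util.Bytes;
> >
> >     public class ExpireScanSketch {
> >       public static void main(String[] args) throws IOException {
> >         long cutoffMs = System.currentTimeMillis() - 30L * 24 * 60 * 60 * 1000;
> >         Configuration conf = HBaseConfiguration.create();
> >         try (Connection connection = ConnectionFactory.createConnection(conf);
> >              Table table = connection.getTable(TableName.valueOf("metrics"))) {
> >           Scan scan = new Scan();
> >           scan.setStartRow(Bytes.toBytes("user42|"));   // hypothetical per-user prefix
> >           scan.setStopRow(Bytes.toBytes("user42|~"));
> >           scan.setTimeRange(0, cutoffMs);               // only cells written before the cutoff
> >           List<Delete> batch = new ArrayList<Delete>();
> >           try (ResultScanner results = table.getScanner(scan)) {
> >             for (Result r : results) {
> >               batch.add(new Delete(r.getRow()));
> >               if (batch.size() >= 1000) {               // flush deletes in chunks
> >                 table.delete(batch);
> >                 batch.clear();
> >               }
> >             }
> >           }
> >           if (!batch.isEmpty()) {
> >             table.delete(batch);
> >           }
> >         }
> >       }
> >     }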
> >
> > Question:
> > Would there be any advantage in having 6 tables - one for each granularity
> > - instead of having everything in 1 table?
> > Assume each table would still have just 1 CF and the keys would remain
> > the same.
> >
> > Thanks,
> > Otis
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
> >
>
