hbase-user mailing list archives

From Nick Dimiduk <ndimi...@gmail.com>
Subject Re: 1 table, 1 dense CF => N tables, 1 dense CF ?
Date Thu, 08 Jan 2015 01:21:24 GMT
Not to dig too deep into ancient history, but Tsuna's comments are mostly
still relevant today, except for...

> You also generally end up with fewer, bigger regions, which is almost
> always better.  This entails that your RS are writing more data to fewer
> WALs, which leads to more sequential writes across the board.  You'll end
> up with fewer HLogs, which is also a good thing.


HBase has one WAL per region server and has for as long as I've paid
attention. Unless I've missed something, the number of tables doesn't
change this fixed number.

> If you use HBase's client (which is most likely the case as the only other
> alternative is asynchbase), beware that you need to create one HTable
> instance per table per thread in your application code.


You can still write your client application this way, but the preferred
idiom is to use a single Connection instance whose resources are shared
across all HTable instances. This pattern is reinforced in the new
client API introduced in 1.0.

FYI, I think you can write a Compaction coprocessor that implements your
data expiration policy through normal compaction operations, thereby
removing the necessity of the (expensive?) scan + write delete pattern
entirely.
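
As a rough sketch of the per-cell decision such a coprocessor might make,
assuming the userId can be parsed out of the row key and that per-user
retention (the "N days" in the question) is available as a simple map. The
class and method names below are hypothetical, not part of HBase, and the
compaction hook that would call this is omitted:

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: decides whether a cell has outlived its user's
 * retention window. None of these names come from HBase itself.
 */
final class RetentionPolicy {
    // userId -> retention in days (the per-user "N" from the question)
    private final Map<String, Integer> retentionDaysByUser;
    // Assumed fallback for users with no explicit setting
    private final int defaultRetentionDays;

    RetentionPolicy(Map<String, Integer> retentionDaysByUser, int defaultRetentionDays) {
        this.retentionDaysByUser = new HashMap<>(retentionDaysByUser);
        this.defaultRetentionDays = defaultRetentionDays;
    }

    /** Returns true when the cell is strictly older than the user's retention window. */
    boolean isExpired(String userId, long cellTimestampMillis, long nowMillis) {
        int days = retentionDaysByUser.getOrDefault(userId, defaultRetentionDays);
        long cutoffMillis = nowMillis - Duration.ofDays(days).toMillis();
        return cellTimestampMillis < cutoffMillis;
    }
}
```

A compaction-time filter would call isExpired for each cell and drop the
ones for which it returns true, so expired data disappears as files are
rewritten rather than through a separate scan + delete pass.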

-n

On Wed, Jan 7, 2015 at 9:27 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:

> Hi,
>
> It's been asked before, but I didn't find any *definite* answers and a lot
> of answers I found via  are from a whiiiile back.
>
> e.g. Tsuna provided pretty convincing info here:
>
> http://search-hadoop.com/m/xAiiO8ttU2/%2522%2522I+generally+recommend+to+stick+to+a+single+table%2522&subj=Re+One+table+or+multiple+tables+
>
> ... but that is from 3 years ago.  Maybe things changed?
>
> Here's our use case:
>
> Data/table layout:
> * HBase is used for storing metrics at different granularities (1 min,
> 5 min, ... - a total of 6 different granularities)
> * It's a multi-tenant system
> * Keys are carefully crafted and include userId + number, where this number
> contains the time and the granularity
> * Everything's in 1 table and 1 CF
>
> Access:
> * We only access 1 system at a time, for a specific time range, and
> specific granularity
> * We periodically scan ALL data and delete data older than N days, where N
> varies from user to user
> * We periodically scan ALL data and merge multiple rows (of the same
> granularity) into 1
>
> Question:
> Would there be any advantage in having 6 tables - one for each granularity
> - instead of having everything in 1 table?
> Assume each table would still have just 1 CF and the keys would remain the
> same.
>
> Thanks,
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
