hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: TTL performance
Date Thu, 21 Jun 2012 21:03:24 GMT
> 2012/6/21, Frédéric Fondement <frederic.fondement@uha.fr>:
> opt3. looks the nicest (only 3-4 tables to scan when reading), but won't my daily major
> compaction become crazy?

If you want more control over the major compaction process, for example
to keep the load on your production cluster at a constant background
level, remember that the HBase shell is the JRuby irb, so you have the
full power of the HBase API and Ruby. In the worst case you can write a
shell script that gets a list of regions and triggers major compaction
on each region separately, or according to whatever policy you
construct. The script can be invoked manually or from crontab; a rough
sketch follows.
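Here is a minimal sketch of such a script, assuming a 0.92/0.94-era
client API; the table name and the pause between regions are
placeholders you would tune for your cluster:

  # rolling_major_compact.rb -- run with:
  #   hbase org.jruby.Main rolling_major_compact.rb <table>
  include Java
  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.hbase.client.HBaseAdmin
  import org.apache.hadoop.hbase.client.HTable

  table_name = ARGV[0]
  conf  = HBaseConfiguration.create
  admin = HBaseAdmin.new(conf)
  table = HTable.new(conf, table_name)

  # Request a major compaction of each region in turn, sleeping between
  # requests to keep the compaction load at a constant background level.
  # Note majorCompact() is asynchronous: it queues the request and returns.
  table.getRegionsInfo.keySet.each do |region|
    admin.majorCompact(region.getRegionName)
    sleep 60   # pause length is arbitrary; tune to taste
  end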

Another performance consideration is how many expired cells a scan
might have to skip. If a wide area of the keyspace expires all at once,
the scan will seem to "pause" while traversing that area. However, you
can use setTimeRange to bound your scan by time range, and HBase can
then skip whole HFiles just by examining their metadata. Therefore I
would recommend using both: TTLs for automatic background garbage
collection of expired entries, and time-range-bounded scans for
read-time optimization.
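For illustration, a time-range-bounded scan from the shell's JRuby
environment might look like the following (the table name, column
family, and 24-hour window are placeholders):

  include Java
  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.hbase.client.HTable
  import org.apache.hadoop.hbase.client.Scan
  import org.apache.hadoop.hbase.util.Bytes

  conf  = HBaseConfiguration.create
  table = HTable.new(conf, 'events')          # placeholder table name

  # Only consider cells written in the last 24 hours; store files whose
  # metadata shows nothing in this range are skipped without being read.
  now  = java.lang.System.currentTimeMillis
  scan = Scan.new
  scan.setTimeRange(now - 24 * 60 * 60 * 1000, now)
  scan.addFamily(Bytes.toBytes('d'))          # placeholder column family

  scanner = table.getScanner(scan)
  scanner.each { |result| puts result }
  scanner.close
  table.close

From the interactive shell the equivalent is
scan 'events', { TIMERANGE => [min_ts, max_ts] }.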

Incidentally, there was an interesting presentation at HBaseCon
recently regarding a creative use of timestamps:
http://www.slideshare.net/cloudera/1-serving-apparel-catalog-from-h-base-suraj-varma-gap-inc-finalupdated-last-minute
(slide 16).

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)
