hbase-user mailing list archives

From Gautam <gautamkows...@gmail.com>
Subject Hbase Scan/Snapshot Performance...
Date Tue, 12 Aug 2014 21:58:30 GMT

We've been using and loving HBase for a couple of months now. Our primary
use case for HBase is writing a stream of events to an online time-series
HBase table. Every so often we run medium to large batch scan MR jobs on
sections (1 hour, 1 day, 1 week) of this same time-series table. The
online table now shows spikes whenever these large batch read jobs
are run. Write throughput goes down while these sequential scans are
running on the table.

We've been playing around with snapshots and are considering using them
to run these scheduled hourly, daily, and weekly jobs so that the online
table isn't affected. From preliminary tests, online snapshots take far
too long: the snapshot job times out after 60 seconds. The time was spent
flushing the memstores on all region servers (as expected), which seems to
take too long. From the RS logs it also looks like the flushes are done
serially.
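
To put rough numbers on the flush phase, here is a back-of-envelope sketch in Python. It uses the figures from our test setup (240 regions, 4 region servers, 60-second timeout). The even spread of regions across servers and the strictly serial flush are assumptions based on our reading of the logs, not measured facts, and the function name is only illustrative.

```python
# Back-of-envelope budget for the flush phase of an online snapshot.
# Assumptions (not measured): regions are spread evenly over the region
# servers, and each server flushes its regions one at a time.
REGIONS = 240
REGION_SERVERS = 4
SNAPSHOT_TIMEOUT_SECS = 60


def flush_budget_per_region(regions, servers, timeout_secs):
    """Seconds available per memstore flush under the assumptions above."""
    regions_per_server = regions / servers
    return timeout_secs / regions_per_server


budget = flush_budget_per_region(REGIONS, REGION_SERVERS, SNAPSHOT_TIMEOUT_SECS)
print(f"{REGIONS // REGION_SERVERS} regions per server, {budget:.1f}s per flush")
```

If those assumptions hold, each region server has about one second per region to flush before the snapshot times out, which would explain why the job does not finish in time.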

Offline snapshots aren't an option since we can't disable this table, which
serves the event writes.

*We're running HBase 0.94.6. We tried benchmarking snapshots on a 9 TB table
with 240 regions, 1 column family, and 4 region servers.*

All in all, I'd like to ask whether things would improve if we upgraded to
HBase 0.98+. Are there known benchmark numbers on expected snapshot
performance for 0.94.x vs. 0.98.x? In an ideal scenario we'd like these MR
jobs to dynamically take a snapshot, run the job, and delete or re-use the
snapshot based on freshness. At the least, we need the snapshot to be fresh until the last
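
The re-use decision we have in mind could look roughly like the Python sketch below. It is only an illustration of the policy, not HBase API code: `snapshot_action`, the `max_age` threshold and the timestamps are all hypothetical, and the actual snapshot creation and deletion would happen elsewhere.

```python
from datetime import datetime, timedelta


def snapshot_action(snapshot_created, now, max_age):
    """Decide whether a job can re-use an existing snapshot.

    snapshot_created is None when no snapshot exists yet.
    Returns "reuse" if the snapshot is at most max_age old, else "recreate".
    """
    if snapshot_created is None:
        return "recreate"
    age = now - snapshot_created
    return "reuse" if age <= max_age else "recreate"


# Hypothetical example: an hourly job that tolerates a snapshot up to 1 hour old.
now = datetime(2014, 8, 12, 22, 0)
print(snapshot_action(datetime(2014, 8, 12, 21, 30), now, timedelta(hours=1)))
print(snapshot_action(datetime(2014, 8, 12, 20, 0), now, timedelta(hours=1)))
```

Here the first call would re-use the 30-minute-old snapshot and the second would ask for a new one. The daily and weekly jobs could pass a larger `max_age`.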

Also, from what I understand, scans in HBase are not consistent at the table
level but are at the row level. Are there other ways I can query the online
table without hurting the write throughput?

