hbase-user mailing list archives

From Matteo Bertozzi <theo.berto...@gmail.com>
Subject Re: Hbase Scan/Snapshot Performance...
Date Tue, 12 Aug 2014 22:11:55 GMT
There is HBASE-10935, included in 0.94.21, which lets you skip the
memstore flush; the result is the online version of an "offline snapshot"
(edits still sitting in the memstore are not included in the snapshot):

snapshot 'sourceTable', 'snapshotName', {SKIP_FLUSH => true}
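
To make the trade-off concrete, here is a toy Python model of why a
skip-flush snapshot is fast but can miss the most recent writes. It is an
illustration only, not how HBase implements snapshots: `ToyRegion` and its
methods are hypothetical, and real snapshots reference HFiles rather than
copying row keys.

```python
# Toy model (not HBase code): a region keeps recent edits in an in-memory
# memstore and older edits in immutable "HFiles". A snapshot only references
# the files, so anything still in the memstore is left out unless it is
# flushed first.
from dataclasses import dataclass, field


@dataclass
class ToyRegion:
    hfiles: list = field(default_factory=list)  # each "HFile" is a frozenset of row keys
    memstore: set = field(default_factory=set)  # unflushed edits

    def put(self, row):
        self.memstore.add(row)

    def flush(self):
        # Write the memstore out as a new immutable file and clear it.
        if self.memstore:
            self.hfiles.append(frozenset(self.memstore))
            self.memstore = set()

    def snapshot(self, skip_flush=False):
        # Default behaviour: flush first, then reference the files.
        # skip_flush=True: reference only what is already on disk.
        if not skip_flush:
            self.flush()
        return set().union(*self.hfiles)


region = ToyRegion()
region.put("row-1")
region.flush()
region.put("row-2")  # still in the memstore

print(sorted(region.snapshot(skip_flush=True)))  # ['row-1']
print(sorted(region.snapshot()))                 # ['row-1', 'row-2']
```

The skip-flush snapshot avoids the flush cost entirely, at the price of
being only as fresh as the last flush that happened on its own.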

On Tue, Aug 12, 2014 at 10:58 PM, Gautam <gautamkowshik@gmail.com> wrote:

> Hello,
>      We've been using and loving HBase for a couple of months now. Our primary
> use case for HBase is writing a stream of events to an online time-series
> HBase table. Every so often we run medium to large batch scan MR jobs on
> sections (1 hour, 1 day, 1 week) of this same time-series table. This
> online table now shows spikes whenever these large batch read jobs
> run: write throughput goes down while the sequential scans are
> running on the table.
> We've been playing around with snapshots and are considering using them
> to take over these scheduled hourly, daily and weekly jobs so that the
> online table isn't affected. From preliminary tests, online snapshots take
> way too long: the snapshot job times out after 60 secs. The time is spent
> flushing the memstores on all region servers (as expected), which seems to
> take too long. Also, the RS logs suggest this is done serially.
> Offline snapshots aren't an option, since we can't disable this table, which
> serves the event writes.
> *We're running HBase 0.94.6. We tried benchmarking snapshots on a 9 TB table
> with 240 regions, 1 column family and 4 region servers.*
> All in all, I'd like to ask whether things would improve if we upgraded to
> HBase 0.98.x. Are there known benchmark numbers on expected snapshot
> performance for 0.94.x vs. 0.98.x? Ideally, we'd like these MR jobs to
> dynamically take a snapshot, run the job, and delete or reuse the snapshot
> based on freshness. At the least, we need the snapshot to be fresh up to the
> last hour.
> Also, from what I understand, HBase scans are not consistent at the table
> level, only at the row level. Are there other ways I can query the online
> table without hurting write throughput?
> Cheers,
> -Gautam.
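
The "take a snapshot, run the job, delete or reuse it based on freshness"
workflow described above comes down to a small decision rule. Below is a
minimal Python sketch of that rule under stated assumptions: the naming
scheme (`events_` plus the UTC hour), `MAX_AGE` of one hour and the `plan()`
helper are all hypothetical, and the actual `snapshot` / `delete_snapshot`
calls are left to the caller.

```python
# Hypothetical helper (not part of HBase): decide whether an existing
# snapshot is fresh enough to reuse for a batch job, or whether a new one
# should be taken. Snapshot names are ASSUMED to embed the UTC hour they
# were taken, e.g. "events_20140812T22".
from datetime import datetime, timedelta, timezone

NAME_PREFIX = "events_"       # hypothetical naming scheme
HOUR_FORMAT = "%Y%m%dT%H"
MAX_AGE = timedelta(hours=1)  # "fresh up to the last hour"


def snapshot_name(when):
    return NAME_PREFIX + when.strftime(HOUR_FORMAT)


def taken_at(name):
    # Age is measured from the start of the hour in the name (conservative).
    stamp = name[len(NAME_PREFIX):]
    return datetime.strptime(stamp, HOUR_FORMAT).replace(tzinfo=timezone.utc)


def plan(existing, now):
    """Return (snapshot_to_use, snapshots_to_delete, needs_new_snapshot)."""
    ours = sorted((n for n in existing if n.startswith(NAME_PREFIX)), key=taken_at)
    if ours and now - taken_at(ours[-1]) <= MAX_AGE:
        # Newest snapshot is recent enough: reuse it and drop the older ones.
        return ours[-1], ours[:-1], False
    # Nothing fresh enough: take a new snapshot and drop all of ours.
    return snapshot_name(now), ours, True


now = datetime(2014, 8, 12, 22, 30, tzinfo=timezone.utc)
print(plan(["events_20140812T22", "events_20140812T20", "other"], now))
# ('events_20140812T22', ['events_20140812T20'], False)
```

Keeping the decision separate from the snapshot commands makes the policy
easy to test; whichever snapshot it selects would then be handed to the
MR job as its input.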
