hadoop-common-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2139) [hbase] Increase parallelism in region servers
Date Tue, 06 Nov 2007 22:46:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540613 ]

Jim Kellerman commented on HADOOP-2139:


Scans vs Updates:

Currently, scans block updates for an entire region, even if the column(s) being scanned are
completely unrelated to the one(s) being updated.

Ideally, we should be able to continue to accept updates even for columns being scanned. In
this case, the scanner would not see updates that happened after the scan started.
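
One way to picture this is a copy-on-write column, sketched below in Java. This is only an illustration under stated assumptions: SnapshotColumn and its methods are hypothetical names, not HBase classes, and copying the whole map on every update is far too expensive for real use. It only shows that a scanner holding an immutable view never has to block an update.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for one column's in-memory cells; not an HBase class. */
class SnapshotColumn {
    // The current immutable view of the column; replaced wholesale on every update.
    private final AtomicReference<NavigableMap<String, String>> current =
        new AtomicReference<>(
            Collections.unmodifiableNavigableMap(new TreeMap<String, String>()));

    /** Copy-on-write update: never mutates a map that a scanner may be holding. */
    void put(String row, String value) {
        current.updateAndGet(old -> {
            TreeMap<String, String> next = new TreeMap<>(old);
            next.put(row, value);
            return Collections.unmodifiableNavigableMap(next);
        });
    }

    /** A scanner iterates over whatever view was current when it was opened. */
    Iterator<Map.Entry<String, String>> openScanner() {
        return current.get().entrySet().iterator();
    }
}

class SnapshotScanDemo {
    public static void main(String[] args) {
        SnapshotColumn col = new SnapshotColumn();
        col.put("row1", "a");
        Iterator<Map.Entry<String, String>> scan = col.openScanner();
        col.put("row2", "b"); // accepted while the scanner is open
        int seen = 0;
        while (scan.hasNext()) {
            scan.next();
            seen++;
        }
        System.out.println("scanner saw " + seen + " row(s)"); // prints "scanner saw 1 row(s)"
    }
}
```

A scanner opened after the second put would see both rows, since it would pick up the newer view.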

Scans vs Cache Flush:

An open scanner blocks a cache flush for the region. The reverse is also true, although a cache
flush is likely to be the faster of the two. The contention is on both the memcache and the HStores.
If, as in scans vs updates, a scanner were given an immutable snapshot of the memcache and the
HStores, a cache flush could proceed independently of the scan. The scan would see what was
current at the time the scanner started, and the cache flush could do its thing without
interfering.

Compactions vs...

Compactions contend with almost every other operation (except log rolling) at the HStore level,
but only at the point when the compaction completes. If we did compactions at the column level,
uncontested columns could be compacted right away. For a contested column, the compaction could
finish as soon as its HStore became free and get out quickly, since the interval of contention
is so small.
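
The short interval of contention can be sketched as follows. This is a rough Java illustration, not the HStore code: ColumnStore is a hypothetical class, each store file is modeled as an in-memory sorted map, and the sketch assumes files are only ever appended and that one compaction runs per store at a time. The merge happens outside the lock; only the final swap is exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical per-column store; each "file" is modeled as a sorted in-memory map. */
class ColumnStore {
    private final Object swapLock = new Object();
    // Oldest file first. The list is replaced, never mutated in place.
    private List<TreeMap<String, String>> files = new ArrayList<>();

    /** A cache flush appends a new file; callers must not modify it afterwards. */
    void addFile(TreeMap<String, String> file) {
        synchronized (swapLock) {
            List<TreeMap<String, String>> next = new ArrayList<>(files);
            next.add(file);
            files = next;
        }
    }

    int fileCount() {
        synchronized (swapLock) {
            return files.size();
        }
    }

    /** Returns the newest value for a row, or null if no file has it. */
    String get(String row) {
        List<TreeMap<String, String>> view;
        synchronized (swapLock) {
            view = files;
        }
        for (int i = view.size() - 1; i >= 0; i--) {
            String v = view.get(i).get(row);
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    /** At most one compaction per store at a time (enforced by synchronized). */
    synchronized void compact() {
        List<TreeMap<String, String>> inputs;
        synchronized (swapLock) {
            inputs = files;
        }
        // The long part: merge outside swapLock. Later files overwrite earlier ones.
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> f : inputs) {
            merged.putAll(f);
        }
        // The short part: swap in the result, keeping files flushed during the merge.
        synchronized (swapLock) {
            List<TreeMap<String, String>> next = new ArrayList<>();
            next.add(merged);
            next.addAll(files.subList(inputs.size(), files.size()));
            files = next;
        }
    }
}

class ColumnCompactionDemo {
    public static void main(String[] args) {
        ColumnStore store = new ColumnStore();
        TreeMap<String, String> older = new TreeMap<>();
        older.put("row1", "old");
        TreeMap<String, String> newer = new TreeMap<>();
        newer.put("row1", "new");
        store.addFile(older);
        store.addFile(newer);
        store.compact();
        System.out.println(store.fileCount() + " file(s), row1 = " + store.get("row1"));
    }
}
```

With one such store per column, a compaction of one column never waits on readers or flushes of another.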

Splits vs...

Unlike other operations, splits really do need to function at the region level, because
that is the level of granularity in the META region.

You can't split a region if it is being scanned, updated, is processing get or getFull, or
if it is being compacted. Interestingly, today a cache flush only contends with a split at
the memcache or HStore level, although it probably should contend at the region level.
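
A region-level gate for splits could look roughly like the Java sketch below. RegionGate and its method names are hypothetical, and the sketch only covers split vs everything else: ordinary operations share the region through a read lock, while a split needs the write lock and backs off if the region is busy. It says nothing about how those ordinary operations contend with each other.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical region-level gate; not an HBase class. */
class RegionGate {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Scans, updates, get/getFull, compactions and cache flushes share the region. */
    void beginOperation() {
        lock.readLock().lock();
    }

    void endOperation() {
        lock.readLock().unlock();
    }

    /** Returns true if the split got exclusive access; the caller must then call endSplit(). */
    boolean tryBeginSplit() {
        return lock.writeLock().tryLock();
    }

    void endSplit() {
        lock.writeLock().unlock();
    }
}

class RegionGateDemo {
    public static void main(String[] args) {
        RegionGate gate = new RegionGate();

        gate.beginOperation(); // e.g. a scanner is open
        System.out.println("split while scanning: " + gate.tryBeginSplit()); // false
        gate.endOperation();

        boolean ok = gate.tryBeginSplit();
        System.out.println("split when idle: " + ok); // true
        if (ok) {
            gate.endSplit();
        }
    }
}
```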

Log Rolling vs update, cache flush:

Log rolling blocks updates and vice versa. The same holds for log rolling and cache flushes.

Updates and cache flushes are mutually exclusive w.r.t. accessing the HLog.

> [hbase] Increase parallelism in region servers
> ----------------------------------------------
>                 Key: HADOOP-2139
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2139
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>            Reporter: Jim Kellerman
>            Assignee: Jim Kellerman
>             Fix For: 0.16.0
> There are a number of paths in the region server which block against one another including:
> - log rolling
> - cache flushes
> - region splitting
> - updates
> - scanners
> Investigate which can proceed in parallel and mechanisms for making some operations that
> currently do not run in parallel do so.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
