hbase-issues mailing list archives

From "Benoit Sigoure (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2959) Scanning always starts at the beginning of a row
Date Wed, 08 Sep 2010 08:45:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907133#action_12907133 ]

Benoit Sigoure commented on HBASE-2959:
---------------------------------------

Yes, this issue is a pretty major performance regression for us (although the way we use HBase
to hit this regression is somewhat questionable: the schema is far from optimal for HBase).

Jonathan, I'm missing some context about "delete family".  Ryan and Stack mentioned some of
it to me but I still don't understand why it would be so expensive to store delete markers
for each and every KeyValue you delete.  Putting such markers before actual data will always
make it harder for HBase to honor them since HBase has to "seek back" to see them (or seek
early and then seek forward to find the actual data provided that there was no delete marker,
which is the problem we're running into here), and HBase isn't very good at "seeking back".

> Scanning always starts at the beginning of a row
> ------------------------------------------------
>
>                 Key: HBASE-2959
>                 URL: https://issues.apache.org/jira/browse/HBASE-2959
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.20.4, 0.20.5, 0.20.6, 0.89.20100621
>            Reporter: Benoit Sigoure
>            Priority: Blocker
>
> In HBASE-2248, the code in {{HRegion#get}} was changed like so:
> {code}
> -  private void get(final Store store, final Get get,
> -    final NavigableSet<byte []> qualifiers, List<KeyValue> result)
> -  throws IOException {
> -    store.get(get, qualifiers, result);
> +  /*
> +   * Do a get based on the get parameter.
> +   */
> +  private List<KeyValue> get(final Get get) throws IOException {
> +    Scan scan = new Scan(get);
> +
> +    List<KeyValue> results = new ArrayList<KeyValue>();
> +
> +    InternalScanner scanner = null;
> +    try {
> +      scanner = getScanner(scan);
> +      scanner.next(results);
> +    } finally {
> +      if (scanner != null)
> +        scanner.close();
> +    }
> +    return results;
>    }
> {code}
> So instead of doing a {{get}} straight on the {{Store}}, we now open a scanner.  The
> problem is that we eventually end up in {{ScanQueryMatcher}}, where the constructor does:
> {{this.startKey = KeyValue.createFirstOnRow(scan.getStartRow());}}.  This means that if we
> have a very wide row (thousands of columns), the scanner has to go through thousands of
> {{KeyValue}}s before finding the right entry, because it always starts from the beginning
> of the row, whereas before it was much more straightforward.
> This problem was under the radar for a while because the overhead isn't too unreasonable,
> but later on, {{incrementColumnValue}} was changed to do a {{get}} under the hood.  At
> StumbleUpon we do thousands of ICVs per second, so thousands of times per second we're
> scanning some really wide rows.  When a row is contended, this results in all the IPC
> threads being stuck on acquiring a row lock, while one thread is doing the ICV (albeit
> slowly due to the excessive scanning).  When all IPC threads are stuck, the region server
> is unable to serve more requests.
> As a nice side effect, fixing this bug will make {{get}} and {{incrementColumnValue}}
> faster, as well as the first call to {{next}} on a scanner.
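
To illustrate the cost described in the issue, here is a minimal sketch that does not use any HBase API. It assumes a plain {{java.util.TreeSet}} of "row/qualifier" strings as a stand-in for the sorted {{KeyValue}}s of one wide row; the class {{WideRowSeekSketch}} and its methods are hypothetical names made up for this example. It only counts how many entries get visited when iteration starts at the first key of the row versus at the requested column.

```java
import java.util.Locale;
import java.util.TreeSet;

public class WideRowSeekSketch {

    /** Builds sorted "row/qualifier" keys standing in for the KeyValues of one wide row. */
    static TreeSet<String> buildRow(String row, int columns) {
        TreeSet<String> keys = new TreeSet<>();
        for (int i = 0; i < columns; i++) {
            // Zero-padded qualifiers so that string order matches numeric order.
            keys.add(String.format(Locale.ROOT, "%s/q%05d", row, i));
        }
        return keys;
    }

    /** Counts the entries visited when iterating from 'from' until 'target' is reached. */
    static int entriesVisited(TreeSet<String> keys, String from, String target) {
        int visited = 0;
        for (String key : keys.tailSet(from, true)) {
            visited++;
            if (key.compareTo(target) >= 0) {
                break; // reached (or passed) the requested column
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        TreeSet<String> keys = buildRow("row1", 5000);
        String target = "row1/q04999";

        // "row1/" sorts before every key of the row, like a first-on-row start key.
        int fromRowStart = entriesVisited(keys, "row1/", target);
        // Starting directly at the requested column, as a targeted seek would.
        int fromColumn = entriesVisited(keys, target, target);

        System.out.println("from start of row: " + fromRowStart); // 5000
        System.out.println("from requested column: " + fromColumn); // 1
    }
}
```

The sketch says nothing about how the real scanner or the eventual fix is implemented; it only shows why a start key placed at the beginning of a wide row makes every lookup proportional to the row's width.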

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

