hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2450) For single row reads of specific columns, seek to the first column in HFiles rather than start of row
Date Fri, 03 Sep 2010 18:04:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905993#action_12905993 ]

stack commented on HBASE-2450:

So, we just had an interesting case here where an ICV was running real slow -- two orders
of magnitude slower than the old 0.20.x Get codepath, see HBASE-2959 -- because the ICV was being
done on a row that had thousands of columns (the column to update was somewhere in the midst
of these thousands of columns).  At first blush, the fix was changing ScanQueryMatcher so
that the start key was changed from 'this.startKey = KeyValue.createFirstOnRow(scan.getStartRow());'
to instead consider the column.  But then, reading this issue, I'm reminded of deletes and of
how a delete row is first thing on the row and of how a delete family is first thing in a

Having to go to the start of the row and move forward is slowing Gets (and ICVs).
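To make the cost concrete, here is a rough sketch of the two seek strategies. It is a toy model only, not HBase code: the class SeekSketch, its method names and the "q0000"-style qualifiers are all made up for illustration, a sorted list stands in for one wide row in an HFile, and a binary search stands in for an index-assisted seek to the column.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model (hypothetical names): one wide row as a sorted list of qualifiers.
public class SeekSketch {

    /** Builds the sorted qualifiers of one wide row: q0000, q0001, ... */
    static List<String> wideRow(int columns) {
        List<String> cells = new ArrayList<>();
        for (int i = 0; i < columns; i++) {
            cells.add(String.format("q%04d", i)); // zero-padded, so already sorted
        }
        return cells;
    }

    /** Seek to the start of the row, then step forward to the target column. */
    static int visitedFromRowStart(List<String> cells, String target) {
        int visited = 0;
        for (String qualifier : cells) {
            visited++;
            if (qualifier.compareTo(target) >= 0) {
                break; // reached (or passed) the wanted column
            }
        }
        return visited;
    }

    /** Seek straight to the target column; only that one cell is read. */
    static int visitedWithColumnSeek(List<String> cells, String target) {
        int index = Collections.binarySearch(cells, target);
        return index >= 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        List<String> row = wideRow(5000);
        System.out.println("from row start: " + visitedFromRowStart(row, "q2500"));
        System.out.println("column seek:    " + visitedWithColumnSeek(row, "q2500"));
    }
}
```

In this model a read of a column in the middle of a 5000-column row steps over every earlier column when it starts at the row, and touches a single cell when it seeks to the column.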

Above it's mentioned that a get on a row needs to look at the start of the row to see if there
is a delete of the whole row (and we need to look at the start of the family to see if the
family is deleted) but, yeah, this seems wrong.
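Here is a rough sketch of why a direct seek to the column can miss such a marker. Again this is a toy model and not HBase's actual key format: DeleteMarkerSketch, the "row/family/qualifier" string keys and the "DELETE_FAMILY" value are hypothetical, and timestamps and versions are ignored entirely. The delete-family marker is modeled with an empty qualifier so that it sorts ahead of every column in the family.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical names): cells keyed by "row/family/qualifier".
public class DeleteMarkerSketch {

    static final String DELETE_FAMILY = "DELETE_FAMILY";

    /** One row whose family "f" carries a delete-family marker. */
    static TreeMap<String, String> store() {
        TreeMap<String, String> cells = new TreeMap<>();
        cells.put("r1/f/", DELETE_FAMILY); // empty qualifier sorts first
        cells.put("r1/f/a", "1");
        cells.put("r1/f/counter", "42");
        cells.put("r1/f/z", "3");
        return cells;
    }

    /** Start at the head of the family, so the marker is seen before the column. */
    static String readFromFamilyStart(TreeMap<String, String> cells,
                                      String row, String family, String qualifier) {
        String prefix = row + "/" + family + "/";
        boolean familyDeleted = false;
        for (Map.Entry<String, String> e : cells.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // walked past this family
            }
            if (e.getKey().equals(prefix) && DELETE_FAMILY.equals(e.getValue())) {
                familyDeleted = true;
                continue;
            }
            if (e.getKey().equals(prefix + qualifier)) {
                return familyDeleted ? null : e.getValue();
            }
        }
        return null;
    }

    /** Seek straight to the column; the marker earlier in the family is never read. */
    static String readWithColumnSeek(TreeMap<String, String> cells,
                                     String row, String family, String qualifier) {
        return cells.get(row + "/" + family + "/" + qualifier);
    }

    public static void main(String[] args) {
        TreeMap<String, String> cells = store();
        System.out.println(readFromFamilyStart(cells, "r1", "f", "counter"));
        System.out.println(readWithColumnSeek(cells, "r1", "f", "counter"));
    }
}
```

In this model the read from the family start returns nothing because the marker masks the column, while the direct seek hands back the deleted value. A fix that seeks to the column needs some other way of learning about the marker.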

The other ideas sound better -- delete dynamic bloom or extra info in index.

Meantime we've changed our schema here so ICVs are done in a row of one column only, but this
issue is going to burn us again.

> For single row reads of specific columns, seek to the first column in HFiles rather than start of row
> -----------------------------------------------------------------------------------------------------
>                 Key: HBASE-2450
>                 URL: https://issues.apache.org/jira/browse/HBASE-2450
>             Project: HBase
>          Issue Type: Improvement
>          Components: io, regionserver
>            Reporter: Jonathan Gray
>            Assignee: Pranav Khaitan
>             Fix For: 0.90.0
> Currently we will always seek to the start of a row.  If we are getting specific columns, we should seek to the first column in that row.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
