hbase-issues mailing list archives

From "Raymond Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8001) Avoid unnecessary lazy seek
Date Mon, 03 Jun 2013 02:49:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672771#comment-13672771 ]

Raymond Liu commented on HBASE-8001:

[~lhofhansl], [~ted_yu]: Sorry for the late reply on this issue; I have been busy with other
work. I ran this test again with a single RS. I still use an M/R job, but the table has only
one region, with 2M rows, 1 CF, and 18 columns, without any compression or encoding, about 3G
on disk. I don't use the block cache, so every read goes to disk with a real seek. As we
discussed before, using the block cache will only lead to more gain with this patch.

With this patch, an 18-column full table scan takes 99-101s, while without this patch it takes
108-109s. That is still a noticeable difference. I ran each case several times, and the results
are pretty stable.
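
To put the timings above in relative terms, here is a minimal sketch of the arithmetic. It uses only the four numbers reported in this comment; taking the midpoint of each range as the "typical" run is an assumption made for illustration, and `relative_saving` is a hypothetical helper name, not part of HBase.

```python
def relative_saving(with_patch_s, without_patch_s):
    """Fraction of the unpatched scan time saved by the patch."""
    return (without_patch_s - with_patch_s) / without_patch_s

# Reported scan times in seconds (min, max) from the comment above.
with_patch = (99.0, 101.0)
without_patch = (108.0, 109.0)

# Assumption: the midpoint of each range stands in for a typical run.
mid_with = sum(with_patch) / 2
mid_without = sum(without_patch) / 2
typical = relative_saving(mid_with, mid_without)

# Bounds: slowest patched run vs fastest unpatched run, and the reverse.
smallest = relative_saving(max(with_patch), min(without_patch))
largest = relative_saving(min(with_patch), max(without_patch))

print(f"typical {typical:.1%}, range {smallest:.1%} to {largest:.1%}")
# prints: typical 7.8%, range 6.5% to 9.2%
```

So the reported difference corresponds to roughly 6.5% to 9.2% less scan time in this setup.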

Would you mind running an end-to-end test? I am not sure whether something else might still be
affecting your test case. Might the data size be too small?

> Avoid unnecessary lazy seek
> ---------------------------
>                 Key: HBASE-8001
>                 URL: https://issues.apache.org/jira/browse/HBASE-8001
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.94.5
>            Reporter: Raymond Liu
>            Assignee: Raymond Liu
>             Fix For: 0.98.0
>         Attachments: HBASE-8001_onescanner.patch, HBASE-8001_onescanner_v2.patch
> Lazy seek helps to reduce the number of real seeks needed across multiple hfiles, when the
> kv from a newer hfile is enough to satisfy the query.
> In many cases, however, it just pushes the real seek later and does not reduce the number of
> real seeks, e.g. when there is only one hfile, or storefilescanners have been closed and only
> one is left, or the scan needs to go through all the versions, or there is only one version
> of each row and a sequential scan is performed. In these cases, lazy seek just brings extra
> overhead.
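
The trade-off in the description above can be illustrated with a toy model. This is a sketch under strong assumptions, not HBase's actual scanner code: each file holds one cell per row, a query wants only the latest version of one row, and a file can be skipped when the best cell found so far is at least as new as that file's maximum timestamp. `StoreFile`, `eager_real_seeks`, and `lazy_real_seeks` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StoreFile:
    """Toy stand-in for an hfile: maps row -> timestamp of its single cell."""
    cells: dict  # assumed non-empty

    @property
    def max_ts(self):
        return max(self.cells.values())


def eager_real_seeks(files, row):
    """Without lazy seek, every file is really sought for the row."""
    return len(files)


def lazy_real_seeks(files, row):
    """With lazy seek, a file is really sought only if it could still
    hold a cell newer than the best one found so far."""
    best_ts = None
    real_seeks = 0
    # Visit files from newest to oldest maximum timestamp.
    for f in sorted(files, key=lambda f: f.max_ts, reverse=True):
        if best_ts is not None and best_ts >= f.max_ts:
            break  # this file and all older ones cannot win; skip them
        real_seeks += 1
        ts = f.cells.get(row)
        if ts is not None and (best_ts is None or ts > best_ts):
            best_ts = ts
    return real_seeks


newer = StoreFile({"r1": 20, "r2": 21})
older = StoreFile({"r1": 5, "r2": 6})

# Two files: the newer cell satisfies the query, so one real seek is saved.
print(eager_real_seeks([newer, older], "r1"), lazy_real_seeks([newer, older], "r1"))
# One file: the real seek is only postponed, never avoided.
print(eager_real_seeks([newer], "r1"), lazy_real_seeks([newer], "r1"))
```

In this model the two-file case drops from two real seeks to one, while the single-file case needs one real seek either way, so any bookkeeping done for the lazy path there is pure overhead.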

