hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1118) Scanner setup takes too long
Date Thu, 22 Jan 2009 22:43:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666329#action_12666329 ]

stack commented on HBASE-1118:

Looking at this a little, Scanner setup is taking up a good portion of the time spent
returning values.  The profiler shows setup taking 30-40% of the time when fetching 100
(small cell) rows.  To verify the profiler findings, I resorted to System.out, and that
showed similar figures (though the real share may be more than 30-40%, since my System.out
was measuring server-side while the total time was taken client-side, after rows had been
fetched and emitted on the console).

Every time we open a scanner, it opens a Reader per covered HStoreFile.  Opening a Reader
currently means opening the data file and its index, plus reading the index into memory.
The latter seemed to be taking the bulk of the open time in the profiler.

There are a few things we can do here but probably not till tfile time.

1. We already have an open Reader for every HStoreFile.  Scanners should be able to access
the already-opened Reader's index rather than each reading in its own.  This will save on
startup time and on heap (indexes are private in the current MapFile).
2. A smarter blockcache would let Scanners use already-loaded blocks.  Chatting with jgray:
since we can give tfile a Stream, the Stream we hand it can be smartened up so it goes to
a blockcache first and, only if the block is not there, to hdfs.
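
To make the two ideas above concrete, here is a rough Java sketch.  None of these names
(ScannerSetupSketch, BlockSource, CachingBlockSource, SharedIndexRegistry) exist in HBase,
MapFile or tfile; they are hypothetical stand-ins, and the cache is unbounded with no
eviction, which a real blockcache would need.  It only shows the shape: load an index once
per store file and share it, and check a cache before going to the slow store.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class ScannerSetupSketch {

    /** Hypothetical stand-in for whatever reads a block from the slow store (hdfs). */
    public interface BlockSource {
        byte[] readBlock(String file, long offset);
    }

    /**
     * Idea 2: go to an in-memory cache first and only on a miss to the backing source.
     * Assumption: blocks are identified by (file, offset). No eviction in this sketch.
     */
    public static final class CachingBlockSource implements BlockSource {
        private final BlockSource backing;
        private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

        public CachingBlockSource(BlockSource backing) {
            this.backing = backing;
        }

        @Override
        public byte[] readBlock(String file, long offset) {
            // computeIfAbsent calls the backing source only when the key is missing.
            return cache.computeIfAbsent(file + "@" + offset,
                    key -> backing.readBlock(file, offset));
        }
    }

    /**
     * Idea 1: one index per store file, loaded once and shared by every scanner,
     * instead of each scanner reading its own copy. The index is modelled here as
     * a long[] of block offsets, which is an assumption made for illustration.
     */
    public static final class SharedIndexRegistry {
        private final Function<String, long[]> loader;
        private final Map<String, long[]> indices = new ConcurrentHashMap<>();

        public SharedIndexRegistry(Function<String, long[]> loader) {
            this.loader = loader;
        }

        public long[] indexFor(String file) {
            // The loader runs at most once per file; later callers get the same array.
            return indices.computeIfAbsent(file, loader);
        }
    }
}
```

With something like this, the second scanner opened over the same store file would pay
neither the index read nor the hdfs read for blocks an earlier scanner already touched.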

> Scanner setup takes too long
> ----------------------------
>                 Key: HBASE-1118
>                 URL: https://issues.apache.org/jira/browse/HBASE-1118
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
> posix4 and dj_ryan are on about scanner setups taking too long.  Use case is a fetch of
> 100 - 1000 rows at a time.

