hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-2180) read performance from synchronizing hfile.fddatainputstream
Date Fri, 05 Feb 2010 00:30:27 GMT

     [ https://issues.apache.org/jira/browse/HBASE-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2180:
-------------------------

    Attachment: 2180.patch

With this patch, gets use pread to fetch blocks, while scans keep using the old seek+read.
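
For illustration, the difference between the two read styles can be sketched with plain JDK
classes. This is only an analogy under stated assumptions, not the patch's code: the names
BlockReads, seekAndRead and pread are made up here, and java.nio's positional
FileChannel.read stands in for a pread against the HDFS input stream.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper, for illustration only; not part of HBase or of 2180.patch.
final class BlockReads {

    // Scan-style read: seek and read share one file position, so the pair
    // must be serialized on the stream, as in the code quoted in the issue.
    static int seekAndRead(RandomAccessFile in, long pos, byte[] b, int off, int n)
            throws IOException {
        synchronized (in) {
            in.seek(pos);
            return in.read(b, off, n);
        }
    }

    // Get-style read: a positional read names its own offset and leaves the
    // channel's position untouched, so no lock on the stream is needed here.
    static int pread(FileChannel ch, long pos, byte[] b, int off, int n)
            throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(b, off, n);
        int total = 0;
        while (buf.hasRemaining()) {
            int r = ch.read(buf, pos + total);
            if (r < 0) {
                break; // end of file
            }
            total += r;
        }
        // Mirror InputStream semantics: -1 when nothing could be read at EOF.
        return (total == 0 && n > 0) ? -1 : total;
    }
}
```

Because the positional read never moves the shared file position, callers do not have to
queue up behind one lock per file. Whether the reads then really proceed in parallel depends
on the underlying stream implementation.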

The patch removes the old HFile.Reader.getScanner methods and replaces both with a single
getScanner that takes two arguments: whether to cache the blocks it reads, and whether to use
pread when pulling in a block.  I got rid of the old getScanners to force every caller to be
explicit about what it wants regarding caching and pread.

This patch does not include tests.  It's hard to test for this performance change.

A further improvement would be to recognize short scans, i.e. scans that cover less than one
hfile block.  In that case we'd want to pread rather than seek+read (especially once a
one-row scan replaces get).
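
A rough sketch of what such a heuristic might look like follows. Everything in it is an
assumption made for illustration: ReadModeChooser, usePread and the estimatedScanBytes
parameter are hypothetical names, the patch contains no such code, and the 64 KB constant is
only assumed as a typical block size.

```java
// Hypothetical heuristic, for illustration only; not part of 2180.patch.
final class ReadModeChooser {

    // Assumed block size for the example; a real reader would ask the hfile.
    static final int ASSUMED_BLOCK_SIZE = 64 * 1024;

    // Returns true when the caller should pread, false for seek+read.
    // isGet:              the request is a point get
    // estimatedScanBytes: caller's estimate of the bytes a scan will touch,
    //                     or a negative value when the size is unknown
    // blockSize:          size of one hfile block in bytes
    static boolean usePread(boolean isGet, long estimatedScanBytes, int blockSize) {
        if (isGet) {
            return true; // gets always pread
        }
        // A scan known to fit inside one block behaves like a get.
        return estimatedScanBytes >= 0 && estimatedScanBytes < blockSize;
    }
}
```

With these assumptions, a one-row scan estimated at a few hundred bytes would pread, while a
scan of unknown or multi-block size would keep using seek+read.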



> read performance from synchronizing hfile.fddatainputstream
> -----------------------------------------------------------
>
>                 Key: HBASE-2180
>                 URL: https://issues.apache.org/jira/browse/HBASE-2180
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: 2180.patch
>
>
> deep in the HFile read path, there is this code:
>     synchronized (in) {
>       in.seek(pos);
>       ret = in.read(b, off, n);
>     }
> This makes it so that only 1 read per file is active at a time, no matter how many
> threads are reading. This prevents the OS and hardware from being able to do IO
> scheduling by optimizing lots of concurrent reads.
> We need to either use a reentrant API (pread may be partially reentrant according to
> Todd) or use multiple stream objects, 1 per scanner/thread.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

