hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-61) [hbase] Create an HBase-specific MapFile implementation
Date Tue, 03 Feb 2009 01:27:59 GMT

     [ https://issues.apache.org/jira/browse/HBASE-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

stack updated HBASE-61:

    Attachment: hfile.patch

In testing, TFile is a good bit slower than MapFile when cells are ~100 bytes or less and you are
doing random access. It's slower even if you subsequently read 30 rows from that offset, and even
if we use a TFile block size of 8k. If cell values are 1k, TFile is faster than MapFile.

So, after profiling and discussion on IRC, the thinking is that we need something like a stripped-down
TFile or even a new format altogether. The attached patch is the start of my work stripping the chunking
and the key and value streams out of TFile. It's not finished yet. The intent is to keep most of the
TFile API and the underlying block mechanism, with its attendant block-finding mechanism, as
well as all of the metadata facility and the index at the end of the file. In the guts of TFile, though,
there would be only the DFSClient FSInput/OutputStream and blocks of byte arrays. The stripped-down
TFile is now called HFile.
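The block layout described above (blocks of plain byte arrays, a block index written at the end of the file, and a block-finding step on read) can be sketched roughly as follows. This is a minimal illustration under assumed conventions, not HFile's actual on-disk format: the class names `SketchWriter` and `SketchReader`, the 4-byte length-prefixed key/value encoding, and the 12-byte trailer (index offset plus entry count) are all hypothetical.

```python
import bisect
import io
import struct

# Hypothetical trailer: 8-byte index offset + 4-byte index entry count.
TRAILER_FMT = ">QI"
TRAILER_SIZE = struct.calcsize(TRAILER_FMT)  # 12 bytes


def _pack(b: bytes) -> bytes:
    """Length-prefix a byte string (assumed encoding, not HFile's)."""
    return struct.pack(">I", len(b)) + b


class SketchWriter:
    """Hypothetical writer: sorted key/values go into byte-array blocks."""

    def __init__(self, out, block_size=8 * 1024):
        self.out = out
        self.block_size = block_size
        self.block = bytearray()
        self.block_first_key = None
        self.last_key = None
        self.index = []  # one (first_key, offset, length) entry per block

    def append(self, key: bytes, value: bytes):
        if self.last_key is not None and key <= self.last_key:
            raise ValueError("keys must be appended in strictly increasing order")
        if self.block_first_key is None:
            self.block_first_key = key
        self.block += _pack(key) + _pack(value)
        self.last_key = key
        if len(self.block) >= self.block_size:
            self._flush_block()

    def _flush_block(self):
        if not self.block:
            return
        offset = self.out.tell()
        self.out.write(self.block)
        self.index.append((self.block_first_key, offset, len(self.block)))
        self.block = bytearray()
        self.block_first_key = None

    def close(self):
        self._flush_block()
        index_offset = self.out.tell()
        # The block index goes at the end, after all data blocks.
        for first_key, offset, length in self.index:
            self.out.write(_pack(first_key) + struct.pack(">QI", offset, length))
        self.out.write(struct.pack(TRAILER_FMT, index_offset, len(self.index)))


class SketchReader:
    """Hypothetical reader: loads the trailing index, then finds blocks by key."""

    def __init__(self, f):
        self.f = f
        f.seek(-TRAILER_SIZE, io.SEEK_END)
        index_offset, count = struct.unpack(TRAILER_FMT, f.read(TRAILER_SIZE))
        f.seek(index_offset)
        self.first_keys, self.blocks = [], []
        for _ in range(count):
            (klen,) = struct.unpack(">I", f.read(4))
            self.first_keys.append(f.read(klen))
            self.blocks.append(struct.unpack(">QI", f.read(12)))

    def get(self, key: bytes):
        # Block finding: the last block whose first key is <= the sought key.
        i = bisect.bisect_right(self.first_keys, key) - 1
        if i < 0:
            return None
        offset, length = self.blocks[i]
        self.f.seek(offset)
        block = self.f.read(length)
        pos = 0
        # Linear scan inside the single candidate block.
        while pos < len(block):
            (klen,) = struct.unpack_from(">I", block, pos)
            pos += 4
            k = block[pos:pos + klen]
            pos += klen
            (vlen,) = struct.unpack_from(">I", block, pos)
            pos += 4
            v = block[pos:pos + vlen]
            pos += vlen
            if k == key:
                return v
            if k > key:
                return None
        return None
```

A random read touches only the in-memory index and one block, which is the property the block/index design is meant to preserve once the extra stream layers are removed.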

> [hbase] Create an HBase-specific MapFile implementation
> -------------------------------------------------------
>                 Key: HBASE-61
>                 URL: https://issues.apache.org/jira/browse/HBASE-61
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: io
>            Reporter: Bryan Duxbury
>            Assignee: stack
>            Priority: Minor
>             Fix For: 0.20.0
>         Attachments: cpucalltreetfile.html, hfile.patch, longestkey.patch, tfile.patch,
> Today, HBase uses the Hadoop MapFile class to store data persistently to disk. This is
> convenient, as it's already done (and maintained by other people :). However, it's beginning
> to look like there might be possible performance benefits to be had from doing an HBase-specific
> implementation of MapFile that incorporated some precise features.
> This issue should serve as a place to track discussion about what features might be included
> in such an implementation.
