hbase-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing
Date Fri, 08 Mar 2013 20:08:17 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597499#comment-13597499 ]

Sergey Shelukhin commented on HBASE-8034:

bq. OutputStream will always implement getPos?
We rely on it in a few places in HFileWriterV2, so I would say yes.

bq. You need to change this comment so that it says it's an estimate and say how you came by
the estimate – in other words, this will be the definitive doc on this new metadata:
bq. Can you clarify what file versions are considered 'old files' ?
Done, on the method.

bq. Would it make more sense to expose the number of KeyValues in the HFile?
That is an interesting question. For the purposes of compaction we care more about the physical
sizes of the files being similar.
For the purposes of reads it's unclear, but the number of KeyValues probably matters more. That
may be worth a separate improvement JIRA (including for the default compaction algorithm).
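
To illustrate why physical size is the more useful signal for compaction, here is a minimal sketch of splitting output by bytes written rather than by cell count. The class `SizeBasedSplitter`, its `split` method, and the `targetBytes` threshold are hypothetical names for illustration only; this is not the HBase compactor code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: not part of HBase.
class SizeBasedSplitter {
    /**
     * Groups cells (represented only by their sizes in bytes) into output files,
     * closing the current file once it has reached targetBytes.
     */
    static List<List<Integer>> split(int[] cellSizes, long targetBytes) {
        List<List<Integer>> files = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long written = 0;
        for (int size : cellSizes) {
            current.add(size);
            written += size;
            if (written >= targetBytes) {
                // The file is large enough on disk; start a new one.
                files.add(current);
                current = new ArrayList<>();
                written = 0;
            }
        }
        if (!current.isEmpty()) {
            files.add(current); // trailing, possibly smaller, file
        }
        return files;
    }
}
```

With a mix of small and large cells, the resulting files hold different numbers of cells, because the split points are driven by bytes written and not by KeyValue count.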

bq. This strikes me as flakey. Will there be another thread writing to the OutputStream when
this method is invoked? Should it be synchronized?
Probably not. Do you mean background writing inside the stream object, or write calls?
We don't control the implementation for the former (it's the Hadoop one). For the latter, similarly
to HFileWriterV2, we rely on calling this method only when we know we are not writing. That could
be broken by future changes, but adding synchronization to file writing just for this seems like overkill.
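
The convention described above can be sketched roughly as follows. This is a standalone illustration, not the patch: `PositionTrackingStream` is a hypothetical stand-in for a stream that exposes its position (as Hadoop's `FSDataOutputStream` does via `getPos()`), and `SizeTrackingWriter` and `getOnDiskSizeSoFar` are assumed names.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a stream with a getPos() method.
class PositionTrackingStream extends FilterOutputStream {
    private long pos = 0;

    PositionTrackingStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        pos++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        pos += len;
    }

    long getPos() {
        return pos;
    }
}

// Hypothetical writer; the names are assumptions, not HBase API.
class SizeTrackingWriter {
    private final PositionTrackingStream stream;

    SizeTrackingWriter(OutputStream sink) {
        this.stream = new PositionTrackingStream(sink);
    }

    void append(byte[] block) {
        try {
            stream.write(block);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deliberately not synchronized: the caller is expected to invoke this
    // between append() calls on the writing thread, never during a write.
    long getOnDiskSizeSoFar() {
        return stream.getPos();
    }
}

class OnDiskSizeSketch {
    public static void main(String[] args) {
        SizeTrackingWriter writer = new SizeTrackingWriter(new ByteArrayOutputStream());
        writer.append(new byte[64]);
        // Safe only because no write is in progress at this point.
        System.out.println("bytes so far: " + writer.getOnDiskSizeSoFar());
    }
}
```

The trade-off is the one discussed in the comment: the size query stays cheap because it takes no lock, at the cost of relying on callers to respect the "not while writing" rule.
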
> record on-disk data size for store file and make it available during writing
> ----------------------------------------------------------------------------
>                 Key: HBASE-8034
>                 URL: https://issues.apache.org/jira/browse/HBASE-8034
>             Project: HBase
>          Issue Type: Task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Minor
>         Attachments: HBASE-8034-v0.patch
> To better estimate the size of data in the file, and to be able to split files intelligently
> during any multi-file compactor like stripe or level.

