hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10642) Add M/R over snapshots to 0.94
Date Mon, 10 Mar 2014 22:04:48 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926303#comment-13926303 ]

Enis Soztutar commented on HBASE-10642:

bq. (The existing 0.94 patch picked up the distribution from the table, not the snapshot. I am not sure whether the HFileLinks influence this, or whether even the trunk patch does the right thing: does it follow HFileLinks? If not, how does it find the real file distribution?)
From my reading of StoreFileInfo.computeHDFSBlocksDistribution(), it does the right thing,
but I have not checked this personally.
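A rough sketch of what a snapshot-aware distribution has to do, assuming each HFileLink has already been resolved to the file it points at (so the block locations are those of the real data, not of the link): sum the bytes each host holds across all store files of the region, then take the heaviest hosts as split locations. The class, method and field names below are hypothetical and are not HBase's HDFSBlocksDistribution API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not HBase's HDFSBlocksDistribution): assumes HFileLinks were
// already resolved, so every Block below describes the real, referenced file data.
final class BlockDistributionSketch {

    /** One HDFS block: its length in bytes and the hosts holding a replica of it. */
    static final class Block {
        final long sizeBytes;
        final List<String> hosts;

        Block(long sizeBytes, List<String> hosts) {
            this.sizeBytes = sizeBytes;
            this.hosts = hosts;
        }
    }

    /** Sums, per host, the bytes of every block (across all store files) replicated there. */
    static Map<String, Long> aggregate(List<List<Block>> storeFiles) {
        Map<String, Long> weightByHost = new HashMap<>();
        for (List<Block> file : storeFiles) {
            for (Block block : file) {
                for (String host : block.hosts) {
                    weightByHost.merge(host, block.sizeBytes, Long::sum);
                }
            }
        }
        return weightByHost;
    }

    /** Returns up to n hosts, heaviest first; ties are broken by host name. */
    static List<String> topHosts(Map<String, Long> weightByHost, int n) {
        List<String> hosts = new ArrayList<>(weightByHost.keySet());
        hosts.sort((a, b) -> {
            int byWeight = Long.compare(weightByHost.get(b), weightByHost.get(a));
            return byWeight != 0 ? byWeight : a.compareTo(b);
        });
        return new ArrayList<>(hosts.subList(0, Math.min(n, hosts.size())));
    }

    public static void main(String[] args) {
        List<List<Block>> files = Arrays.asList(
                Arrays.asList(new Block(100L, Arrays.asList("h1", "h2")),
                              new Block(50L, Arrays.asList("h2", "h3"))),
                Arrays.asList(new Block(30L, Arrays.asList("h3"))));
        Map<String, Long> weights = aggregate(files);
        System.out.println(topHosts(weights, 2)); // [h2, h1]
    }
}
```

If the links were not followed, the same aggregation would run over the wrong (or empty) block list, which is exactly the concern raised above.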
bq. Also, in the trunk version I notice that we update the counters after each record; is that by design? It seems CPU heavy.
We don't have to increment the AtomicLong every time; we can accumulate a local sum and then update the counter.
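A minimal sketch of the accumulate-then-update idea, assuming a single reader thread per counter instance: the per-record path only adds to a plain long, and the shared AtomicLong is touched once every flushInterval records and once more at close. BatchedCounter and flushInterval are hypothetical names, not classes from the patch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: batches per-record increments into occasional addAndGet calls.
// Not thread-safe by itself; assumes one reader thread owns each instance.
final class BatchedCounter {
    private final AtomicLong shared;
    private final int flushInterval;
    private long pending;          // amount accumulated locally, not yet published
    private int recordsSinceFlush; // records seen since the last publish

    BatchedCounter(AtomicLong shared, int flushInterval) {
        if (flushInterval <= 0) {
            throw new IllegalArgumentException("flushInterval must be positive");
        }
        this.shared = shared;
        this.flushInterval = flushInterval;
    }

    /** Called once per record; touches the AtomicLong only every flushInterval records. */
    void record(long amount) {
        pending += amount;
        if (++recordsSinceFlush >= flushInterval) {
            flush();
        }
    }

    /** Publishes whatever is pending; the reader should call this when it is closed. */
    void flush() {
        if (pending != 0) {
            shared.addAndGet(pending);
        }
        pending = 0;
        recordsSinceFlush = 0;
    }

    public static void main(String[] args) {
        AtomicLong bytesRead = new AtomicLong();
        BatchedCounter counter = new BatchedCounter(bytesRead, 1000);
        for (int i = 0; i < 2500; i++) {
            counter.record(64); // e.g. 64 bytes per record
        }
        counter.flush(); // publish the last 500 records on close
        System.out.println(bytesRead.get()); // 160000
    }
}
```

The interval trades counter freshness for fewer atomic operations; the final flush on close keeps the total exact.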
bq. Maybe we should report the data locality index that HBase calculates as an M/R metric?
Makes sense.
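One way such a metric could be derived, sketched under stated assumptions: the split's per-host byte weights and its unique-block total are already known, and the index for the task's host is its weight divided by that total. Because Hadoop counters hold long values, the sketch also rounds the index to a whole percentage. LocalityMetricSketch and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: fraction of a split's bytes stored on the task's host,
// plus a conversion to a whole-number percentage suitable for a long-valued counter.
final class LocalityMetricSketch {

    /**
     * @param weightByHost bytes of the split's blocks that have a replica on each host
     * @param totalBytes   total bytes of the split's unique blocks (each block counted once)
     * @param taskHost     host the map task runs on
     */
    static double localityIndex(Map<String, Long> weightByHost, long totalBytes, String taskHost) {
        if (totalBytes <= 0) {
            return 0.0; // empty split: report no locality rather than dividing by zero
        }
        return (double) weightByHost.getOrDefault(taskHost, 0L) / totalBytes;
    }

    /** Rounds an index in [0, 1] to the nearest whole percent. */
    static long asPercent(double index) {
        return Math.round(index * 100);
    }

    public static void main(String[] args) {
        Map<String, Long> weights = new HashMap<>();
        weights.put("h1", 100L);
        weights.put("h2", 150L);
        double index = localityIndex(weights, 180L, "h2");
        System.out.println(asPercent(index)); // 83
    }
}
```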
I've checked the locality computations. The v4 patch looks good.

> Add M/R over snapshots to 0.94
> ------------------------------
>                 Key: HBASE-10642
>                 URL: https://issues.apache.org/jira/browse/HBASE-10642
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>             Fix For: 0.94.18
>         Attachments: 10642-0.94-v2.txt, 10642-0.94-v3.txt, 10642-0.94-v4.txt, 10642-0.94.txt,
> I think we want to drive towards all (or most) M/R over HBase running against snapshots and HDFS directly.
> Adopting a simple input format (even if just as a sample) as part of HBase will allow us to direct users this way.

This message was sent by Atlassian JIRA
