hbase-issues mailing list archives

From "Lukas Nalezenec (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
Date Tue, 04 Feb 2014 10:40:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890550#comment-13890550 ]

Lukas Nalezenec commented on HBASE-10413:

I know it is hacky. It is my first HBase commit and I was not sure how to do it, so I asked three
people and then published a first draft as soon as possible. Everybody was fine with the solution
:(.

The hacky solution is good enough for us; I deployed it yesterday. I can't spend
much more time on this, and I need to close it by tomorrow.

How about this solution? I am not sure it is the best way, and it does not work with the Scan
(row range and column families):

    We need to filter regions by table.
    It would be nice if we could filter size by column families.


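    import java.util.Arrays;
    import java.util.Map;
    import java.util.TreeMap;
    import org.apache.hadoop.hbase.ClusterStatus;
    import org.apache.hadoop.hbase.RegionLoad;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    // Region name -> estimated region size in bytes (memstore + store files).
    Map<byte[], Long> sizeMap = new TreeMap<byte[], Long>(Bytes.BYTES_COMPARATOR);
    HBaseAdmin admin = new HBaseAdmin(configuration);
    try {
      ClusterStatus clusterStatus = admin.getClusterStatus();
      for (ServerName serverName : clusterStatus.getServers()) {
        // TODO: filter regions by table (and ideally by column family).
        for (Map.Entry<byte[], RegionLoad> regionEntry :
            clusterStatus.getLoad(serverName).getRegionsLoad().entrySet()) {
          RegionLoad regionLoad = regionEntry.getValue();
          // 1024L forces long arithmetic; the int version overflows for regions >= 2 GB.
          long regionSize = 1024L * 1024L
              * (regionLoad.getMemStoreSizeMB() + regionLoad.getStorefileSizeMB());
          sizeMap.put(regionEntry.getKey(), regionSize);
        }
      }
    } finally {
      admin.close();
    }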
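The loop collects sizes for every region on every server, so the map still has to be narrowed to a single table, the first open point in the list above. Below is a minimal plain-Java sketch (Java 9+ for Arrays.compareUnsigned), assuming HBase's region-name layout <table>,<startKey>,<regionId>.<encodedName>. and that table names never contain a comma, so the bytes before the first comma name the table (including any namespace prefix such as ns:table). RegionSizeFilter, tableOf and sizesForTable are hypothetical names, not HBase API, and the region names in main are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class RegionSizeFilter {

  // Table part of a region name: the bytes before the first ','.
  // Assumes the layout <table>,<startKey>,<regionId>...; table names never contain ','.
  static String tableOf(byte[] regionName) {
    for (int i = 0; i < regionName.length; i++) {
      if (regionName[i] == ',') {
        return new String(regionName, 0, i, StandardCharsets.UTF_8);
      }
    }
    return new String(regionName, StandardCharsets.UTF_8);
  }

  // Hypothetical helper: keeps only the entries of sizeMap that belong to `table`.
  // Keys stay byte[] (start keys may be binary) and are ordered as unsigned bytes.
  static Map<byte[], Long> sizesForTable(Map<byte[], Long> sizeMap, String table) {
    Map<byte[], Long> result = new TreeMap<byte[], Long>(Arrays::compareUnsigned);
    for (Map.Entry<byte[], Long> entry : sizeMap.entrySet()) {
      if (tableOf(entry.getKey()).equals(table)) {
        result.put(entry.getKey(), entry.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up region names and sizes, for illustration only.
    Map<byte[], Long> sizeMap = new TreeMap<byte[], Long>(Arrays::compareUnsigned);
    sizeMap.put("t1,,1391500000000.aaa.".getBytes(StandardCharsets.UTF_8), 64L << 20);
    sizeMap.put("t1,row5,1391500000001.bbb.".getBytes(StandardCharsets.UTF_8), 3L << 30);
    sizeMap.put("t2,,1391500000002.ccc.".getBytes(StandardCharsets.UTF_8), 1L << 20);

    Map<byte[], Long> t1 = sizesForTable(sizeMap, "t1");
    long total = 0;
    for (long size : t1.values()) {
      total += size;
    }
    System.out.println(t1.size() + " regions, " + total + " bytes");
  }
}
```

The 3 GB region in the demo is exactly the case where computing 1024 * 1024 * (memstore MB + storefile MB) in int arithmetic would overflow.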

> Tablesplit.getLength returns 0
> ------------------------------
>                 Key: HBASE-10413
>                 URL: https://issues.apache.org/jira/browse/HBASE-10413
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, mapreduce
>    Affects Versions:
>            Reporter: Lukas Nalezenec
>            Assignee: Lukas Nalezenec
> InputSplits should be sorted by length, but TableSplit does not implement a real getLength():
>   @Override
>   public long getLength() {
>     // Not clear how to obtain this... seems to be used only for sorting splits
>     return 0;
>   }
> This is causing us problems with scheduling: we have jobs that are supposed to finish
> in limited time, but they often get stuck in the last mapper, working on a large region.
> Can we implement this method?
> What is the best way?
> We were thinking about estimating the size from the size of the files on HDFS.
> We would like to get the Scanner from the TableSplit, use startRow, stopRow and column families
> to find the corresponding region, and then compute the HDFS size for that region and column family.

> Update:
> This ticket was about a production issue. I talked with the person who worked on it, and he
> said our production issue was probably not directly caused by getLength() returning 0.
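The point of a real getLength() is ordering: Hadoop's MapReduce job submitter sorts input splits by getLength(), largest first, so when every TableSplit reports 0 a large region can just as well end up in the last mapper. A minimal sketch of that ordering under stated assumptions: SizedSplit, SplitOrdering and buildSplits are hypothetical names, SizedSplit stands in for TableSplit (one split per region, keyed by a String region name for brevity), and its length comes from a region-size map like the one built in the comment above, with 0 for regions that have no estimate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitOrdering {

  // Hypothetical stand-in for TableSplit: one split per region, carrying a size estimate.
  static final class SizedSplit {
    final String regionName;
    final long estimatedLength;

    SizedSplit(String regionName, long estimatedLength) {
      this.regionName = regionName;
      this.estimatedLength = estimatedLength;
    }

    // What getLength() could return instead of 0: the estimated region size in bytes.
    long getLength() {
      return estimatedLength;
    }
  }

  // Builds one split per region; regions missing from sizeMap get length 0 (unknown).
  static List<SizedSplit> buildSplits(List<String> regionNames, Map<String, Long> sizeMap) {
    List<SizedSplit> splits = new ArrayList<>();
    for (String region : regionNames) {
      splits.add(new SizedSplit(region, sizeMap.getOrDefault(region, 0L)));
    }
    // Largest first, so the mapper for the biggest region is scheduled earliest.
    splits.sort((a, b) -> Long.compare(b.getLength(), a.getLength()));
    return splits;
  }

  public static void main(String[] args) {
    // Made-up regions: 64 MB, 3 GB, and one with no size estimate.
    Map<String, Long> sizeMap = new HashMap<>();
    sizeMap.put("t1,,1.aaa.", 64L << 20);
    sizeMap.put("t1,row5,2.bbb.", 3L << 30);
    List<String> regions = Arrays.asList("t1,,1.aaa.", "t1,row5,2.bbb.", "t1,row9,3.ccc.");
    for (SizedSplit split : buildSplits(regions, sizeMap)) {
      System.out.println(split.regionName + " " + split.getLength());
    }
  }
}
```

Splits without an estimate sort last here; since List.sort is stable, they keep their original relative order.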

This message was sent by Atlassian JIRA
