hbase-issues mailing list archives

From "Lukas Nalezenec (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
Date Tue, 04 Feb 2014 10:40:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890550#comment-13890550 ]

Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------

Hi,
I know it is hacky. It is my first HBase commit; I was not sure how to do it, so I asked
three people and then published a first draft as soon as possible. Everybody was fine with
the solution :( .

The hacky solution is good enough for us - I already deployed it yesterday. I can't spend
much more time on this. I need to close it by tomorrow.

How about this solution? I am not sure if it is the best way - it does not work with Scan
ranges.

ToDos:
    We need to filter regions by table
    It would be nice if we could filter size by column families.


https://github.com/apache/hbase/pull/8/files#diff-46ff60f1e27e3d77131acb7873050990R68


    HBaseAdmin admin = new HBaseAdmin(configuration);

    ClusterStatus clusterStatus = admin.getClusterStatus();
    Collection<ServerName> servers = clusterStatus.getServers();

    // Walk every region on every live server and record its size in bytes.
    for (ServerName serverName : servers) {
      ServerLoad serverLoad = clusterStatus.getLoad(serverName);

      for (Map.Entry<byte[], RegionLoad> regionEntry : serverLoad.getRegionsLoad().entrySet()) {
        byte[] regionId = regionEntry.getKey();
        RegionLoad regionLoad = regionEntry.getValue();

        // Use long arithmetic: with int operands the product overflows at 2048 MB.
        long regionSize = 1024L * 1024L
            * (regionLoad.getMemStoreSizeMB() + regionLoad.getStorefileSizeMB());

        sizeMap.put(regionId, regionSize);
      }
    }
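
A minimal stand-alone sketch of the size bookkeeping in the loop above, written without any HBase classes so it can be run on its own. The names RegionSizeSketch, record and getRegionSize are hypothetical and are not part of the patch. It assumes Java 9+ (for Arrays.compare) and that the two MB counters are ints. It illustrates two points: the MB-to-bytes conversion has to be done in long arithmetic, and a map keyed by byte[] needs content-based comparison, because byte[] uses identity-based equals/hashCode. In HBase code the usual choice for the latter is a TreeMap built with Bytes.BYTES_COMPARATOR; the sketch substitutes Arrays.compare to stay dependency-free.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not taken from the patch; uses no HBase classes. */
final class RegionSizeSketch {
  private static final long BYTES_PER_MB = 1024L * 1024L;

  // Compare keys by content. A HashMap<byte[], Long> would miss any lookup
  // made with a different array instance holding the same bytes.
  private static final Comparator<byte[]> BY_CONTENT = Arrays::compare;

  private final Map<byte[], Long> sizeMap = new TreeMap<>(BY_CONTENT);

  /** Converts the two MB counters to bytes without int overflow. */
  static long toBytes(int memStoreSizeMB, int storefileSizeMB) {
    return BYTES_PER_MB * ((long) memStoreSizeMB + storefileSizeMB);
  }

  /** Stores the size of one region under its name. */
  void record(byte[] regionName, int memStoreSizeMB, int storefileSizeMB) {
    sizeMap.put(regionName, toBytes(memStoreSizeMB, storefileSizeMB));
  }

  /** Returns the recorded size in bytes, or 0 if the region is unknown. */
  long getRegionSize(byte[] regionName) {
    Long size = sizeMap.get(regionName);
    return size == null ? 0L : size;
  }
}
```

A getLength() implementation could then return getRegionSize(...) for the region that backs the split. Falling back to 0 for an unknown region keeps the current behaviour in that case.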

> Tablesplit.getLength returns 0
> ------------------------------
>
>                 Key: HBASE-10413
>                 URL: https://issues.apache.org/jira/browse/HBASE-10413
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, mapreduce
>    Affects Versions: 0.96.1.1
>            Reporter: Lukas Nalezenec
>            Assignee: Lukas Nalezenec
>
> InputSplits should be sorted by length, but TableSplit does not contain a real getLength
> implementation:
>   @Override
>   public long getLength() {
>     // Not clear how to obtain this... seems to be used only for sorting splits
>     return 0;
>   }
> This is causing us problems with scheduling - we have jobs that are supposed to finish
> in limited time, but they often get stuck in the last mapper working on a large region.
> Can we implement this method?
> What is the best way?
> We were thinking about estimating the size from the size of the files on HDFS.
> We would like to get the Scanner from TableSplit, use startRow, stopRow and column families
> to get the corresponding region, then compute the size on HDFS for the given region and column family.

> Update:
> This ticket was about a production issue - I talked with the guy who worked on this and he
> said our production issue was probably not directly caused by getLength() returning 0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
