hbase-user mailing list archives

From alx...@aim.com
Subject Re: split table data into two or more tables
Date Fri, 08 Feb 2013 22:16:34 GMT

 Hi,

here is the hbase-site.xml file.

  <property>
    <name>hbase.hregion.majorcompaction</name>
    <value>0</value>
  </property>
  <property>
    <name>hbase.regionserver.codecs</name>
    <value>snappy,gz</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave,serverslave</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.hregion.memstore.mslab.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>40</value>
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.45</value>
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.lowerLimit</name>
    <value>0.4</value>
  </property>
  <property>
    <name>hfile.block.cache.size</name>
    <value>0.3</value>
  </property>

  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
  </property>

  <!-- default is 256MB 268435456; this value is 150GB (1.5GB would be 1610612736) -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>161061273600</value>
  </property>

  <!-- default is 2 -->
  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>4</value>
  </property>

  <!-- default is 64MB 67108864 -->
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value>
  </property>

  <!-- default is 7, should be at least 2x compactionThreshold -->
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>200</value>
  </property>
  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>1800000</value> <!-- 30 minutes -->
  </property>
  <property>
    <name>hbase.rpc.timeout</name>
    <value>1800000</value> <!-- 30 minutes -->
  </property>



 

 Thanks.
Alex.

 

 

-----Original Message-----
From: Marcos Ortiz <mlortiz@uci.cu>
To: alxsss <alxsss@aim.com>
Cc: user <user@hbase.apache.org>
Sent: Fri, Feb 8, 2013 11:52 am
Subject: Re: split table data into two or more tables


              
    
On 02/08/2013 01:59 PM, alxsss@aim.com wrote:
    
    
      
Hi,

The rationale is that I have a mapred job that constantly adds new records to an hbase table.
The next mapred job selects these new records, but it must iterate over all records and check
whether each one is a candidate for selection.
Since there are too many old records, iterating through them on a cluster of 2 nodes + 1 master
takes about 2 days. So I thought splitting them into two tables would reduce this time, and
as soon as I figure out that there are no new records left in one of the new tables, I will
stop running the mapred job on it.
    
    This use case is very common, and a good practice here is to pre-split the regions to
    control exactly where your data goes and how large each region gets, always keeping
    the number of regions manageable.
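
    As a rough illustration of pre-splitting, the Python sketch below computes evenly spaced split points and prints an HBase shell `create` command that uses them. It assumes row keys begin with a two-character hex prefix and that the shell in use supports the SPLITS option; the table name `records`, the column family `d`, and the region count are placeholders, not details from this thread.

```python
def hex_split_points(num_regions, width=2):
    """Evenly spaced split keys over a hex prefix space of `width` characters."""
    if num_regions < 2:
        return []  # a single region needs no split points
    space = 16 ** width
    return [
        format(i * space // num_regions, f"0{width}x")
        for i in range(1, num_regions)
    ]


def create_command(table, family, splits):
    """Build an HBase shell `create` statement with explicit split keys."""
    quoted = ", ".join(f"'{s}'" for s in splits)
    return f"create '{table}', '{family}', SPLITS => [{quoted}]"


# Hypothetical table and family names, four regions.
splits = hex_split_points(4)
print(create_command("records", "d", splits))
```

    The printed statement can be pasted into the HBase shell. Whether evenly spaced hex prefixes suit the real data depends on how the row keys are actually distributed.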
    
      
Currently, we have 7 regions including ROOT and META.
    
    Can you share your conf/hbase-site.xml?
    
    
      

Thanks.
Alex.


 

 

-----Original Message-----
From: Ted Yu <yuzhihong@gmail.com>
To: user <user@hbase.apache.org>
Sent: Fri, Feb 8, 2013 10:40 am
Subject: Re: split table data into two or more tables


May I ask the rationale behind this?
Were you aiming for higher write throughput?

Please also tell us how many regions you have in the current table.

Thanks

BTW please consider upgrading to 0.94.4

On Fri, Feb 8, 2013 at 10:36 AM, <alxsss@aim.com> wrote:


    
    
      
        
Hello,

I wondered if there is a way of splitting data from one table into two or
more tables in hbase with identical schemas, i.e. if table A has 100M
records, put 50M into table B and 50M into table C, then delete table A.
Currently, I use hbase-0.92.1 and hadoop-1.4.0.
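
Conceptually, such a split amounts to routing every row to one of two targets by its row key. The toy Python sketch below does this with in-memory dictionaries standing in for tables B and C. It does not talk to HBase, and choosing the median key as the boundary is only one assumed way of dividing the data.

```python
def split_rows(rows, boundary):
    """Route (row_key, value) pairs: keys below `boundary` go to B, the rest to C."""
    table_b, table_c = {}, {}
    for key, value in rows:
        target = table_b if key < boundary else table_c
        target[key] = value
    return table_b, table_c


# Made-up sample rows standing in for table A.
rows = [("row-001", "a"), ("row-002", "b"), ("row-003", "c"), ("row-004", "d")]

# Use the median row key as the boundary so each half gets about the same count.
keys = sorted(key for key, _ in rows)
boundary = keys[len(keys) // 2]

table_b, table_c = split_rows(rows, boundary)
print(sorted(table_b), sorted(table_c))
```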

Thanks.
Alex.


      
      
 


    
    
    
-- 
      Marcos Ortiz Valmaseda, 
      Product Manager && Data Scientist at UCI
      Blog: http://marcosluis2186.posterous.com
      Twitter: @marcosluis2186
  
 
