hbase-user mailing list archives

From Andrew Purtell <apurt...@yahoo.com>
Subject RE: Slow mapreduce using Hbase , regardless on number of machines
Date Wed, 09 Jul 2008 18:37:45 GMT
New HBase tables start with a single region. By default, a region is split into more
regions when the backing store file for any column family of the table exceeds 256MB.
Until the table splits, you are guaranteed that only one RegionServer will be serving
the table. Furthermore, the TableMap utility class sets the number of map tasks for a
job equal to the number of regions in the table, so a single-region table gets a single
map task. Taking I/O considerations into account, this makes sense.

One way to make a table split into multiple regions sooner is to lower the
hbase.hregion.max.filesize configuration parameter. I would advise against setting this
value smaller than the DFS block size.
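
For illustration, the adjustment described above might look like the following in
hbase-site.xml. This is only a sketch: the 64MB value (67108864 bytes) is an assumption
chosen to equal a commonly used DFS block size and does not come from this thread, so
check the block size your cluster is actually configured with before copying it.

```xml
<configuration>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <!-- Assumed example value: 67108864 bytes = 64MB, lower than the 256MB
         default mentioned above. Keep it at or above your DFS block size. -->
    <value>67108864</value>
  </property>
</configuration>
```

If hbase-site.xml already has a <configuration> element, add only the <property> block
to it rather than a second <configuration>.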

Even so, until you store a substantial amount of data in your test table(s), there is
little if any parallelism available, and you still incur the overhead of Hadoop job scheduling.


Hope this helps,

   - Andy

--- On Wed, 7/9/08, Yair Even-Zohar <yaire@revenuescience.com> wrote:

> From: Yair Even-Zohar <yaire@revenuescience.com>
> Subject: RE: Slow mapreduce using Hbase , regardless on number of machines
> To: hbase-user@hadoop.apache.org
> Date: Wednesday, July 9, 2008, 9:30 AM
> How do I find the number of regions for an HTable?
> In a quick lookup I did on the actual machines, it seems that all the
> machines had new data in them once I loaded the table.
> 
> Thanks
> -Yair
> 
> -----Original Message-----
> From: Bryan Duxbury [mailto:bryan@rapleaf.com] 
> Sent: Wednesday, July 09, 2008 11:13 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Slow mapreduce using Hbase , regardless on number of machines
> 
> How many regions are there in your table? If your 200k rows fit
> inside a single region, adding more region servers isn't going to
> make anything faster because only one server will be participating.
> 
> -Bryan
> 
> On Jul 9, 2008, at 7:36 AM, yair even-zohar wrote:
> 
> > I am testing HBase 0.1.2 and am getting the following performance
> > using RowCounter class (I had to modify the main() method of the
> > original class because it contains some hardcoded parameters :-)
> >
> > Single regionserver - counting 200,000 lines in 60 or 61 seconds
> > 5 regionservers - counting 200,000 lines in 55 or 58 seconds
> >
> > Clearly, one expects better performance, so I assume I'm doing
> > something wrong. By the way, I'm getting about the same performance
> > when I'm iterating through a scanner without the mapreduce.
> >
> > Here is my hadoop-site.xml
> >
> > <configuration>
> >   <property>
> >     <name>fs.default.name</name>
> >     <value>hdfs://sb-centercluster01:9100</value>
> >   </property>
> >   <property>
> >     <name>mapred.job.tracker</name>
> >     <value>hdfs://sb-centercluster01:9101</value>
> >   </property>
> >   <property>
> >     <name>mapred.map.tasks</name>
> >     <value>13</value>
> >   </property>
> >   <property>
> >     <name>mapred.reduce.tasks</name>
> >     <value>5</value>
> >   </property>
> >   <property>
> >     <name>dfs.replication</name>
> >     <value>3</value>
> >   </property>
> >   <property>
> >     <name>dfs.name.dir</name>
> >     <value>/home/hadoop/dfs16,/tmp/hadoop/dfs16</value>
> >   </property>
> >   <property>
> >     <name>dfs.data.dir</name>
> >     <value>/state/partition1/hadoop/dfs16</value>
> >   </property>
> > </configuration>
> >
> > Increasing "io.bytes.per.checksum" and "io.file.buffer.size" didn't
> > help. Neither did decreasing "dfs.replication".
> >
> > Here is my hbase-site.xml
> >
> > <configuration>
> >   <property>
> >     <name>hbase.master</name>
> >     <value>sb-centercluster01:60002</value>
> >     <description>The host and port that the HBase master runs at.
> >     </description>
> >   </property>
> >   <property>
> >     <name>hbase.rootdir</name>
> >     <value>hdfs://sb-centercluster01:9100/hbase</value>
> >     <description>The directory shared by region servers.
> >     </description>
> >   </property>
> >   <property>
> >     <name>hbase.io.index.interval</name>
> >     <value>8</value>
> >   </property>
> > </configuration>
> >
> >
> > Any help will be appreciated.
> >
> > Thanks
> > -Yair
> >
> >
> >


      
