hbase-user mailing list archives

From Xine Jar <xineja...@googlemail.com>
Subject Should the regions of a table be more or less equally distributed and stored?
Date Thu, 03 Sep 2009 15:07:50 GMT
I have a cluster of 6 nodes (a Namenode, a Jobtracker, an hbase master, and
three regionservers) running hadoop-0.19.1 and hbase-0.19.3. I have created
an hbase table "mytable" and have written a program that reads the value in
each row of the table and computes the overall average of the values.
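
For illustration only, the averaging described above can be sketched as follows. This is plain Python, not the actual MapReduce job, and the names partial and combine are hypothetical. It assumes each map task emits a (sum, count) pair for the rows it read and that the reduce side merges those pairs.

```python
def partial(values):
    """Summarize one batch of row values as a (sum, count) pair."""
    return (sum(values), len(values))


def combine(partials):
    """Merge (sum, count) pairs and return the overall average.

    Returns None when no rows were seen, to avoid dividing by zero.
    """
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


if __name__ == "__main__":
    # Two hypothetical batches of row values, e.g. one per region.
    batches = [[1, 2, 3], [4, 5]]
    print(combine([partial(b) for b in batches]))
```

Merging (sum, count) pairs rather than averaging per-batch averages keeps the result correct when the batches have different sizes.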

I have a few quick clarification questions.

Q1- "mytable" has one column family and a size of 400MB. Based on the
default value of hbase.hregion.max.filesize, I EXPECTED it to be split
into two regions of 256MB and 144MB. But the UI on port 60010 showed that
"mytable" has 3 regions (107MB+89MB+223MB). Why 3 and not 2?
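
The expectation in Q1 can be written out as a small calculation. This is only a sketch of the naive reasoning (fill each region up to the limit, then start the next one), not a model of how hbase actually decides to split. The function name expected_regions is hypothetical, and 256 is the max filesize in MB that the question assumes.

```python
MAX_FILESIZE_MB = 256   # value of hbase.hregion.max.filesize assumed in Q1
TABLE_SIZE_MB = 400     # table size stated in Q1


def expected_regions(table_mb, max_mb):
    """Naive expectation: full regions of max_mb plus one remainder region."""
    full, rest = divmod(table_mb, max_mb)
    return [max_mb] * full + ([rest] if rest else [])


if __name__ == "__main__":
    print(expected_regions(TABLE_SIZE_MB, MAX_FILESIZE_MB))  # [256, 144]

    # Region sizes reported by the master UI, in MB.
    observed = [107, 89, 223]
    print(len(observed), sum(observed))  # 3 419
```

Note that the three sizes reported by the UI add up to 419MB rather than 400MB, and none of them reaches the 256MB limit.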

Q2- The UI of the hbase master on port 60010 showed the three regions of
"mytable", each with a start key and an end key. I noticed as well that
all three regions are stored on the same regionserver, while the other
regionservers store the ROOT and the META. Shouldn't the regions of
"mytable" be more or less equally distributed across all regionservers?

Q3- The job takes around 1 minute to finish, and I can see that the reduce
function is very slow. Could you give me some hints on how to make it
faster? In which case should I think about splitting the job into 2? Is
there anything else I should try to improve the performance?

