hbase-user mailing list archives

From Pankaj Misra <pankaj.mi...@impetus.co.in>
Subject RE: HBase BatchMutations - HOT Region Problem
Date Tue, 25 Sep 2012 07:11:40 GMT
Please find the table splits and the snapshot below.

Start Key           End Key
                    199999
199999              333332
333332              00000000004ccccb
00000000004ccccb    666664
666664              00000000007ffffd
00000000007ffffd    999996
999996              0000000000b3332f
0000000000b3332f    0000000000ccccc8
0000000000ccccc8    0000000000e66661
0000000000e66661

As can be seen from the snapshot, only the last region is being filled. It receives all the
data, including keys that do not belong to that range.

One doubt I do have, however, concerns the way the keys are generated on the client side.
Each thread generates its keys incrementally and adds its offset. Each key is then converted
to its string representation and written as a ByteBuffer. Could converting an integer key to its
String form and then writing it as a ByteBuffer be a problem?
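
To make the question concrete, here is a minimal sketch (not a diagnosis of the issue) of how
decimal string keys sort when compared byte by byte, which is how HBase orders row keys. The
helper names string_key and padded_key are hypothetical, and the zero-padded variant is only
shown for comparison; it is not something the client above is known to use.

```python
def string_key(n: int) -> bytes:
    """Row key as the plain decimal string of n, e.g. 42 -> b"42"."""
    return str(n).encode("ascii")


def padded_key(n: int, width: int = 6) -> bytes:
    """Hypothetical alternative: zero-padded, fixed-width decimal string."""
    return str(n).zfill(width).encode("ascii")


numbers = [5, 42, 100000, 199999, 200000, 999999]

# Python compares bytes objects lexicographically, byte by byte.
by_string = sorted(numbers, key=string_key)
by_padded = sorted(numbers, key=padded_key)

print(by_string)  # [100000, 199999, 200000, 42, 5, 999999]
print(by_padded)  # [5, 42, 100000, 199999, 200000, 999999]
```

With plain decimal strings the byte order differs from the numeric order as soon as keys have
different lengths; with fixed-width keys the two orders agree.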


Thanks and Regards
Pankaj Misra


________________________________________
From: Anoop Sam John [anoopsj@huawei.com]
Sent: Tuesday, September 25, 2012 12:18 PM
To: user@hbase.apache.org
Subject: RE: HBase BatchMutations - HOT Region Problem

Your table is presplit. Can you give the splitkeys that you have used?

-Anoop-
________________________________________
From: Pankaj Misra [pankaj.misra@impetus.co.in]
Sent: Tuesday, September 25, 2012 11:45 AM
To: user@hbase.apache.org
Subject: HBase BatchMutations - HOT Region Problem

Dear All,

I am using HBase 0.94.1 with Hadoop 0.23.1. I have written a multi-threaded Thrift client
to load data into HBase using BatchMutations. Each batch contains 1000 rows, and
the table is split into 10 regions. The row keys increase incrementally (0...999999),
with an offset applied for each thread (0...99999, 100000...199999, 200000...299999, ...),
so in theory every thread should write to a different region. The individual regions
are wide: each is expected to store about 100000 rows, for a total
of 1000000 rows across all the regions.
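
The intended layout can be sketched as follows. This is only an illustration under stated
assumptions: the split keys here are hypothetical (one zero-padded boundary per thread offset)
and are not the ones actually used for the table, and padded_key and region_index are made-up
helper names. The lookup mimics byte-wise row key ordering, with each region's start key inclusive.

```python
import bisect

TOTAL_ROWS = 1_000_000
THREADS = 10
ROWS_PER_THREAD = TOTAL_ROWS // THREADS


def thread_range(t: int) -> tuple:
    """First and last row number written by thread t (0-based)."""
    start = t * ROWS_PER_THREAD
    return start, start + ROWS_PER_THREAD - 1


def padded_key(n: int, width: int = 6) -> bytes:
    """Hypothetical key encoding: zero-padded, fixed-width decimal string."""
    return str(n).zfill(width).encode("ascii")


# Hypothetical split keys: 9 boundaries give 10 regions, one per thread.
SPLIT_KEYS = [padded_key(t * ROWS_PER_THREAD) for t in range(1, THREADS)]


def region_index(row_key: bytes) -> int:
    """Index of the region whose [start, end) range contains row_key."""
    return bisect.bisect_right(SPLIT_KEYS, row_key)


for t in range(THREADS):
    first, last = thread_range(t)
    # Under these assumptions, both ends of a thread's range map to region t.
    print(t, first, last, region_index(padded_key(first)), region_index(padded_key(last)))
```

If the real key encoding or split keys differ from these assumptions, the mapping from thread
to region can differ as well.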

I am using thrift server/client and only 1 region server as per the default HBase setup.

So if I spawn 10 threads with the offsets applied accordingly, I would expect the regions to
fill up in parallel, which does not seem to be the case. All the inserts pile into
the same region, which makes the writes inefficient because frequent compaction cycles block
all the threads. If the threads were writing to different regions, this problem
would be much smaller.

I am not sure whether I am missing something; any ideas would be very helpful.

Thanks and Regards
Pankaj Misra

________________________________

Impetus Ranked in the Top 50 India's Best Companies to Work For 2012.

Impetus webcast 'Designing a Test Automation Framework for Multi-vendor Interoperable Systems'
available at http://lf1.me/0E/.


NOTE: This message may contain information that is confidential, proprietary, privileged or
otherwise protected by law. The message is intended solely for the named addressee. If received
in error, please destroy and notify the sender. Any use of this email is prohibited when received
in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this
communication has been maintained nor that the communication is free of errors, virus, interception
or interference.
