hbase-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: How to speedup Hbase query throughput
Date Tue, 26 Apr 2011 05:35:05 GMT
user_month might still be helpful on average if a user looks for one month
and then another a short time later.  This is because your cache could be
primed by the first query.

But you know your application best, of course.
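
To illustrate the key-ordering point, here is a minimal sketch in Python. It assumes a zero-padded user id, a YYYYMM month string, and an underscore separator; the key builders are hypothetical helpers, not the poster's actual schema. It only shows where one user's rows land in sorted key order and says nothing about how many disk reads a given query needs.

```python
# Hypothetical key builders; field widths and the separator are assumptions.
def month_user_key(month: str, user_id: int) -> bytes:
    return f"{month}_{user_id:08d}".encode()

def user_month_key(user_id: int, month: str) -> bytes:
    return f"{user_id:08d}_{month}".encode()

months = ["201011", "201012", "201101", "201102", "201103", "201104"]
users = [1, 2, 3]

# HBase keeps rows sorted by the raw bytes of the row key,
# so sorting the keys mimics the on-disk row order.
by_month_user = sorted(month_user_key(m, u) for m in months for u in users)
by_user_month = sorted(user_month_key(u, m) for u in users for m in months)

def positions(keys, wanted):
    """Indices in the sorted key list at which the wanted keys appear."""
    return [i for i, k in enumerate(keys) if k in wanted]

user2_mu = {month_user_key(m, 2) for m in months}
user2_um = {user_month_key(2, m) for m in months}

print(positions(by_month_user, user2_mu))  # spread across the key space
print(positions(by_user_month, user2_um))  # one contiguous run
```

With month first, user 2's six rows are separated by every other user's rows for the same month; with user first, they sit next to each other, which is what lets one block read (and one cached block) cover several months.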

On Mon, Apr 25, 2011 at 10:27 PM, Weihua JIANG <weihua.jiang@gmail.com> wrote:

> Changing key to user_month may not be useful to me since, for each
> query, we only need to get one month report for a user instead of all
> the data stored for a user.
>
> Putting multiple month data into a single row may be useful, but not
> sure. I will perform some experimentation when I have time.
>
> 2011/4/26 Ted Dunning <tdunning@maprtech.com>:
> > Change your key to user_month.
> >
> > That will put all of the records for a user together so you will only
> > need a single disk operation to read all of your data.  Also, test the
> > option of putting multiple months in a single row.
> >
> > On Mon, Apr 25, 2011 at 7:59 PM, Weihua JIANG <weihua.jiang@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> We want to implement a bill query system. We have 20M users, and the
> >> bill for each user per month contains about 10 records of 0.6 KB each.
> >> We want to store user bills for 6 months. Queries focus mostly on the
> >> latest month's reports, but there is no hot spot among the users being
> >> queried.
> >>
> >> We use CDH3U0 with 6 servers (each with 24 GB of memory and three 1 TB
> >> disks) as data nodes and region servers (in addition to the ZK, namenode
> >> and hmaster servers). The RS heap is 8 GB and the DN heap is 12 GB. The
> >> maximum HFile size is 1 GB. The block cache is 0.4.
> >>
> >> The row key is month+user_id. Each record is stored as a cell. So, a
> >> month report per user is a row in HBase.
> >>
> >> Currently, to store bill records, we can achieve about 30K
> >> records/second.
> >>
> >> However, the query performance is quite poor. We can only achieve
> >> about 600~700 month_report/second. That is, each region server can
> >> only serve about 100 rows/second. The block cache hit ratio is
> >> about 20%.
> >>
> >> Do you have any advice on how to improve the query performance?
> >>
> >> Below are some metrics reported by a region server:
> >> 2011-04-26T10:56:12 hbase.regionserver:
> >> RegionServer=regionserver50820, blockCacheCount=40969,
> >> blockCacheEvictedCount=216359, blockCacheFree=671152504,
> >> blockCacheHitCachingRatio=20, blockCacheHitCount=67936,
> >> blockCacheHitRatio=20, blockCacheMissCount=257675,
> >> blockCacheSize=2743351688, compactionQueueSize=0,
> >> compactionSize_avg_time=0, compactionSize_num_ops=7,
> >> compactionTime_avg_time=0, compactionTime_num_ops=7, flushQueueSize=0,
> >> flushSize_avg_time=0, flushSize_num_ops=0, flushTime_avg_time=0,
> >> flushTime_num_ops=0, fsReadLatency_avg_time=46,
> >> fsReadLatency_num_ops=257905, fsSyncLatency_avg_time=0,
> >> fsSyncLatency_num_ops=1726, fsWriteLatency_avg_time=0,
> >> fsWriteLatency_num_ops=0, memstoreSizeMB=0, regions=169,
> >> requests=82.1, storefileIndexSizeMB=188, storefiles=343, stores=169
> >> 2011-04-26T10:56:22 hbase.regionserver:
> >> RegionServer=regionserver50820, blockCacheCount=42500,
> >> blockCacheEvictedCount=216359, blockCacheFree=569659040,
> >> blockCacheHitCachingRatio=20, blockCacheHitCount=68418,
> >> blockCacheHitRatio=20, blockCacheMissCount=259206,
> >> blockCacheSize=2844845152, compactionQueueSize=0,
> >> compactionSize_avg_time=0, compactionSize_num_ops=7,
> >> compactionTime_avg_time=0, compactionTime_num_ops=7, flushQueueSize=0,
> >> flushSize_avg_time=0, flushSize_num_ops=0, flushTime_avg_time=0,
> >> flushTime_num_ops=0, fsReadLatency_avg_time=44,
> >> fsReadLatency_num_ops=259547, fsSyncLatency_avg_time=0,
> >> fsSyncLatency_num_ops=1736, fsWriteLatency_avg_time=0,
> >> fsWriteLatency_num_ops=0, memstoreSizeMB=0, regions=169,
> >> requests=92.2, storefileIndexSizeMB=188, storefiles=343, stores=169
> >> 2011-04-26T10:56:32 hbase.regionserver:
> >> RegionServer=regionserver50820, blockCacheCount=39238,
> >> blockCacheEvictedCount=221509, blockCacheFree=785944072,
> >> blockCacheHitCachingRatio=20, blockCacheHitCount=69043,
> >> blockCacheHitRatio=20, blockCacheMissCount=261095,
> >> blockCacheSize=2628560120, compactionQueueSize=0,
> >> compactionSize_avg_time=0, compactionSize_num_ops=7,
> >> compactionTime_avg_time=0, compactionTime_num_ops=7, flushQueueSize=0,
> >> flushSize_avg_time=0, flushSize_num_ops=0, flushTime_avg_time=0,
> >> flushTime_num_ops=0, fsReadLatency_avg_time=39,
> >> fsReadLatency_num_ops=261070, fsSyncLatency_avg_time=0,
> >> fsSyncLatency_num_ops=1746, fsWriteLatency_avg_time=0,
> >> fsWriteLatency_num_ops=0, memstoreSizeMB=0, regions=169,
> >> requests=128.77777, storefileIndexSizeMB=188, storefiles=343,
> >> stores=169
> >>
> >>
> >> We also tried disabling the block cache, and the performance seems
> >> even a little better. And if we use a configuration of 6 DN servers
> >> + 3 RS servers, we get better throughput, at about 1000
> >> month_report/second.  I am confused. Can anyone explain the reason?
> >>
> >> Thanks
> >> Weihua
> >>
> >
>
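
For a sense of scale, the figures quoted above can be put side by side in a short back-of-the-envelope calculation. This is only a sketch: it assumes 1 KB = 1000 bytes and "8G" = 8 GiB, and it counts raw record bytes only, ignoring HBase per-cell key overhead, compression, and HDFS replication, so the real on-disk size will differ. The hit and miss counters are taken from the first metrics sample in the thread.

```python
# Workload figures quoted in the thread.
USERS = 20_000_000
RECORDS_PER_USER_MONTH = 10
RECORD_BYTES = 600            # "0.6K-byte", assuming 1 KB = 1000 bytes
MONTHS = 6

# Cluster figures quoted in the thread.
REGION_SERVERS = 6
RS_HEAP_BYTES = 8 * 1024**3   # assuming "8G" means 8 GiB
BLOCK_CACHE_FRACTION = 0.4

# Raw payload only; a lower bound on what HBase actually stores.
raw_bytes = USERS * RECORDS_PER_USER_MONTH * RECORD_BYTES * MONTHS

# Upper bound on block cache capacity across all region servers.
cache_bytes = REGION_SERVERS * RS_HEAP_BYTES * BLOCK_CACHE_FRACTION
cache_coverage = cache_bytes / raw_bytes

# Counters from the first metrics sample (2011-04-26T10:56:12).
hits, misses = 67_936, 257_675
hit_ratio = hits / (hits + misses)

print(f"raw data:       {raw_bytes / 1e9:.0f} GB")
print(f"block cache:    {cache_bytes / 1e9:.1f} GB in total")
print(f"cache coverage: {cache_coverage:.1%} of the raw data")
print(f"hit ratio:      {hit_ratio:.1%} from the hit/miss counters")
```

Under these assumptions the combined block cache can hold only a few percent of the raw data, so with no hot spot among users most reads will miss the cache and go to disk.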
