hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Re: Any fast way to random access hbase data?
Date Wed, 13 Aug 2014 13:41:45 GMT
Like what Esteban said.

Try to use more threads to query HBase. Start with 10 clients, each with 1K
gets per batch, and adjust those numbers to see the impact on performance.
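
A minimal sketch of that pattern, in case it helps: split the row ids into
batches and hand the batches to a fixed pool of worker threads. The class name
ParallelBatchGet and the batchGet parameter are hypothetical; batchGet stands
in for the real HBase call (for example HTable.get(List<Get>)), so the sketch
runs without a cluster. Since HTable is not thread-safe, each worker would
need its own table instance in a real client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelBatchGet {

    /** Splits ids into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            int end = Math.min(i + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(i, end)));
        }
        return batches;
    }

    /**
     * Runs batchGet on every batch using a fixed pool of worker threads and
     * returns the results in the same order as the input ids.
     * batchGet is a hypothetical stand-in for the HBase multi-get call.
     */
    static <T, R> List<R> fetchAll(List<T> ids, int threads, int batchSize,
                                   Function<List<T>, List<R>> batchGet)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (List<T> batch : partition(ids, batchSize)) {
                Callable<List<R>> task = () -> batchGet.apply(batch);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<List<R>> future : futures) {
                // get() blocks until that batch is done and rethrows failures.
                results.addAll(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            ids.add(i);
        }
        // Fake fetcher: a real client would issue one multi-get per batch.
        Function<List<Integer>, List<String>> fakeGet = batch -> {
            List<String> rows = new ArrayList<>();
            for (Integer id : batch) {
                rows.add("row-" + id);
            }
            return rows;
        };
        // 10 threads and 1000 ids per batch, the suggested starting point.
        List<String> rows = fetchAll(ids, 10, 1000, fakeGet);
        System.out.println("fetched " + rows.size() + " rows");
    }
}
```

The thread count and batch size are plain parameters so both can be varied
while measuring throughput.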

Any reason why your block cache is disabled? (hfile.block.cache.size = 0)

JM


2014-08-13 5:23 GMT-04:00 leiwangouc@gmail.com <leiwangouc@gmail.com>:

>
> I haven't tried a larger batch size yet.
> I am using only one thread.
> There are 10 region servers with 2555 regions in total.
> I am new to HBase and not sure what exactly the block cache means;
> here is the configuration I can see in the CDH HBase master UI:
> <name>hbase.rs.cacheblocksonwrite</name>
> <value>false</value>
> <source>hbase-default.xml</source>
>
> <name>hbase.offheapcache.percentage</name>
> <value>0</value>
> <source>hbase-default.xml</source>
>
> <name>hfile.block.cache.size</name>
> <value>0.0</value>
> <source>programatically</source>
> Table description:
> {NAME => 'userdigest',
>  coprocessor$3 => 'hdfs://agrant/user/tracking/userdigest/coprocessor/endpoint_0.0.17.jar|com.agrantsem.data.userdigest.endpoint.UserdigestEndPoint|1001|',
>  coprocessor$2 => '|org.apache.hadoop.hbase.coprocessor.AggregateImplementation||',
>  FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE',
>   BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1',
>   COMPRESSION => 'LZ4', MIN_VERSIONS => '0', TTL => '2147483647',
>   KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
>   IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
>
>
>
> leiwangouc@gmail.com
>
> From: Esteban Gutierrez
> Date: 2014-08-13 15:59
> To: user@hbase.apache.org
> Subject: Re: Any fast way to random access hbase data?
> Hello Lei,
>
> Have you tried a larger batch size? How many threads or tasks are you using
> to fetch data? Could you please describe your HBase cluster in a little more
> detail, e.g. how many region servers, and how many regions per RS? What's the
> hit ratio of the block cache? Any chance you could share the table schema?
>
> cheers,
> esteban.
>
>
>
> --
> Cloudera, Inc.
>
>
>
> On Wed, Aug 13, 2014 at 12:34 AM, leiwangouc@gmail.com <leiwangouc@gmail.com> wrote:
>
> >
> > I have an HBase table with more than 2G rows.
> > Every hour, 5M~10M row ids arrive and I must get all the row info from
> > the HBase table.
> > Even when I use the batch call (1000 row ids as a list) as described here
> >
> > http://stackoverflow.com/questions/13310434/hbase-api-get-data-rows-information-by-list-of-row-ids
> >
> > it takes about 1 hour.
> > Is there any other way to do this more quickly?
> >
> > Thanks,
> > Lei
> >
> >
> > leiwangouc@gmail.com
> >
>
