hbase-user mailing list archives

From Boris Aleksandrovsky <balek...@gmail.com>
Subject Re: HBase is very slow on full table scan
Date Mon, 08 Feb 2010 23:04:32 GMT
Thanks. This is a one-time scan (per server runtime) in order to build
bloomfilters to speed up access to that table; so definitely not in the
query runtime :-)
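
For illustration, a minimal sketch of the kind of Bloom filter such a one-time scan could populate. This is not the code we actually run: the class name, the sizing parameters, and the FNV-1a double-hashing scheme are all assumptions made for the example, and the scan loop itself is left out (it would feed each row key, e.g. the bytes returned by Result.getRow(), into add()).

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical helper, not part of HBase: a fixed-size Bloom filter over row keys.
final class RowKeyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    RowKeyBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a, with the offset basis perturbed by a seed (an assumption for this sketch).
    private static int fnv1a(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: the i-th bit position is derived from two base hashes.
    private int index(byte[] key, int i) {
        int h1 = fnv1a(key, 0);
        int h2 = fnv1a(key, 0x9E3779B9);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Called once per row key during the one-time scan.
    void add(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    // false means the key was definitely never added; true means it may have been.
    boolean mightContain(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RowKeyBloomFilter filter = new RowKeyBloomFilter(1 << 20, 4);
        filter.add("row-0001".getBytes(StandardCharsets.UTF_8));
        System.out.println(filter.mightContain("row-0001".getBytes(StandardCharsets.UTF_8)));
    }
}
```

A key that was added is always reported as possibly present; a key that was not added is usually, but not always, rejected, so a positive answer still needs a real lookup against the table.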

On Mon, Feb 8, 2010 at 3:00 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> Yes you can try it I guess ;) Go with 100 or even more.
>
> Are you scanning those 5M rows to serve a user query, or is it offline
> processing?
>
> J-D
>
> On Mon, Feb 8, 2010 at 2:57 PM, Boris Aleksandrovsky <baleksan@gmail.com> wrote:
>
> > I am using HTable.setScannerCaching(10) and the size of the row is variable
> > from 10 to 100K (approx). Should I increase the scan cache size?
> >
> > On Mon, Feb 8, 2010 at 2:47 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >
> > > How big are the rows, and are you using:
> > >
> > > http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/Scan.html#setCaching(int)
> > >
> > > thx
> > >
> > > J-D
> > >
> > > On Mon, Feb 8, 2010 at 2:43 PM, Boris Aleksandrovsky <baleksan@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I have noticed that the performance of the full table scan (table contains
> > > > about 5M rows) is extremely slow in our case. We are running 0.20.2, r834515,
> > > > and it takes about 3 min / 5000 rows to scan the table.
> > > >
> > > > We have 3 region servers on 3 different machines with the following
> > > > characteristics:
> > > >
> > > > server1 1265576122987 requests=0, regions=124, usedHeap=1468, maxHeap=2983
> > > > server2 1265576119422 requests=4, regions=121, usedHeap=1482, maxHeap=2983
> > > > server3 1265576119423 requests=44, regions=117, usedHeap=1570, maxHeap=2983
> > > >
> > > > The "slow" table in question is configured as follows:
> > > >
> > > > Table = {NAME => 'post', FAMILIES => [{NAME => 'ngrams', VERSIONS => '3',
> > > > COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536',
> > > > IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> > > >
> > > > There is nothing suspicious in the log, as far as I can tell.
> > > >
> > > > Please let me know if you need more information about our installation.
> > > >
> > > > --
> > > > Thanks,
> > > >
> > > > Boris
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> >
> > Boris
> > http://twitter.com/baleksan
> > http://www.linkedin.com/in/baleksan
> >
>
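
To put the numbers quoted in this thread side by side, here is a rough back-of-the-envelope sketch. It assumes that scanner caching is the number of rows fetched per next() round trip, that "100K" means roughly 100 KB per row, and that the per-region scanner open/close calls can be ignored; the class and method names are made up for the example.

```java
// Hypothetical helper for estimating scan cost from the figures in this thread.
final class ScanEstimate {

    // Approximate number of next() round trips: ceil(rows / caching).
    static long rpcCount(long rows, int caching) {
        return (rows + caching - 1) / caching;
    }

    // Hours needed for a full scan at a given observed rate.
    static double fullScanHours(long rows, double rowsPerMinute) {
        return rows / rowsPerMinute / 60.0;
    }

    // Upper bound on the bytes returned by one round trip if every row is at its maximum size.
    static long worstCaseBatchBytes(int caching, long maxRowBytes) {
        return caching * maxRowBytes;
    }

    public static void main(String[] args) {
        long rows = 5_000_000L;                 // "about 5M rows"
        double observedRate = 5000 / 3.0;       // "3 min / 5000 rows"
        long maxRowBytes = 100L * 1024;         // assumption: "100K" is about 100 KB

        System.out.printf("full scan at observed rate: %.1f hours%n",
                fullScanHours(rows, observedRate));
        for (int caching : new int[] {10, 100}) {
            System.out.printf("caching=%d: ~%d round trips, up to %d bytes per batch%n",
                    caching, rpcCount(rows, caching),
                    worstCaseBatchBytes(caching, maxRowBytes));
        }
    }
}
```

The trade-off this illustrates: raising the caching value cuts the number of round trips proportionally, while the worst-case size of a single batch held in client and region server memory grows by the same factor.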



-- 
Thanks,

Boris
http://twitter.com/baleksan
http://www.linkedin.com/in/baleksan
