hbase-user mailing list archives

From Gurjeet Singh <gurj...@gmail.com>
Subject Slow full-table scans
Date Sun, 12 Aug 2012 06:04:43 GMT
Hi,

I am trying to read all the data out of an HBase table using a scan
and it is extremely slow.

Here are some characteristics of the data:

1. The total table size is tiny (~200MB)
2. The table has ~100 rows and ~200,000 columns in a SINGLE family.
Thus the size of each cell is ~10 bytes and the size of each row is
~2MB
3. Currently, scanning the whole table takes ~400s (both in a
distributed setting with ~12 nodes and on a single node), i.e. roughly
4-5 s per row
4. The row keys are unique 8-byte cryptographic hashes of sequential numbers
5. The scanner is set to fetch a FULL row at a time (scan.setBatch)
and to fetch ~100MB of data at a time (scan.setCaching); a sketch of
this setup follows the list
6. Changing the caching size seems to have no effect on the total scan
time at all
7. The column family is set up to keep a single version of the cells,
with no compression and no block cache.

Am I missing something? Is there a way to optimize this?

I guess a general question I have is whether HBase is a good datastore
for storing many medium-sized (~50GB), dense datasets with lots of
columns when a lot of the queries require full-table scans?

Thanks!
Gurjeet
