hbase-user mailing list archives

From Liam Slusser <lslus...@gmail.com>
Subject first scan returns nothing and how big is big?
Date Mon, 30 Jun 2014 21:59:22 GMT
Hey HBase list,

First question - The first time I run a scan with a few filters, the
system returns nothing, and it takes a long time (20-30 seconds). When I
run the exact same request again, it goes much quicker (2-3 seconds for a
total scan; I figure things are cached the second time, which is fine),
and this time I get results. Nothing about the scan request changes
between the two runs. I don't get any errors, and there is nothing in the
log.

Has anybody else noticed anything like this?  I'm running HBase
0.94.15-cdh4.6.0 and using FuzzyRowFilter with SingleColumnValueFilter on
top of my scan.
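
To make the setup concrete, here is a minimal stdlib-only Python sketch of
the matching logic I expect the two filters to apply together. It is a
simplified model, not the HBase client API: the row key layout (4-byte
source id followed by a YYYYMMDDHH hour), the `status` column, and the
helper names `fuzzy_match`, `column_equals` and `scan` are all made up for
illustration. The mask convention (0 = position must match, 1 = any byte)
follows FuzzyRowFilter.

```python
from typing import Dict, List


def fuzzy_match(row_key: bytes, pattern: bytes, mask: bytes) -> bool:
    """Simplified fuzzy match: mask 0 = byte must equal pattern, 1 = any byte."""
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must have the same length")
    if len(row_key) < len(pattern):
        return False  # simplification: shorter keys never match here
    return all(m == 1 or r == p for r, p, m in zip(row_key, pattern, mask))


def column_equals(columns: Dict[bytes, bytes], qualifier: bytes, expected: bytes) -> bool:
    """Equality check on one column. Unlike SingleColumnValueFilter's default,
    a row that lacks the column is treated as a non-match in this sketch."""
    return columns.get(qualifier) == expected


def scan(table: Dict[bytes, Dict[bytes, bytes]], pattern: bytes, mask: bytes,
         qualifier: bytes, expected: bytes) -> List[bytes]:
    """Return row keys, in sorted order, that pass both checks."""
    return [
        key for key in sorted(table)
        if fuzzy_match(key, pattern, mask)
        and column_equals(table[key], qualifier, expected)
    ]


# Hypothetical data: 4-byte source id + 10-character hour (YYYYMMDDHH).
TABLE = {
    b"web1" + b"2014063021": {b"status": b"500"},
    b"web2" + b"2014063021": {b"status": b"200"},
    b"web3" + b"2014063020": {b"status": b"500"},
    b"db01" + b"2014063021": {b"status": b"500"},
}

# Any source id (mask 1), fixed hour 2014-06-30 21:00 (mask 0).
PATTERN = b"\x00\x00\x00\x00" + b"2014063021"
MASK = bytes([1] * 4 + [0] * 10)

first = scan(TABLE, PATTERN, MASK, b"status", b"500")
second = scan(TABLE, PATTERN, MASK, b"status", b"500")
print(first)
print(first == second)
```

In this model the same request over the same data gives the same rows on
every run, which is what I expected from the real scan as well.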

Second question - how big is too big?  I am using my HBase database to
store parsed logs, and I currently break the logs into monthly tables.  I
am inserting around 350 million logs a day, so near the end of the month
there are an estimated 8-10 billion rows per table.  All seems to be fine:
I can use FuzzyRowFilter+SingleColumnValueFilter to scan over an hour of
logs in about 10 seconds, so performance is still very decent.  Is there
any advantage to breaking the table up into separate days?  Is there a
best practices guide for tables this big?
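
For scale, here is the back-of-the-envelope arithmetic behind those
numbers as a small Python sketch. It assumes logs arrive at a steady 350
million a day and that a one-hour scan touches every row written in that
hour; both are simplifications, and the names are only for illustration.

```python
LOGS_PER_DAY = 350_000_000  # approximate ingest rate from above
SCAN_SECONDS = 10           # approximate time to scan one hour of logs


def rows_per_table(days: int) -> int:
    """Rows accumulated in a table that covers the given number of days."""
    return LOGS_PER_DAY * days


monthly_rows = rows_per_table(30)   # 10_500_000_000 at a steady rate
daily_rows = rows_per_table(1)      # 350_000_000
rows_per_hour = LOGS_PER_DAY // 24  # 14_583_333
rows_per_second = rows_per_hour // SCAN_SECONDS  # 1_458_333

print(f"monthly table: {monthly_rows:,} rows")
print(f"daily table:   {daily_rows:,} rows")
print(f"one hour:      {rows_per_hour:,} rows, ~{rows_per_second:,} rows/s scanned")
```

So a daily table would hold roughly 1/30th of the rows of a monthly one,
while an hour of logs is the same number of rows either way.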

