hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: first scan returns nothing and how big is big?
Date Mon, 30 Jun 2014 22:25:33 GMT
FuzzyRowFilter is an interesting filter; users have sent feedback on it
covering various scenarios.

If you can write a unit test which exhibits the problem in your first
point, that would help us track down the root cause.

I checked FuzzyRowFilter in the 0.94 branch: the last fix for FuzzyRowFilter
was HBASE-7628, which you already have in 0.94.15.


On Mon, Jun 30, 2014 at 2:59 PM, Liam Slusser <lslusser@gmail.com> wrote:

> Hey HBase list,
> First question - It seems that the first time I do a scan with a few
> filters the system returns nothing - it also takes a long time (20-30
> seconds) - but I can run the exact same request over again and it goes much
> quicker (2-3 seconds for a total scan, I figured things are cached the
> second time which is fine) but the 2nd time around I get results.  It is
> the exact same scan request.  I don't get any errors and nothing in the log
> files...
> Has anybody else noticed anything like this?  I'm running HBase
> 0.94.15-cdh4.6.0 and using FuzzyRowFilter with SingleColumnValueFilter on
> top of my scan.
> Second question - how big is too big?  I am using my HBase database to
> store parsed logs; currently I am breaking the logs into monthly tables.  I
> am inputting around 350 million logs a day so near the end of the month
> there is an estimated 8-10 billion rows per table.  All seems to be fine, I
> am able to use FuzzyRowFilter+SingleColumnValueFilter and scan over an hour
> of logs in about 10 seconds so the performance is still very decent.  Is
> there any advantage to breaking the table up into separate days?  Is there
> a best practices guide for tables this big?
> thanks!
> liam
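
For anyone putting together the unit test requested above, it can help to pin down which row keys the fuzzy pattern is expected to accept before looking at the scan itself. The following is a minimal sketch of fuzzy row-key matching in plain Java, not the HBase implementation: it assumes the FuzzyRowFilter mask convention (0 = the byte at this position must match, 1 = any byte is accepted). The class name `FuzzyMatchSketch`, the `matches` helper, and the `NNNN_host` row-key layout are hypothetical and used only for illustration; the thread does not describe the actual key layout.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; this is not HBase code.
final class FuzzyMatchSketch {

    // Returns true when every fixed position (mask byte 0) of the pattern
    // equals the corresponding byte of the row key. Positions with mask
    // byte 1 accept any value. Simplification: rows shorter than the
    // pattern are rejected outright.
    static boolean matches(byte[] row, byte[] pattern, byte[] mask) {
        if (pattern.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask must have the same length");
        }
        if (row.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && row[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed key layout: 4 variable bytes, then a fixed "_web01" suffix.
        byte[] pattern = "????_web01".getBytes(StandardCharsets.US_ASCII);
        byte[] mask = {1, 1, 1, 1, 0, 0, 0, 0, 0, 0};

        byte[] hit = "0042_web01".getBytes(StandardCharsets.US_ASCII);
        byte[] miss = "0042_web02".getBytes(StandardCharsets.US_ASCII);

        System.out.println(matches(hit, pattern, mask));  // prints "true"
        System.out.println(matches(miss, pattern, mask)); // prints "false"
    }
}
```

A reference check like this gives the unit test an independent list of rows the first scan should have returned, so an empty first result can be told apart from a pattern or mask that simply does not match the stored keys.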
