hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: [Discuss]A issue of scan timeout after delete a number of rows
Date Fri, 03 Jul 2015 02:24:40 GMT
You may have read http://hbase.apache.org/book.html#version.delete

Please see 'Scan Improvements in HBase 1.1.0' under


On Thu, Jul 2, 2015 at 6:54 PM, Song Geng <soul.great@me.com> wrote:

> Hi everyone,
> I am a complete novice with HBase and in this community, and this is my
> first email. Please forgive me if I cause any trouble.
> Here is the issue:
> We use HBase to store file information, and we compose the rowkey from
> the user id and the file path.
>         For example: a user's id is 1000, and he has a file "a.txt" stored
> in "/root/data/", so the rowkey will be "1000_/root/data/a.txt".
> A user may store a large number of files in our system, on the order of
> millions or billions. Sometimes he deletes a folder that may hold
> millions of files. After this kind of delete, scans often hit a timeout
> until we run a major compaction.
> To understand this issue, I read the Google Bigtable paper, "HBase in
> Action", the blog posts about the block cache written by Nick, many other
> articles about HBase, and also the source code. I ran some tests, and my
> conclusions are listed below:
> The test table has only one column family, and this cf has only one
> column.
> 1. Three factors influence read latency: key search, disk I/O, and
> network I/O.
> 2. Making the HBase client caching smaller reduces the latency that
> comes from network I/O.
> 3. Compared with a normal scan, the "delete" scenario spends more time on
> searching and disk I/O, and I think mainly on searching. Consider this
> scenario: I put a number of rows into HBase, and they have just been
> flushed into one HFile. Then I delete the majority of these rows,
> beginning from the start key. The deletes are recorded in another HFile.
> At this point, if I scan from the start key (assuming no compaction has
> run), the scan reads the rows one by one until it reaches the first row
> that has not been deleted.
> 4. So running a compaction is the most effective way to resolve this kind
> of issue.
> I still have some doubts and hope someone can clear them up.
> First, I am not sure that the scan process for the "delete scenario" I
> described in number 3 is correct.
> Second, the block cache seems to have less effect in this scenario.
> P.S. I have not attached my test results because I am afraid they would
> confuse others. I will tidy them up and send them if necessary.
> Br, Great Soul
> soul.great@me.com
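
The "delete scenario" described in the quoted message can be sketched with a small toy model. This is not HBase code and does not use the HBase client API: the names build_store, scan and major_compact are hypothetical, and the model assumes that a delete marker sorts immediately ahead of the put it masks and that a scanner call returns only after it has collected `caching` live rows or reached the end of the data. It only counts how many cells one scanner call has to examine.

```python
from typing import List, Tuple

# Toy model only: a cell is (row_key, is_delete_marker).
Cell = Tuple[str, bool]


def build_store(total_rows: int, deleted_rows: int) -> List[Cell]:
    """Return cells in sorted key order. The first `deleted_rows` rows
    also carry a delete marker placed ahead of the put it masks."""
    cells: List[Cell] = []
    for i in range(total_rows):
        key = f"1000_/root/data/{i:07d}.txt"
        if i < deleted_rows:
            cells.append((key, True))   # delete marker (tombstone)
        cells.append((key, False))      # the original put
    return cells


def scan(cells: List[Cell], caching: int) -> Tuple[List[str], int]:
    """Simulate one scanner call starting at the first cell.
    Returns (live_row_keys, cells_examined); stops after `caching` live rows."""
    live: List[str] = []
    examined = 0
    masked_key = None
    for key, is_delete in cells:
        examined += 1
        if is_delete:
            masked_key = key            # remember which row is masked
            continue
        if key == masked_key:
            continue                    # put hidden by the marker, skip it
        live.append(key)
        if len(live) == caching:
            break
    return live, examined


def major_compact(cells: List[Cell]) -> List[Cell]:
    """Drop every delete marker together with the puts it masks."""
    deleted = {key for key, is_delete in cells if is_delete}
    return [(k, d) for k, d in cells if k not in deleted]


if __name__ == "__main__":
    store = build_store(total_rows=1000, deleted_rows=900)
    print(scan(store, caching=10)[1])                 # before compaction
    print(scan(store, caching=1)[1])                  # smaller caching
    print(scan(major_compact(store), caching=10)[1])  # after compaction
```

In this model, with 900 of 1000 rows deleted from the start key, a call with caching=10 examines 1810 cells, a call with caching=1 still examines 1801, and after the simulated major compaction the same call examines 10. Under these assumptions the cost is dominated by skipping the deleted prefix, which is consistent with the observation that compaction helps far more than a smaller caching value.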
