hbase-user mailing list archives

From houman <baba.opensou...@gmail.com>
Subject Full table scan cost after deleting Millions of Records from HBase Table
Date Wed, 10 Feb 2016 00:01:02 GMT

I'm thinking of creating a table that will hold millions of rows, and each
day I would insert and delete millions of rows to/from it.

Two questions:
1. I'm guessing HBase can handle this workload, but I wanted to check
whether I might run into issues with region splits or compaction.  Can you
think of any problems?
2. Let's say there are 6 million records in the table, and I do a full
table scan querying a column family that has a single column whose cell
value is either 1 or 0.  Let's say it takes N seconds.  Now I bulk delete
5 million records (but do not run compaction) and run the same query again:
would I get a much faster response, or will HBase need to perform the same
amount of I/O (as if all 6 million records were still there)?  Once
compaction is done, I assume the query would run faster...

Also most queries on the table would scan the entire table.
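For question 2, the key point is that an HBase delete only writes a tombstone marker; the original cells stay in the store files, and a scan must still read and filter them until a major compaction rewrites the files. The toy model below (plain Python, with a hypothetical `ToyStore` class; real HBase is far more involved) sketches why the scan cost stays roughly the same until compaction runs:

```python
# Toy model of LSM-style deletes: a delete writes a tombstone marker;
# the original cell remains on "disk" until a major compaction rewrites
# the store, so a full scan still reads (and filters) every cell.

class ToyStore:
    def __init__(self):
        self.cells = {}          # rowkey -> value
        self.tombstones = set()  # rowkeys marked deleted

    def put(self, row, value):
        self.cells[row] = value

    def delete(self, row):
        self.tombstones.add(row)  # marker only; cell is NOT removed

    def scan(self):
        """Return live rows, counting every cell read from storage."""
        cells_read = 0
        live = []
        for row, value in self.cells.items():
            cells_read += 1          # dead cells still cost I/O
            if row not in self.tombstones:
                live.append((row, value))
        return live, cells_read

    def major_compact(self):
        # Rewrite the store, dropping tombstoned cells for real.
        self.cells = {r: v for r, v in self.cells.items()
                      if r not in self.tombstones}
        self.tombstones.clear()


store = ToyStore()
for i in range(6):
    store.put(f"row{i}", i % 2)   # cell value is 1 or 0

for i in range(5):
    store.delete(f"row{i}")       # "bulk delete" 5 of the 6 rows

live, read_before = store.scan()
store.major_compact()
live2, read_after = store.scan()
print(read_before, read_after)    # 6 cells read before compaction, 1 after
```

The scan before compaction still touches all 6 cells even though only 1 row is live; only after the major compaction does the read cost drop to match the live data.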

