hbase-user mailing list archives

From "Liu, Ming (Ming)" <ming....@esgyn.cn>
Subject how to get random rows from a big hbase table faster
Date Thu, 12 Apr 2018 16:16:07 GMT
Hi, all,

We have an HBase table with 1 billion rows, and we want to randomly sample 1M rows (0.1%) from it.
We are currently trying RandomRowFilter, but it is still very slow. If I understand correctly,
RandomRowFilter still has to read all 1 billion rows on the server side and only returns a random
subset of them, and reading 1 billion rows is very slow. Is that right?
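The cost described above can be modelled with a small standalone simulation. This is a sketch, not HBase code: it assumes, as the filter's description suggests, that RandomRowFilter keeps each row independently with probability `chance`, and that the server still reads every row before deciding. The class and method names (`RandomRowFilterCost`, `scanWithRandomFilter`) are hypothetical, and the row count is scaled down from 1 billion so it runs quickly.

```java
import java.util.Random;

// Hypothetical model of a full scan with a per-row random filter.
// It does not use the HBase client; it only illustrates read vs. returned counts.
final class RandomRowFilterCost {

    /** Result of one simulated scan: rows read on the server vs. rows returned. */
    static final class ScanStats {
        final long rowsRead;
        final long rowsReturned;

        ScanStats(long rowsRead, long rowsReturned) {
            this.rowsRead = rowsRead;
            this.rowsReturned = rowsReturned;
        }
    }

    /**
     * Simulates scanning totalRows rows and keeping each with probability chance.
     * The seed makes runs reproducible.
     */
    static ScanStats scanWithRandomFilter(long totalRows, float chance, long seed) {
        Random random = new Random(seed);
        long read = 0;
        long returned = 0;
        for (long row = 0; row < totalRows; row++) {
            read++; // every row is read, whatever the filter decides
            if (random.nextFloat() < chance) {
                returned++; // only a random subset is sent back to the client
            }
        }
        return new ScanStats(read, returned);
    }

    public static void main(String[] args) {
        long totalRows = 1_000_000L; // scaled down from 1 billion
        float chance = 0.001f;       // 1M out of 1B is a 0.1% sample
        ScanStats stats = scanWithRandomFilter(totalRows, chance, 42L);
        System.out.printf("read %d rows, returned %d rows%n",
                stats.rowsRead, stats.rowsReturned);
    }
}
```

However small `chance` is, `rowsRead` always equals the table size, so under this model the scan time is dominated by reading the whole table rather than by the size of the sample.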

Is there a better way to randomly sample a small fraction of the rows from a given table? Any ideas
would be much appreciated.
We don't know the distribution of the 1 billion rows in advance.

