hbase-user mailing list archives

From jmozah <jmo...@gmail.com>
Subject Re: Using HBase serving to replace memcached
Date Tue, 21 Aug 2012 15:56:27 GMT
> 1. After reading the materials you sent me, I am confused about how a Bloom filter could
> save I/O during random reads. Suppose I am not using a Bloom filter: to find whether a
> row (or row key) exists, we need to search the index block, which is at the end of an HFile.
> The search is in memory (I think the index block is always in memory, please feel free to
> correct me if I am wrong) using binary search -- it should be pretty fast. With a Bloom
> filter, we could be a bit faster by looking up the Bloom filter bit vector in memory. Since
> both the index block binary search and the Bloom filter bit vector lookup are done in memory
> (no I/O is involved), what kind of I/O is saved? :-)

If the bloom filter says the row *may* be present, the data block is loaded; otherwise it is not.
The index only tells you which block the row *would* be in if it existed, not whether it is
actually there -- so without a bloom filter you still have to load that data block from disk
(in every HFile of the store) and scan it just to discover the row is absent.

That can incur a lot more I/O.
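To make the saving concrete, here is a minimal, self-contained Bloom filter sketch in Java. All names here are illustrative, not HBase's actual bloom implementation: a negative answer guarantees the row is absent, so the disk read for the data block can be skipped entirely -- that skipped read is the I/O being saved.

```java
import java.util.BitSet;

// Hypothetical minimal Bloom filter: answers "definitely absent" or "maybe present".
public class BloomSketch {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    BloomSketch(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Simple derived hash family (illustrative only).
    private int hash(String key, int seed) {
        return Math.floorMod(key.hashCode() * (seed * 2 + 1) + seed, size);
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) bits.set(hash(key, i));
    }

    // false => the row is definitely not in this HFile: skip the block load.
    // true  => the row *may* be present: only now pay for the disk read.
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(hash(key, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        BloomSketch bloom = new BloomSketch(1024, 3);
        bloom.add("row-00001");
        System.out.println(bloom.mightContain("row-00001")); // always true: no false negatives
        System.out.println(bloom.mightContain("row-99999")); // false with very high probability
    }
}
```

Note the asymmetry: a bloom filter never produces false negatives, only (rare) false positives, so skipping the block on a negative answer is always safe.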

> 2.
> > One Hadoop job doing random reads is perfectly fine. But since you said "handling
> > user traffic directly"... I assumed you wanted to expose HBase independently to every
> > client request, thereby having as many connections as the number of simultaneous requests.
> Sorry, I need to confirm this point again. Do you mean that establishing a new connection
> for each request is not good, and that using a connection pool or asynchronous I/O is
> preferred?
