hbase-user mailing list archives

From N Keywal <nkey...@gmail.com>
Subject Re: Client receives SocketTimeoutException (CallerDisconnected on RS)
Date Tue, 28 Aug 2012 08:03:00 GMT
> Totally randoms (even on keys that do not exist).

It's worth checking whether that matches your real use cases. I expect that
reads by row key are, most of the time, on existing rows (as with a
traditional db relationship, or UI- or workflow-driven stuff), even if I'm
sure it's possible to have something totally different.

It's not going to have an impact all the time. But I can easily imagine
scenarios with better performance when the row exists vs. does not exist.
For example, you have to read more files to check that the row key is
really not there. This will be even more true if you're inserting a lot of
data simultaneously (i.e. the files won't be major compacted). On the
other hand, bloom filters may be more efficient in this case. But again,
I'm not sure they're going to be efficient on random data. It's like
compression algorithms: on really random data, they will all have similar,
bad results. It does not mean they are equivalent, nor useless.
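
To make the point above concrete, here is a toy sketch in Python. It is not
HBase's actual read path: the StoreFile and BloomFilter classes, the key
names and the "stop at the first file that has the row" rule are all
assumptions made for illustration. It only shows why a lookup for a missing
row may touch every file, and how a bloom filter can let the reader skip
files that cannot contain the key.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class StoreFile:
    """Hypothetical stand-in for one on-disk file holding some rows."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def get(files, key, use_bloom):
    """Return (value, files_read), checking the files in order."""
    files_read = 0
    for f in files:
        if use_bloom and not f.bloom.might_contain(key):
            continue  # the filter says the key is definitely not in this file
        files_read += 1
        if key in f.rows:
            return f.rows[key], files_read
    return None, files_read


# Five files that have not been compacted together, 50 rows each.
files = [StoreFile({f"row-{n}-{i}": i for i in range(50)}) for n in range(5)]

print(get(files, "row-0-7", use_bloom=False))  # existing row in the first file
print(get(files, "missing", use_bloom=False))  # missing row: every file is read
print(get(files, "missing", use_bloom=True))   # missing row: the filter can skip files
```

Without the filter, the missing key forces a read of all five files. With it,
only files whose filter gives a false positive are read, and how often that
happens depends on the filter size and the number of keys per file.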

> I'm working on it ! Thanks,

If you can reproduce a 'bad behavior' or a performance issue, we will
certainly try to fix it.

Have a nice day,

