hbase-user mailing list archives

From Melvin Kanasseril <Melvin.Kanasse...@Sophos.com>
Subject Wide table vs narrower table with blob
Date Thu, 10 Sep 2015 16:00:04 GMT

This has probably come up before, but I wanted to know if there is a recommendation on
having tables with all attribute data as separate columns vs. an approach with most of the
attribute data stored as a blob in a single column and the rest as separate columns (for column
filter searches). I am aware of the limitations of lumping the data into a blob but was
curious to see if there is an improvement in throughput/latency.

I am leaning towards there not being much of a difference, or this being a micro-optimization
not worth the tradeoff, but when we ran a set of benchmarks to test this (on ver 0.94), the
hybrid approach with the blob data seemed to show a 10-12% improvement in write throughput for
the same number of client threads with evenly distributed puts over a pre-split table on a
12 node cluster. I used Avro for serialization and all the columns (there are about 40 without
the blob column and 10 with it) are part of one column family. The size of data for a row
is around 5 MB before serialization. Any thoughts on whether this is worth pursuing?
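
For a rough sense of how much per-cell storage overhead the two layouts differ by, here is a minimal back-of-the-envelope sketch in Python. It assumes the standard HBase KeyValue layout without tags (each cell repeats the row key, family and qualifier, plus fixed length, timestamp and type fields). The 16-byte row key, 1-byte family name and 10-byte qualifier are hypothetical values chosen for illustration, not figures from the benchmark.

```python
# Fixed bytes stored with every cell in the KeyValue layout (no tags):
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + key type (1)
FIXED_BYTES_PER_CELL = 4 + 4 + 2 + 1 + 8 + 1


def cell_overhead(row_key_len, family_len, qualifier_len):
    """Bytes of non-value data carried by a single cell."""
    return FIXED_BYTES_PER_CELL + row_key_len + family_len + qualifier_len


def row_overhead(num_cells, row_key_len, family_len, qualifier_len):
    """Total non-value bytes for one row, assuming equal-length qualifiers."""
    return num_cells * cell_overhead(row_key_len, family_len, qualifier_len)


# Hypothetical sizes (assumptions, not measured values).
ROW_KEY_LEN = 16
FAMILY_LEN = 1
QUALIFIER_LEN = 10
ROW_PAYLOAD_BYTES = 5 * 1024 * 1024  # ~5 MB per row, as in the post

wide = row_overhead(40, ROW_KEY_LEN, FAMILY_LEN, QUALIFIER_LEN)    # 40 columns
hybrid = row_overhead(10, ROW_KEY_LEN, FAMILY_LEN, QUALIFIER_LEN)  # 10 columns

saved = wide - hybrid
print("wide layout overhead per row:  ", wide, "bytes")
print("hybrid layout overhead per row:", hybrid, "bytes")
print("saved as a fraction of payload: {:.6f}".format(saved / ROW_PAYLOAD_BYTES))
```

Under these assumed sizes the byte overhead saved per row is tiny relative to a 5 MB payload, so if the measured gain is real it would more likely come from handling fewer cells per put than from fewer bytes written. That is a guess, not something the benchmark establishes.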

