hbase-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Migrating from Apache Cassandra to Hbase
Date Tue, 11 Sep 2018 13:26:44 GMT
Please be patient in getting a response to questions you post to this 
list as we're all volunteers.

On 9/8/18 2:16 AM, onmstester onmstester wrote:
> Hi, Currently I'm using Apache Cassandra as the backend for my RESTful application. With
> a cluster of 30 nodes (each having 12 cores, 64 GB RAM and 6 TB of disk, of which 50% is
> used), write and read throughput is more than satisfactory for us. The input is a fixed
> set of long and int columns which we need to query by every column, so with 8 columns
> there should be 8 tables, per the Cassandra query-plan recommendation. The Cassandra keyspace
> schema would be something like this: Table 1 (timebucket, col1, ..., col8, primary key(timebucket, col1))
> to handle select * from input where timebucket = X and col1 = Y .... Table 8 (timebucket, col1,
> ..., col8, primary key(timebucket, col8)). So for each input row, there would be 8X inserts in
> Cassandra (not considering RF), and with a TTL of 12 months, the production cluster should keep
> about 2 petabytes of data. With the recommended node density for a Cassandra cluster (2 TB per node),
> I need a cluster with more than 1000 nodes (which I cannot afford). So long story short: I'm
> looking for an alternative to Apache Cassandra for this application. How would HBase solve
> these problems:

> 1. 8X data redundancy due to needed queries 

HBase provides one intrinsic "index" over the data in your table and 
that is the "rowkey". If you need to access the same data 8 different 
ways, you would need to come up with 8 indexes.
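
To make the "one index per access path" point concrete, here is a minimal sketch (not HBase client code) of how a composite rowkey for one such index could be laid out. It relies only on the fact that HBase orders rows by the raw bytes of the rowkey. The function names, the field layout, and the row_id field are made up for illustration, and all values are assumed to be non-negative integers that fit in 64 bits.

```python
import struct

def index_rowkey(timebucket: int, col_value: int, row_id: int) -> bytes:
    """Hypothetical rowkey layout for one index: timebucket | col value | row id.

    Big-endian unsigned 64-bit fields make the byte-wise (lexicographic) order
    of the keys match the numeric order of the fields.
    """
    return struct.pack(">QQQ", timebucket, col_value, row_id)

def scan_range(timebucket: int, col_value: int) -> tuple[bytes, bytes]:
    """Start (inclusive) and stop (exclusive) keys covering every row with the
    given timebucket and column value. Assumes col_value < 2**64 - 1."""
    start = struct.pack(">QQ", timebucket, col_value)
    stop = struct.pack(">QQ", timebucket, col_value + 1)
    return start, stop

if __name__ == "__main__":
    key = index_rowkey(5, 42, 7)
    start, stop = scan_range(5, 42)
    print(start <= key < stop)
```

A lookup such as "timebucket = X and col1 = Y" then becomes a range scan between the two keys. Supporting col2 through col8 the same way means writing one such key per column, which is the 8 indexes mentioned above.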

FWIW, this is not what I commonly see. Usually there are 2 or 3 lookups 
that need to happen in the "fast path", not 8. Perhaps you need to take 
another look at your application needs?

> 2. Nodes with large data density (30 TB of data on each node if No. 1 cannot be solved
> in HBase): how would HBase handle compaction and node join/remove problems when there are
> only 5 * 6 TB 7200 RPM SATA disks available on each node? How much empty space does HBase need for
> temporary files during compaction?

HBase uses a distributed filesystem to ensure that data is available to 
be read by any RegionServer. Obviously, that filesystem needs to have 
sufficient capacity to write a new file which is approximately the sum 
of the file sizes being compacted.
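
As a rough back-of-the-envelope sketch of that rule of thumb: the helper below is hypothetical, and it assumes the filesystem is HDFS with a replication factor of 3 (the HDFS default). The result is an approximate upper bound, since a compaction can drop deleted or expired cells and produce a smaller file.

```python
def compaction_headroom_bytes(file_sizes: list[int], replication: int = 3) -> int:
    """Approximate raw disk space needed to write the compacted output file.

    file_sizes: logical sizes (bytes) of the store files being compacted.
    replication: assumed filesystem replication factor (3 is the HDFS default).
    """
    logical_output = sum(file_sizes)     # new file is roughly the sum of the inputs
    return logical_output * replication  # each block is stored `replication` times

if __name__ == "__main__":
    gib = 1024 ** 3
    # Hypothetical example: compacting three store files of 10, 20 and 30 GiB.
    needed = compaction_headroom_bytes([10 * gib, 20 * gib, 30 * gib])
    print(needed // gib, "GiB of raw cluster capacity")
```

Note that this space is needed across the cluster's filesystem as a whole, not on the local disks of one particular node.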

> 3. Also, I read in some documents (including DataStax's) that HBase is more
> of an offline & data-lake backend that is better not used as a web
> application backend which needs a response-time QoS of under a few seconds.
> Thanks in advance. Sent using Zoho Mail

Sounds like marketing trash to me. The entire premise around HBase's 
architecture is:

* Low latency random writes/updates
* Low latency random reads
* High throughput writes via batch tools (e.g. Bulk loading)

IIRC, many early adopters of HBase were using it in the critical-path 
for web applications.