hbase-user mailing list archives

From "Marcelo Valle (BLOOMBERG/ LONDON)" <mvallemil...@bloomberg.net>
Subject data partitioning and data model
Date Fri, 20 Feb 2015 16:49:48 GMT

This is my first message on this mailing list; I just subscribed.

I have been using Cassandra for the last few years and now I am trying to create a POC using
HBase. Therefore, I am reading the HBase docs but it's been really hard to find how HBase
behaves in some situations, when compared to Cassandra. I thought maybe it was a good idea
to ask here, as people in this list might know the differences better than anyone else.

What I want to do is create a simple application optimized for writes (I am not interested in
HBase / Cassandra product comparisons here; I am assuming I will use HBase and just want to
understand the best way of doing it in the HBase world). I want to be able to write
alerts to the cluster, where each alert would have columns like:
- alert id
- user id
- date/time
- alert data

Later, I want to search for alerts per user, so my main query would be something like:
Select * from alerts where user_id = $id and date/time > 10 days ago.
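
To make the access pattern concrete, here is a minimal Python sketch of that query run over an in-memory list. It only illustrates the filter I have in mind; the field names and the helper `alerts_for_user` are made up for this example and are not an HBase API.

```python
from datetime import datetime, timedelta, timezone


def alerts_for_user(alerts, user_id, now, days=10):
    """Return the alerts of `user_id` newer than `days` days before `now`, oldest first."""
    cutoff = now - timedelta(days=days)
    hits = [a for a in alerts if a["user_id"] == user_id and a["ts"] > cutoff]
    return sorted(hits, key=lambda a: a["ts"])


# Illustrative data only: alert id, user id, date/time, alert data.
now = datetime(2015, 2, 20, tzinfo=timezone.utc)
alerts = [
    {"alert_id": 1, "user_id": "u1", "ts": now - timedelta(days=1), "data": "a"},
    {"alert_id": 2, "user_id": "u1", "ts": now - timedelta(days=30), "data": "b"},
    {"alert_id": 3, "user_id": "u2", "ts": now - timedelta(days=2), "data": "c"},
]

recent = alerts_for_user(alerts, "u1", now)
```

Whatever row key I end up with should make this lookup (one user, recent time range) cheap.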

I want to decide the data model for my application.

Here are my questions:

- In Cassandra, I would partition by user + day, as some users can have many alerts and some
just 1 or a few. In HBase, assuming all alerts for a user would always fit in a single partition
/ region, can I just use user_id as my row key and assume data will be distributed along the

- Suppose I want to write 100 000 rows from a client machine and these are from 30 000 users.
What's the best way to write these if I want to optimize for writes? Should I batch all
100 000 requests into one and send it to a single server? As I am trying to optimize for writes, I would like
to split these requests across several nodes instead of sending them all to one. I found this
article: http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/ but I am not sure
if it's what I need.
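
To illustrate what I mean by splitting the requests, here is a rough Python sketch that groups rows into one batch per destination. The routing rule (a hash of user_id modulo a hypothetical `NUM_BUCKETS`) is purely an assumption for the example; HBase itself assigns rows to regions by row key range, so this is not a description of how it routes writes. The numbers are scaled down from 100 000 rows / 30 000 users.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical number of destinations, not an HBase setting


def bucket_for(user_id, num_buckets=NUM_BUCKETS):
    """Map a user id to a bucket with a stable hash (assumed routing rule)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_buckets


def group_into_batches(rows, num_buckets=NUM_BUCKETS):
    """Group rows so that each bucket receives a single batch."""
    batches = defaultdict(list)
    for row in rows:
        batches[bucket_for(row["user_id"], num_buckets)].append(row)
    return dict(batches)


# Scaled-down stand-in for the real workload: 100 rows from 30 users.
rows = [{"user_id": f"user{i % 30}", "alert_id": i} for i in range(100)]
batches = group_into_batches(rows)
```

Each entry in `batches` could then be sent as its own request instead of one big batch going to a single server.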

Thanks in advance!

Best regards,