hadoop-mapreduce-user mailing list archives

From jeff l <jeff.pubm...@gmail.com>
Subject Best practice for storage of data that changes
Date Sat, 24 Nov 2012 20:56:17 GMT
Hi All,

I'm coming from the RDBMS world and am looking at hdfs for long-term data
storage and analysis.

I've done some research and set up some smallish hdfs clusters with hive
for testing but I'm having a little trouble understanding how everything
fits together and was hoping someone could point me in the right direction.

I'm looking at storing two types of data:

1. Append-only data - e.g. weblogs or user logins
2. Account/User data

HDFS seems to be perfect for append-only data like #1, but I'm having
trouble figuring out what to do with data that may change frequently.

A simple example would be user data where various bits of information
(email, etc.) may change from day to day.  Would hbase or cassandra be the
better way to go for this type of data, and can I overlay hive over all of
them (hdfs, hbase, cassandra) so that I can query the data through a single
interface?

Thanks in advance for any help.
