hadoop-hdfs-user mailing list archives

From bharath vissapragada <bharathvissapragada1...@gmail.com>
Subject Re: Best practice for storage of data that changes
Date Sun, 25 Nov 2012 04:10:22 GMT
Hi Jeff,

Please take a look at [1]. You can store your data in HBase tables and query
them like any other Hive table, simply by mapping them to Hive tables. As for
Cassandra support, please follow JIRA [2]; I don't think it's in trunk yet.

[1] https://cwiki.apache.org/Hive/hbaseintegration.html
[2] https://issues.apache.org/jira/browse/HIVE-1434
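
To make the mapping in [1] a bit more concrete, here is a rough sketch (not
taken from your setup) of a small Python helper that builds the Hive DDL for
an HBase-backed table. The storage handler class and the
"hbase.columns.mapping" / "hbase.table.name" properties are the ones described
in [1]; the table name "users", the HBase table "users_hbase", the column
family "info" and the helper itself are made up for illustration. It assumes
the HBase table already exists, hence EXTERNAL.

```python
def hive_ddl_for_hbase(hive_table, hbase_table, key_col, columns):
    """Build a CREATE EXTERNAL TABLE statement for an existing HBase table.

    columns is a list of (hive_column, hive_type, "family:qualifier") tuples.
    All names passed in are hypothetical examples, not a real schema.
    """
    # The HBase row key is exposed as the first Hive column.
    col_defs = [f"{key_col} string"] + [f"{name} {typ}" for name, typ, _ in columns]
    # ":key" maps the row key; the rest map Hive columns to family:qualifier.
    mapping = ",".join([":key"] + [hb for _, _, hb in columns])
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({', '.join(col_defs)})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')"
    )


ddl = hive_ddl_for_hbase(
    "users",
    "users_hbase",
    "user_id",
    [("email", "string", "info:email"), ("last_login", "string", "info:last_login")],
)
print(ddl + ";")
```

Once a table like this is created in Hive, you should be able to query it with
ordinary SELECT statements, while updates to individual users go to HBase
directly.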


On Sun, Nov 25, 2012 at 2:26 AM, jeff l <jeff.pubmail@gmail.com> wrote:

> Hi All,
> I'm coming from the RDBMS world and am looking at hdfs for long term data
> storage and analysis.
> I've done some research and set up some smallish hdfs clusters with hive
> for testing but I'm having a little trouble understanding how everything
> fits together and was hoping someone could point me in the right direction.
> I'm looking at storing two types of data:
> 1. Append-only data - e.g. weblogs or user logins
> 2. Account/User data
> HDFS seems to be perfect for append-only data like #1, but I'm having
> trouble figuring out what to do with data that may change frequently.
> A simple example would be user data where various bits of information
> (email, etc.) may change from day to day.  Would hbase or cassandra be the
> better way to go for this type of data, and can I overlay hive over all of
> them (hdfs, hbase, cassandra) so that I can query the data through a single
> interface?
> Thanks in advance for any help.

Bharath .V
