hbase-user mailing list archives

From Ian Varley <ivar...@salesforce.com>
Subject Re: hbase as a primary store, or is it more for "2nd class" data?
Date Mon, 14 May 2012 14:48:43 GMT
Ahmed,

Generally speaking, the intent of HBase IS to be a first class data store. It's a young data
store (not even 1.0) so you have to take that into account; but there's been a lot of engineering
put into making it fully safe, and known data safety issues are considered release blockers.
(This is assuming you run with a WAL enabled, have at least 3 replicas in HDFS, etc -- follow
good data safety practices.)

The data loss scenarios I've heard of are mostly of the "byzantine" variety. For example,
if you have an entire data center power outage, you may lose a few seconds of data that had
been synced in the WAL but not fsynced (i.e. flushed by the OS to magnetic media). There are
also various known bugs involving multiple failure scenarios where it could lose data (for
example, if you have multiple successive node failures during replication). To my knowledge,
there are no known "simple" cases where HBase will lose data.

For that matter, relational DBs can lose data too (I've seen it happen, recently, because
of a HW failure). So ultimately, it comes down to how valuable the data is to you, and how
many redundant measures you're willing to take to prevent increasingly rare situations. You
accounting for earthquakes? Solar flares? :)

Ian


On May 13, 2012, at 11:21 PM, Srikanth P. Shreenivas wrote:

> There is a possibility that you may lose data, and hence, I would not use it for first
> class data if data cannot be re-created.
> If you can derive data from a secondary source and store data in HBase for performance
> gains, then it is a viable use case.
> 
> Regards,
> Srikanth
> 
> -----Original Message-----
> From: S Ahmed [mailto:sahmed1020@gmail.com]
> Sent: Monday, May 14, 2012 7:52 AM
> To: user@hbase.apache.org; Otis Gospodnetic
> Subject: Re: hbase as a primary store, or is it more for "2nd class" data?
> 
> Otis,
> 
> It kind of goes back to what I was saying earlier, if FB is using it for searching your
> inbox, or storing your chat messages or wall posts, I don't really think that is important
> (and really it isn't hehe)
> 
> I was just making an observation and wanted to get a feel for what others think.  Obviously
> every tool has its purpose and domain, and I was curious as to what others have seen in production
> usage etc.
> 
> (I do realize that in some use cases the data is very important, like analytics data that usually
> correlates to advertising $$ etc.)
> 
> On Sun, May 13, 2012 at 10:00 PM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> 
>> Hi Ahmed,
>> 
>> At Sematext we have a few SaaS products that use HBase as the primary
>> data store.  I hear Facebook uses HBase for some important stuff, too.
>> ;) So far we've survived.  HBase does have rough edges, but also good
>> developers who are making it better every day.
>> 
>> Otis
>> ----
>> Performance Monitoring for Solr / ElasticSearch / HBase -
>> http://sematext.com/spm
>> 
>> 
>> 
>>> ________________________________
>>> From: S Ahmed <sahmed1020@gmail.com>
>>> To: user@hbase.apache.org
>>> Sent: Sunday, May 13, 2012 8:14 PM
>>> Subject: hbase as a primary store, or is it more for "2nd class" data?
>>> 
>>> I'm interested to learn if people are using hbase as a primary store
>>> or is it more for "2nd class" type data.
>>> 
>>> Pretend you have a CMS product, or eCommerce SaaS application:
>>> 
>>> What I mean by this is, I consider "primary store" to mean storing
>>> the actual content (say articles, or blog posts), category data, user
>>> information, or shopping cart order, product information.
>>> 
>>> "2nd class" type data is data like metrics, analytics, log data, or
>>> say index data (data that can be re-built via the primary store).
>>> 
>>> In general 2nd class data is data that, if lost, won't bring the
>>> business to its knees.
>>> 
>>> What do you guys think, am I right?
>>> 
>>> i.e. if you are creating a SaaS product, it wouldn't be advisable to
>>> build it using hbase (or it will be kind of bleeding edge architecture).
>>> 
>>> 
>>> 
>> 
> 

