hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: Should I use HBASE?
Date Wed, 14 Sep 2011 21:23:24 GMT


I think you misunderstood my point.

The original author asks a question about using HBase, yet doesn't provide enough detailed
information about what he wants to achieve and why he is failing.

My point was that, based on the information he presented, he didn't show how or why his RDBMS
solution was failing (or what he meant when he used the term "fail").
There are many reasons why an RDBMS could fail, and it could be a factor of which RDBMS
is being used.
I saw 50K ticks a second being ingested into Informix's Financial Foundation offering
10 years ago. That relied on a specific setup of the servers and configuration of IDS.
But that's 50K records inserted per second, not 10K every 5 minutes.
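Putting the two rates side by side, using the original poster's figure of roughly 10,000 rows every 5 minutes (300 seconds):

```latex
\frac{10\,000\ \text{rows}}{300\ \text{s}} \approx 33.3\ \text{rows/s},
\qquad
\frac{50\,000\ \text{rows/s}}{10\,000\ \text{rows} / 300\ \text{s}} = 1\,500
```

In other words, the sustained ingest rate in question is about 1,500 times lower than what that Informix setup handled.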

Is it trivial? Probably not, but it's still not rocket science.

But I digress. Again, the point is that we have a person coming here and asking us 'is this
a good fit?', and it would be better to say 'it depends' or 'you haven't provided enough information...'

To your point, yes, there are other databases out there, like Informix and Oracle, that scale
better than MySQL. If the issue is that his RDBMS can't keep up, then one question I have
to ask is whether he's thought about switching to a different RDBMS platform. What happens if you
say 'sure, we can do this in HBase,' and then he pulls out his 'must be ACID compliant' card?


> From: ivarley@salesforce.com
> To: user@hbase.apache.org
> Date: Wed, 14 Sep 2011 12:01:46 -0700
> Subject: Re: Should I use HBASE?
> That's an important point to make, Michael. Jumping to HBase (or any NoSQL store) from
an RDBMS has pros and cons; the pros are generally that you can scale linearly on cheap(er)
hardware as your data and usage grow, but the cons are that many things you take for granted
in an RDBMS (like transactions, joins, and indexes) aren't built in. You shouldn't assume that
an RDBMS won't handle it well just because it's "a lot" of data. Benchmarking is key.

> In this case, 6 months' worth of data at a rate of 10K inserts per 5 minutes comes out
to a steady state of about 500M rows (is that what you mean, @stable29?). Even with skinny
rows, that's not "trivial" for a relational database, especially if that database is MySQL.
It can work, but you'll have to have someone who really understands the DB at a low level
and can administer it, troubleshoot, deal with physical deletion after the 6 months is up,
etc. If you ever need to change your schema while keeping the system online, that could also
be a challenge. These things are all TOTALLY doable on a relational DB, but you are at least
edging towards the territory where there's a reasonable case to be made for HBase.
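Checking the "about 500M rows" figure is simple arithmetic. A minimal sketch, assuming "6 months" means 182.5 days and a constant insert rate around the clock (both are assumptions, not stated in the thread):

```python
# Back-of-the-envelope row count for a fixed retention window.
# Assumptions (not from the thread): "6 months" is taken as 182.5 days,
# and rows arrive at a constant rate around the clock.
ROWS_PER_INTERVAL = 10_000   # rows collected per polling cycle
INTERVAL_MINUTES = 5         # polling cycle length
RETENTION_DAYS = 182.5       # roughly six months


def steady_state_rows(rows_per_interval: int, interval_minutes: int, retention_days: float) -> int:
    """Rows kept once the oldest data starts being deleted on schedule."""
    # Assumes the interval divides a day evenly (288 cycles for 5 minutes).
    intervals_per_day = 24 * 60 // interval_minutes
    rows_per_day = rows_per_interval * intervals_per_day
    return int(rows_per_day * retention_days)


if __name__ == "__main__":
    total = steady_state_rows(ROWS_PER_INTERVAL, INTERVAL_MINUTES, RETENTION_DAYS)
    print(f"{total:,} rows")
```

This comes to roughly 525 million rows, in line with the ~500M estimate.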
> Also, since you probably don't have much to worry about in terms of complex transactions,
joins, etc., it does sound like a situation where a small HBase cluster might do a nice job
at storing this data for you. If you can design in terms of one (or a small number) of access
(read & write) patterns that will always be used, you can really optimize it to the point
that you pretty much know exactly how every write is going onto the disk and getting read
from the disk.
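As an illustration of designing around one access pattern, here is a minimal sketch of a composite row key for the monitoring case, assuming the dominant read is "recent samples for one server". The layout (fixed-width server ID followed by a reversed timestamp) is a common HBase pattern chosen here for illustration only; `make_row_key` and `MAX_TS` are hypothetical names, and no HBase client API is used, just the byte encoding that HBase would sort.

```python
import struct

# Hypothetical key layout: 4-byte big-endian server ID + 8-byte reversed timestamp.
# HBase orders rows by raw key bytes, so fixed-width big-endian fields sort
# numerically, and reversing the timestamp makes newer samples sort first.
MAX_TS = 2**63 - 1  # largest signed 64-bit value (Java's Long.MAX_VALUE)


def make_row_key(server_id: int, timestamp_ms: int) -> bytes:
    """Build a key so one server's samples are contiguous, newest first."""
    if not 0 <= server_id < 2**32:
        raise ValueError("server_id must fit in 4 unsigned bytes")
    if not 0 <= timestamp_ms <= MAX_TS:
        raise ValueError("timestamp_ms out of range")
    return struct.pack(">IQ", server_id, MAX_TS - timestamp_ms)


if __name__ == "__main__":
    keys = [
        make_row_key(7, 1_315_990_800_000),  # older sample for server 7
        make_row_key(7, 1_315_991_100_000),  # same server, 5 minutes later
        make_row_key(8, 1_315_990_800_000),  # a different server
    ]
    for key in sorted(keys):  # byte order, as HBase would store them
        print(key.hex())
```

A range scan starting at a server's 4-byte prefix would then return that server's samples newest-first. Leading with the server ID rather than the timestamp also keeps every write from landing in the same region, at the cost of making cross-server time-range queries more expensive.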
> Even with HBase, though, you'll still need someone who really understands the architecture,
etc. The difference might just be that HBase is fundamentally simpler than a relational DB;
if that simplicity provides what you need without complex workarounds, cool. HBase puts you
closer to the metal than a relational database; sometimes that's good (at scale) and sometimes
it's not (say, if you didn't really need that power and a higher-level, more abstract toolset
set like a relational database would suffice).
> Ian
> On Sep 14, 2011, at 1:17 PM, Michael Segel wrote:
> > 
> > I realize that this is an HBase group; however, nothing in the stated problem would
suggest that an RDBMS couldn't handle the problem.
> > Inserting 10K rows every 5 minutes poses a challenge to the database?
> > 
> > I guess it could be a challenge depending on the size and type of data, along with the
database, schema, hardware, etc. Essentially, YMMV.
> > 
> > I'm not sure that switching to HBase would solve their problem.
> > 
> > 
> >> Date: Wed, 14 Sep 2011 08:09:13 -0700
> >> From: otis_gospodnetic@yahoo.com
> >> Subject: Re: Should I use HBASE?
> >> To: user@hbase.apache.org
> >> 
> >> Hi,
> >> 
> >> I'd guess that you could relatively easily write something that writes that
much data into your RDBMS and see how writes start behaving over time and how fast reads are
after you are done with all writes.
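A minimal sketch of the kind of load test suggested here, using Python's built-in `sqlite3` module purely as a stand-in; to mean anything, the same loop would have to run against the actual RDBMS, schema, and hardware. The `metrics` table layout and the `run_load_test` / `time_server_read` helpers are hypothetical.

```python
from __future__ import annotations

import random
import sqlite3
import time


def run_load_test(conn: sqlite3.Connection, batches: int, rows_per_batch: int) -> list[float]:
    """Insert synthetic metric rows in batches and return seconds per batch.

    Each batch stands in for one 5-minute collection cycle; the point is to
    watch whether per-batch time grows as the table fills up.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics ("
        " server_id INTEGER NOT NULL,"
        " ts INTEGER NOT NULL,"
        " cpu REAL,"
        " PRIMARY KEY (server_id, ts))"
    )
    timings = []
    for batch in range(batches):
        ts = batch * 300  # one simulated 5-minute cycle per batch
        rows = [(server, ts, random.random()) for server in range(rows_per_batch)]
        start = time.perf_counter()
        with conn:  # commits the whole batch as one transaction
            conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", rows)
        timings.append(time.perf_counter() - start)
    return timings


def time_server_read(conn: sqlite3.Connection, server_id: int) -> tuple[int, float]:
    """Count one server's rows after loading, and time the query."""
    start = time.perf_counter()
    count = conn.execute(
        "SELECT COUNT(*) FROM metrics WHERE server_id = ?", (server_id,)
    ).fetchone()[0]
    return count, time.perf_counter() - start


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    timings = run_load_test(conn, batches=20, rows_per_batch=10_000)
    for i, t in enumerate(timings):
        print(f"batch {i:2d}: {t * 1000:.1f} ms")
    count, elapsed = time_server_read(conn, server_id=42)
    print(f"read {count} rows for server 42 in {elapsed * 1000:.2f} ms")
```

A real test would run far more batches (six months is about 52,560 five-minute cycles) on durable storage rather than `:memory:`.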
> >> Over at Sematext we have this thing called Scalable Performance Monitoring [1]
service and we chose HBase to store all performance metrics, but we keep a LOT of data (points).
> >> 
> >> [1] http://sematext.com/spm/index.html
> >> 
> >> 
> >> Not coincidentally, we also have HBase-specific monitoring and reports there.
> >> 
> >> Otis
> >> 
> >> 
> >>> 
> >>> From: stable29 <arpitak29@gmail.com>
> >>> To: hbase-user@hadoop.apache.org
> >>> Sent: Wednesday, September 14, 2011 6:02 AM
> >>> Subject: Should I use HBASE?
> >>> 
> >>> 
> >>> Currently I am using an RDBMS in my project. My project basically monitors
> >>> servers. It has to collect information from all the servers (the number of
> >>> servers could be very large) every 5 minutes and store it in the database.
> >>> Storing all the servers' information (around 10,000 rows will be inserted,
> >>> with logical comparison) within 5 minutes is itself challenging for an RDBMS.
> >>> We have to maintain around 6 months of data in the database,
> >>> so the amount of data becomes very large. This is the primary
> >>> requirement of our project, and if it works well it could be used
> >>> widely. Basically, I'd like to know whether HBase could improve the
> >>> write and read times of the database and help it scale significantly.
> >>> -- 
> >>> View this message in context: http://old.nabble.com/Should-I-use-HBASE--tp32462213p32462213.html
> >>> Sent from the HBase User mailing list archive at Nabble.com.
> >>> 
> >>> 
> >>> 