This is slightly off-topic.
There is a recent project called Hadoop Online (HOP) on Google Code that promises an online/continuous query capability on top of Hadoop, which should allow near-real-time processing instead of the batch jobs that MapReduce runs.
Ian Holsman - 703 879-3128
When I wrote my Why Cassandra article, I didn't get into why I didn't choose platform X because I didn't want to start a flame war by doing comparisons. For HBase, the primary reason I didn't choose it is that while there were benchmarks of what it could theoretically do, there weren't any real-world deployments proving it. My experience as a systems administrator is that it's best to go with a product that's been proven over time in real-world scenarios.
I'll add to this, though, that nothing NoSQL, even Cassandra, has reached the point where I feel it's a no-brainer to choose it over anything else, including SQL-based solutions like MySQL and Oracle. It really comes down to your requirements.
On Sat, Dec 5, 2009 at 11:04 PM, Matt Revelle <firstname.lastname@example.org> wrote:
While Hadoop MapReduce isn't meant for realtime use, HBase can handle it.
On Dec 5, 2009, at 21:45, Joe Stump <email@example.com> wrote:
On Dec 5, 2009, at 7:41 PM, Bill Hastings wrote:
[Is] HBase used for real-timish applications, and if so, any idea what the largest deployment is?
I don't know of anyone off the top of my head who's using anything built on top of Hadoop for a real-time environment. Hadoop just wasn't built for that. It was built, like MapReduce, for crunching absurd amounts of data across hundreds of nodes in a "reasonable" amount of time.
Just my $0.02.
Last summer, some HBase/Hadoop presentations included benchmarks that showed, IIRC, performance comparable to Cassandra.