hbase-user mailing list archives

From alexthomp...@sitelabs.com
Subject RE: Hosting a new web app on Hadoop?
Date Sun, 09 Mar 2008 09:43:12 GMT
<html><body><br>I have had the same idea for HBase/Hadoop: using it as the
backend for a dynamic site. <br><br>SCHEMA DESIGN:<br>My application
had about 8 tables, all relational. After reading what I could find on HBase design,
I managed to get it down to 2 tables. If you require relationships or referential integrity,
shift them into the application layer (as eBay, Yahoo, etc. do). It takes some 'unlearning' of
RDBMS design and relearning 'column-oriented design', but after a while it clicks.<br><br>PERFORMANCE:<br>My
approach has been to query the database as little as I can, and what work my
dynamic web app does do happens behind the scenes via asynchronous calls. This lets me 'smooth load'
HBase. I also pull all of a user's data down into session data at the web server layer, so when users
interact they are working with web server session datasets (very fast). If they do updates or inserts,
the changes appear immediately on their served pages, but behind the scenes
I asynchronously persist them to HBase in a controlled manner.<br><br>I
was aware of stack's comments on a lot of pages that mentioned HBase performance, and I took
his concerns into account in my application architecture.<br><br>Hope this helps.<br><br>Cheers,<br>
Alex Thompson<br><br><br>
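The session-plus-asynchronous-persist approach described under PERFORMANCE is often called a write-behind cache. The following is only a minimal sketch of that idea, not the poster's actual code: the class name `WriteBehindSession`, the `persist` callback (a stand-in for whatever HBase client write the application uses), the `max_writes_per_sec` throttle, and the `family:qualifier` column names are all assumptions made for illustration.

```python
import queue
import threading
import time


class WriteBehindSession:
    """Serve reads and writes from in-memory session data; persist in the background."""

    def __init__(self, persist, max_writes_per_sec=50.0):
        # persist(row_key, column, value) is a hypothetical stand-in for an HBase write.
        self._persist = persist
        self._min_interval = 1.0 / max_writes_per_sec
        self._session = {}                # row_key -> {column: value}
        self._lock = threading.Lock()
        self._pending = queue.Queue()     # writes waiting to be persisted
        self.failed = []                  # writes whose persist call raised
        threading.Thread(target=self._drain, daemon=True).start()

    def load(self, row_key, columns):
        """Seed the session with a user's data, e.g. when the user logs in."""
        with self._lock:
            self._session[row_key] = dict(columns)

    def get(self, row_key, column):
        """Read from session data only; the backing store is not touched."""
        with self._lock:
            return self._session.get(row_key, {}).get(column)

    def put(self, row_key, column, value):
        """Update session data immediately and queue the write for later."""
        with self._lock:
            self._session.setdefault(row_key, {})[column] = value
        self._pending.put((row_key, column, value))

    def flush(self):
        """Block until every queued write has been attempted."""
        self._pending.join()

    def _drain(self):
        # Background worker: persist one write at a time, throttled so the
        # backing store sees a smooth load instead of bursts.
        while True:
            item = self._pending.get()
            try:
                self._persist(*item)
            except Exception:
                self.failed.append(item)  # a real system would retry or log
            finally:
                self._pending.task_done()
            time.sleep(self._min_interval)


if __name__ == "__main__":
    persisted = []
    session = WriteBehindSession(lambda r, c, v: persisted.append((r, c, v)))
    session.load("user42", {"profile:name": "Alex"})
    session.put("user42", "profile:city", "Sydney")
    print(session.get("user42", "profile:city"))  # answered from session data
    session.flush()
    print(persisted)
```

The trade-off is durability: a write that is still queued is lost if the web server dies before the worker persists it, so a design like this suits data where a short window of possible loss is acceptable.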
<blockquote webmail="1" style="border-left: 2px solid blue; margin-left: 8px; padding-left:
-------- Original Message --------<br>
Subject: Re: Hosting a new web app on Hadoop?<br>
From: stack &lt;stack@duboce.net&gt;<br>
Date: Sat, March 08, 2008 5:14 am<br>
To: hbase-user@hadoop.apache.org<br>
Your biggest problem at the moment will likely be performance.  Our <br>
current numbers are not the best.  See the base of this page: <br>
<a href="http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation" target="_blank">http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation</a>
 Related, if <br>
your live table is concurrently being loaded, there will be periods <br>
during which a client will not be able to read data from a region while <br>
it's being split and redeployed.  We need to do some work to add an <br>
express lane to minimize this down time (We're talking between 1-2 <br>
seconds but it could be as bad as a couple of minutes at an extreme <br>
dependent on how hard your servers are working).  Also, HBASE-80 is <br>
about adding a cache of hot cells.  It's not implemented yet but <br>
shouldn't be too hard to add.  You'd probably need this servicing users <br>
Charlie O'Keefe wrote:<br>
&gt; It seems like I've seen a lot of mentions of running large data analysis<br>
&gt; jobs on Hadoop clusters, but I can't recall reading anything about hosting a<br>
&gt; website on a Hadoop cluster.<br>
&gt; I'm just starting to learn about this project but my reaction to reading<br>
&gt; about Hadoop is, "Hey, I'm designing a web application and I'm concerned<br>
&gt; that by using a mysql backend, it will be a challenge should I need to scale<br>
&gt; it. Hey, here's a project that is designed to scale elastically on computing<br>
&gt; clusters, and it includes both a scalable execution environment and a<br>
&gt; scalable database! Why not skip mysql and design my backend around HBase?"<br>
&gt; So how about it? I'd be interested in hearing from someone with some<br>
&gt; expertise in Hadoop. Does this idea make sense? Or is there something about<br>
&gt; Hadoop that makes it less than ideal for a new web application project that<br>
&gt; thinks it might scale to lots of data and users?<br>
&gt; I am also very curious about best practices for schema design (or whatever<br>
&gt; the HBase equivalent of a schema is), and how best to handle situations in<br>
&gt; which there are many complex relationships between the entities being<br>
&gt; represented.<br>
&gt; Thanks for any help!<br>
&gt; Charlie<br>
&gt;   <br>

