hbase-user mailing list archives

From jaxzin <Brian.R.Jack...@espn3.com>
Subject Re: Use cases of HBase
Date Tue, 09 Mar 2010 17:29:58 GMT

Thanks Gary, this is great!

I'm designing a central store/service for all user data for the fantasy
section of ESPN.com (profile, preferences, record of activity, you name it).
The record of activity wouldn't be at page-view granularity but more like
"created a league" or "won a trophy" type activities.  I expect it will be
far more read-heavy than write-heavy, at least for the core column families.
And since it's user data, I expect it to be randomly accessed, keyed on our
internal user IDs.

I expect it could be fronted by a public RESTful service that browsers might
access directly via Ajax, but our initial usage pattern will most likely be
server-side inclusion of the data on the hosts responsible for rendering
pages.  

But even if it's only exposed internally, I don't want each client of the
data to be aware it's backed by HBase, so the store will be fronted by a
web or TCP-based service to manage that abstraction layer.  Ideally it would
be a RESTful service, but if I can't get that to perform I'd be willing to
use a higher-performance protocol like Thrift, Google protobuf, etc.
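
To make the abstraction layer concrete, here is a minimal Python sketch.
Every name in it (`UserDataStore`, `InMemoryUserDataStore`, `render_profile`,
the "preferences" family, the "favorite_team" field) is hypothetical, and the
in-memory backend only stands in for an HBase-backed implementation; no HBase
client calls are shown.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class UserDataStore(ABC):
    """Hypothetical storage-agnostic interface; callers never see the backend."""

    @abstractmethod
    def get(self, user_id: str, family: str) -> Optional[Dict[str, str]]:
        """Return the fields stored for a user under one family, or None."""

    @abstractmethod
    def put(self, user_id: str, family: str, values: Dict[str, str]) -> None:
        """Merge the given fields into a user's family."""


class InMemoryUserDataStore(UserDataStore):
    """Illustrative stand-in; an HBase-backed class would implement the same methods."""

    def __init__(self) -> None:
        # user_id -> family -> field -> value
        self._rows: Dict[str, Dict[str, Dict[str, str]]] = {}

    def get(self, user_id: str, family: str) -> Optional[Dict[str, str]]:
        row = self._rows.get(user_id)
        if row is None or family not in row:
            return None
        return dict(row[family])  # copy so callers cannot mutate stored state

    def put(self, user_id: str, family: str, values: Dict[str, str]) -> None:
        self._rows.setdefault(user_id, {}).setdefault(family, {}).update(values)


def render_profile(store: UserDataStore, user_id: str) -> str:
    """Example client code: depends only on the interface, not on the backend."""
    prefs = store.get(user_id, "preferences") or {}
    return f"user={user_id} team={prefs.get('favorite_team', 'unknown')}"
```

Because `render_profile` only sees `UserDataStore`, the backend (and the wire
protocol in front of it, REST or Thrift) could be swapped without touching
the clients.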

If that's not enough info for guiding me, I'll gladly volunteer more. Thanks
again.

Also, to give you some background on what I know already, the reason I'm
asking this publicly is that I spoke with an engineer who did a proof of
concept with HBase, and he found the cluster would tip over if you had more
than 4 clients connecting to a region server for reads or 1 client/node for
writes.  He also found that if a region server failed, it corrupted the
table in an unrecoverable way.  These issues sounded like blockers to me for
using HBase in an online, mission-critical way, so I figure I'm missing
something big.


Gary Helmling wrote:
> 
> Hey Brian,
> 
> We use HBase to complement MySQL in serving activity-stream type data here
> at Meetup.  It's handling real-time requests involved in 20-25% of our page
> views, but our latency requirements aren't as strict as yours.  For what
> it's worth, I did a presentation on our setup which will hopefully fill in
> some details: http://www.slideshare.net/ghelmling/hbase-at-meetup
> 
> There are also some great presentations by Ryan Rawson and Jonathan Gray on
> how they've used HBase for realtime serving on their sites.  See the
> presentations wiki page:
> http://wiki.apache.org/hadoop/HBase/HBasePresentations
> 
> Like Barney, I suspect where you'll hit some issues will be in your latency
> requirements.  Depending on how you lay out your data and configure your
> column families, your average latency may be good, but you will hit some
> pauses, as I believe reads block at times during region splits, compactions,
> and memstore flushes (unless you have a fairly static data set).  Others
> here should be able to fill in more details.
> 
> With a relatively small dataset, you may want to look at the "in memory"
> configuration option for your column families.
> 
> What's your expected workload -- writes vs. reads?  What types of reads
> will you be doing: random access vs. sequential?  There are a lot of
> knowledgeable folks here to offer advice if you can give us some more
> insight into what you're trying to build.
> 
> --gh
> 
> 
> On Tue, Mar 9, 2010 at 11:21 AM, jaxzin <Brian.R.Jackson@espn3.com> wrote:
> 
>>
>> This is exactly the kind of feedback I'm looking for. Thanks, Barney.
>>
>> So it sounds like you cache the data you get from HBase in session-based
>> memory?  Are you using a Java EE HttpSession? (I'm less familiar with the
>> django/rails equivalents, but I'm assuming they exist.)  Or are you using a
>> memory cache provider like ehcache or memcache(d)?
>>
>> Can you tell me more about your experience with latency and why you say
>> that?
>>
>>
>> Barney Frank wrote:
>> >
>> > I am using HBase to store visitor-level, clickstream-like data.  At the
>> > beginning of the visitor session I retrieve all the previous session data
>> > from HBase, massage it a little within my app server, and serve it to the
>> > consumer via web services.  Where I think you will run into the most
>> > problems is your latency requirement.
>> >
>> > Just my 2 cents from a user.
>> >
>> > On Tue, Mar 9, 2010 at 9:45 AM, jaxzin <Brian.R.Jackson@espn3.com> wrote:
>> >
>> >>
>> >> Hi all, I've got a question about how everyone is using HBase.  Is anyone
>> >> using it as an online data store to directly back a web service?
>> >>
>> >> The textbook example of a weblink HBase table suggests there would be an
>> >> associated web front-end to display the information in that HBase table
>> >> (e.g. a search results page), but I'm having trouble finding evidence that
>> >> anyone is servicing web traffic backed directly by an HBase instance in
>> >> practice.
>> >>
>> >> I'm evaluating whether HBase would be the right tool to provide a few
>> >> things for a large-scale web service we want to develop at ESPN, and I'd
>> >> really like to get opinions and experience from people who have already
>> >> been down this path.  No need to reinvent the wheel, right?
>> >>
>> >> I can tell you a little about the project goals if it helps give you an
>> >> idea of what I'm trying to design for:
>> >>
>> >> 1) Highly available (it would be a central service and an outage would
>> >>    take down everything)
>> >> 2) Low latency (1-2 ms, less is better, more isn't acceptable)
>> >> 3) High throughput (5-10k req/sec at worst-case peak)
>> >> 4) Unstable traffic (e.g. Sunday afternoons during football season)
>> >> 5) Small data...for now (< 10 GB of total data currently, but HBase
>> >>    could allow us to design differently and store more online)
>> >>
>> >> The reason I'm looking at HBase is that we've solved many of our scaling
>> >> issues with the same basic concepts as HBase (sharding, flattening data to
>> >> fit in one row, throwing away ACID, etc.) but with home-grown software.
>> >> I'd like to adopt an active open-source project if it makes sense.
>> >>
>> >> Alternatives I'm also looking at: RDBMS fronted with WebSphere eXtreme
>> >> Scale, RDBMS fronted with Hibernate/ehcache, or (the option I understand
>> >> the least right now) memcached.
>> >>
>> >> Thanks,
>> >> Brian
>> >> --
>> >> View this message in context:
>> >> http://old.nabble.com/Use-cases-of-HBase-tp27837470p27837470.html
>> >> Sent from the HBase User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Use-cases-of-HBase-tp27837470p27838006.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: http://old.nabble.com/Use-cases-of-HBase-tp27837470p27839035.html
Sent from the HBase User mailing list archive at Nabble.com.

