hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Use cases of HBase
Date Tue, 09 Mar 2010 22:29:09 GMT
One thing to note is that 10 GB is half the memory of a reasonably
sized machine. In fact, I have seen 128 GB memcached boxes out there.

As for performance, I obviously feel HBase can be performant for real-time
queries.  To get a consistent response you absolutely have to have 95%+
of reads served from cache in RAM. There is no way to achieve 1-2 ms
responses from disk. If you throw enough RAM at the problem, I think HBase
solves this nicely and you won't have to maintain multiple architectures.
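
A rough back-of-envelope sketch of why the cache hit rate matters so much
for a 1-2 ms target. The latency figures below (0.5 ms for a cached read,
10 ms for a read that goes to disk) are assumptions for illustration only,
not HBase measurements, and the function names are hypothetical.

```python
def expected_latency_ms(hit_rate, ram_ms, disk_ms):
    """Average read latency when hit_rate of reads are served from RAM."""
    return hit_rate * ram_ms + (1.0 - hit_rate) * disk_ms


def min_hit_rate(target_ms, ram_ms, disk_ms):
    """Smallest cache hit rate whose average latency meets target_ms."""
    return (disk_ms - target_ms) / (disk_ms - ram_ms)


# Assumed figures, for illustration only.
RAM_MS = 0.5    # cached read, including RPC overhead
DISK_MS = 10.0  # read that needs a random disk seek

if __name__ == "__main__":
    for hit_rate in (0.80, 0.95, 0.99):
        avg = expected_latency_ms(hit_rate, RAM_MS, DISK_MS)
        print(f"hit rate {hit_rate:.0%}: average {avg:.3f} ms")
    for target in (1.0, 2.0):
        needed = min_hit_rate(target, RAM_MS, DISK_MS)
        print(f"average of {target} ms needs a hit rate of {needed:.1%}")
```

Under these assumed numbers, an average of 1 ms needs a hit rate of roughly
95%. This only models the average: every miss still pays the full disk
latency, so tail latency stays well above the target.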

-ryan

On Tue, Mar 9, 2010 at 2:08 PM, Jonathan Gray <jlist@streamy.com> wrote:
> Brian,
>
> I would just reiterate what others have said.  If your goal is a
> consistent 1-2ms read latency and your dataset is on the order of 10GB...
> HBase is not a good match.  It's more than what you need and you'll take
> unnecessary performance hits.
>
> I would look at some of the simpler KV-style stores out there like Tokyo
> Cabinet, Memcached, or BerkeleyDB, or the in-memory ones like Redis.
>
> JG
>
> -----Original Message-----
> From: jaxzin [mailto:Brian.R.Jackson@espn3.com]
> Sent: Tuesday, March 09, 2010 12:09 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Use cases of HBase
>
>
> Gary, I looked at your presentation and it was very helpful.  But I do have a
> few unanswered questions from it if you wouldn't mind answering them.   How
> big is/was your cluster that handled 3k req/sec?  And what were the specs on
> each node (RAM/CPU)?
>
> When you say latency can be good, what do you mean?  Is it even in the ballpark
> of 1 ms?  Because we already deal with the GC and don't expect perfect
> real-time behavior.  So that might be okay with me.
>
> P.S. I was at Hadoop World NYC and saw Ryan and Jonathan's presentation
> there but somehow mentally blocked it.  Thanks for the reminder.
>
>
>
> Gary Helmling wrote:
>>
>> Hey Brian,
>>
>> We use HBase to complement MySQL in serving activity-stream type data here
>> at Meetup.  It's handling real-time requests involved in 20-25% of our page
>> views, but our latency requirements aren't as strict as yours.  For what
>> it's worth, I did a presentation on our setup which will hopefully fill in
>> some details: http://www.slideshare.net/ghelmling/hbase-at-meetup
>>
>> There are also some great presentations by Ryan Rawson and Jonathan Gray on
>> how they've used HBase for realtime serving on their sites.  See the
>> presentations wiki page:
>> http://wiki.apache.org/hadoop/HBase/HBasePresentations
>>
>> Like Barney, I suspect where you'll hit some issues will be in your latency
>> requirements.  Depending on how you lay out your data and configure your
>> column families, your average latency may be good, but you will hit some
>> pauses as I believe reads block at times during region splits or compactions
>> and memstore flushes (unless you have a fairly static data set).  Others
>> here should be able to fill in more details.
>>
>> With a relatively small dataset, you may want to look at the "in memory"
>> configuration option for your column families.
>>
>> What's your expected workload -- writes vs. reads?  Types of reads you'll be
>> doing: random access vs. sequential?  There are a lot of knowledgeable folks
>> here to offer advice if you can give us some more insight into what you're
>> trying to build.
>>
>> --gh
>>
>>
>> On Tue, Mar 9, 2010 at 11:21 AM, jaxzin <Brian.R.Jackson@espn3.com> wrote:
>>
>>>
>>> This is exactly the kind of feedback I'm looking for.  Thanks, Barney.
>>>
>>> So it sounds like you cache the data you get from HBase in session-based
>>> memory?  Are you using a Java EE HttpSession? (I'm less familiar with the
>>> django/rails equivalents but I'm assuming they exist)  Or are you using a
>>> memory cache provider like ehcache or memcache(d)?
>>>
>>> Can you tell me more about your experience with latency and why you say
>>> that?
>>>
>>>
>>> Barney Frank wrote:
>>> >
>>> > I am using HBase to store visitor level clickstream-like data.  At the
>>> > beginning of the visitor session I retrieve all the previous session data
>>> > from HBase and use it within my app server and massage it a little and
>>> > serve to the consumer via web services.  Where I think you will run into
>>> > the most problems is your latency requirement.
>>> >
>>> > Just my 2 cents from a user.
>>> >
>>> > On Tue, Mar 9, 2010 at 9:45 AM, jaxzin <Brian.R.Jackson@espn3.com> wrote:
>>> >
>>> >>
>>> >> Hi all, I've got a question about how everyone is using HBase.  Is anyone
>>> >> using it as an online data store to directly back a web service?
>>> >>
>>> >> The text-book example of a weblink HBase table suggests there would be an
>>> >> associated web front-end to display the information in that HBase table
>>> >> (ex. search results page), but I'm having trouble finding evidence that
>>> >> anyone is servicing web traffic backed directly by an HBase instance in
>>> >> practice.
>>> >>
>>> >> I'm evaluating if HBase would be the right tool to provide a few things
>>> >> for a large-scale web service we want to develop at ESPN and I'd really
>>> >> like to get opinions and experience from people who have already been down
>>> >> this path.  No need to reinvent the wheel, right?
>>> >>
>>> >> I can tell you a little about the project goals if it helps give you an
>>> >> idea of what I'm trying to design for:
>>> >>
>>> >> 1) Highly available (It would be a central service and an outage would
>>> >> take down everything)
>>> >> 2) Low latency (1-2 ms, less is better, more isn't acceptable)
>>> >> 3) High throughput (5-10k req/sec at worst-case peak)
>>> >> 4) Unstable traffic (ex. Sunday afternoons during football season)
>>> >> 5) Small data...for now (< 10 GB of total data currently, but HBase could
>>> >> allow us to design differently and store more online)
>>> >>
>>> >> The reason I'm looking at HBase is that we've solved many of our scaling
>>> >> issues with the same basic concepts of HBase (sharding, flattening data to
>>> >> fit in one row, throwing away ACID, etc) but with home-grown software.
>>> >> I'd like to adopt an active open-source project if it makes sense.
>>> >>
>>> >> Alternatives I'm also looking at: RDBMS fronted with Websphere eXtreme
>>> >> Scale, RDBMS fronted with Hibernate/ehcache, or (the option I understand
>>> >> the least right now) memcached.
>>> >>
>>> >> Thanks,
>>> >> Brian
>>> >> --
>>> >> View this message in context:
>>> >> http://old.nabble.com/Use-cases-of-HBase-tp27837470p27837470.html
>>> >> Sent from the HBase User mailing list archive at Nabble.com.
>>> >>
>>> >>
>>> >
>>> >
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Use-cases-of-HBase-tp27837470p27838006.html
>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>
> --
> View this message in context:
> http://old.nabble.com/Use-cases-of-HBase-tp27837470p27841193.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>
>
