hbase-user mailing list archives

From Sleiman Jneidi <jneidi.slei...@gmail.com>
Subject Re: Time series scheme design
Date Wed, 01 Jul 2015 23:09:41 GMT
Thanks Stack, looks like a good read.
Vladimir, I called it time-series because ordering by time and filtering by
the tweet owner is the goal. To answer your questions, let's assume for now
that it's not as massive as Twitter, because otherwise it will be very
complicated, as you mentioned. So:

1. How many updates per second in the system? We never mutate data, we
write 500 tweets/sec.
2. How many users? 10000
3. Average # of followers per user? 250 users.

Even with these modest numbers, the schema is still tricky to optimise for
reads. Any thoughts?
Thanks.
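
As a rough sizing aid for the three figures above, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate: the function name `sizing` and the dictionary keys are made up for illustration, and the "fan-out on write" figure assumes a design (one feed entry written per follower for every tweet) that nobody in this thread has proposed yet.

```python
# Figures quoted in the message above.
TWEETS_PER_SEC = 500
USERS = 10_000
AVG_FOLLOWERS = 250

SECONDS_PER_DAY = 24 * 60 * 60


def sizing(tweets_per_sec, users, avg_followers):
    """Return rough load figures derived from the three inputs."""
    tweets_per_day = tweets_per_sec * SECONDS_PER_DAY
    return {
        "tweets_per_day": tweets_per_day,
        # Average only; real per-user activity is likely very skewed.
        "tweets_per_user_per_day": tweets_per_day / users,
        # Assumption: every tweet is copied into each follower's feed
        # at write time (fan-out on write).
        "feed_writes_per_sec_fanout_on_write": tweets_per_sec * avg_followers,
        # The mean number of accounts a user follows equals the mean
        # number of followers, so a feed built at read time has to
        # merge roughly this many per-user timelines.
        "timelines_merged_per_feed_read": avg_followers,
    }


if __name__ == "__main__":
    for name, value in sizing(TWEETS_PER_SEC, USERS, AVG_FOLLOWERS).items():
        print(f"{name}: {value}")
```

The two last figures show the trade-off in plain numbers: either the write path grows by a factor of about 250, or every feed read has to touch about 250 separate row ranges.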


On Wed, Jul 1, 2015 at 11:36 PM, Vladimir Rodionov <vladrodionov@gmail.com>
wrote:

> That is not a time-series modeling issue per se ... You can't come up with
> anything until you have the basic performance/load SLA numbers:
>
> 1. How many updates per second in the system?
> 2. How many users?
> 3. Average # of followers per user with percentiles up to 99.9%
>
> Twitter's architecture for supporting user-follower relationships is not
> based on a single data store and is much more complex. Therefore, I think,
> in this case everything will depend on numbers 1, 2 and 3 above.
>
> Scale matters.
>
> -Vlad
>
>
> On Wed, Jul 1, 2015 at 2:17 PM, Stack <stack@duboce.net> wrote:
>
> > To add to Amandeep's pointer, this one is good on the concerns of
> > modeling timeseries:
> > https://cloud.google.com/bigtable/pdf/CloudBigtableTimeSeries.pdf
> >
> > St.Ack
> >
> > On Wed, Jul 1, 2015 at 11:53 AM, Sleiman Jneidi <jneidi.sleiman@gmail.com>
> > wrote:
> >
> > > Hello everyone, I am working on a schema design for a time series
> > > database. Something very similar to Twitter, where people can follow
> > > each other and see their posts. I've looked at opentsdb but I think my
> > > problem is more complicated because I don't have the leading "metricid"
> > > in the row key. I've made several attempts so far but I am not happy
> > > with the performance.
> > >
> > > 1. Md5(user)+timestamp. The problem with this is that when I want to
> > > query the feed, I have to do a scan between the lowest and the highest
> > > user (in alphabetical order) and then add a column filter. Getting the
> > > next batch is hard.
> > >
> > > 2. Md5(user)+day, and then put the posts of the day in the columns with
> > > the timestamp in the qualifier name. Not optimal; getting the next batch
> > > is hard.
> > >
> > > So... What do you guys think? Any ideas for making this efficient or
> > > possible?
> > >
> > > Thanks for your time in reading this.
> > > Sleiman
> > >
> >
>
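
For reference, the two row-key layouts described in the original post can be sketched as plain byte strings. This is a minimal illustration, not code from the thread: the function names, the fixed-width big-endian encoding of the timestamp and day number, and the use of millisecond timestamps are all assumptions made here so that the keys sort by time within one user.

```python
import hashlib
import struct

MILLIS_PER_DAY = 86_400_000


def user_prefix(user: str) -> bytes:
    """16-byte MD5 digest of the user name, used as the key prefix."""
    return hashlib.md5(user.encode("utf-8")).digest()


def key_per_tweet(user: str, ts_millis: int) -> bytes:
    """Layout 1: Md5(user) + timestamp, one row per tweet."""
    # Big-endian unsigned 64-bit, so byte order matches numeric order.
    return user_prefix(user) + struct.pack(">Q", ts_millis)


def key_per_day(user: str, ts_millis: int) -> bytes:
    """Layout 2: Md5(user) + day, one row per user per day."""
    day = ts_millis // MILLIS_PER_DAY  # days since the Unix epoch
    return user_prefix(user) + struct.pack(">I", day)


def qualifier(ts_millis: int) -> bytes:
    """Layout 2 column qualifier: the tweet timestamp."""
    return struct.pack(">Q", ts_millis)


def user_scan_range(user: str, start_millis: int, end_millis: int):
    """Start (inclusive) and stop (exclusive) keys for one user's tweets
    under layout 1."""
    return key_per_tweet(user, start_millis), key_per_tweet(user, end_millis)
```

Under either layout, one user's rows are contiguous and time-ordered, so a single-user timeline is a cheap range scan. Rows of different users are scattered by the hash, which is why a feed that spans many followed users cannot be read with one narrow scan and why paging through it is awkward.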
