incubator-cassandra-user mailing list archives

From Colin <colpcl...@gmail.com>
Subject Re: Data model for streaming a large table in real time.
Date Sat, 07 Jun 2014 21:45:48 GMT
Then add seconds to the bucket.  Also, the data will get cached; it's not going to hit disk on
every read.

Look at the key cache settings on the table.  Also, in 2.1 you have even more control over
caching.
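
As a rough illustration of adding seconds to the bucket, here is a minimal Python sketch. The
function name time_bucket and the string layout of the key are assumptions made for this
example, not anything Cassandra or this thread prescribes; the only point is that truncating
the timestamp to the second instead of the minute yields a new partition key every second.

```python
from datetime import datetime, timezone

# Hypothetical helper: the name and key layout are illustrative assumptions.
# Expects a timezone-aware datetime; the key is always rendered in UTC.
FORMATS = {
    "minute": "%Y-%m-%dT%H:%M",     # one partition per minute
    "second": "%Y-%m-%dT%H:%M:%S",  # one partition per second
}

def time_bucket(ts: datetime, granularity: str = "second") -> str:
    """Return a partition-key string truncated to the given granularity."""
    return ts.astimezone(timezone.utc).strftime(FORMATS[granularity])

a = datetime(2014, 6, 7, 21, 45, 48, tzinfo=timezone.utc)
b = datetime(2014, 6, 7, 21, 45, 49, tzinfo=timezone.utc)

# Same minute bucket, different second buckets.
print(time_bucket(a, "minute"), time_bucket(b, "minute"))
print(time_bucket(a, "second"), time_bucket(b, "second"))
```

With minute granularity both writes land in the same partition; with second granularity they
land in different ones, so a given partition only takes writes for one second.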

--
Colin
320-221-9531


> On Jun 7, 2014, at 4:30 PM, Kevin Burton <burton@spinn3r.com> wrote:
> 
> 
>> On Sat, Jun 7, 2014 at 1:34 PM, Colin <colpclark@gmail.com> wrote:
>> Maybe it makes sense to describe what you're trying to accomplish in more detail.
> 
> Essentially, I'm appending writes of recent data by our crawler and sending that data
> to our customers.
>  
> They need to sync to up-to-date writes… we need to get those writes to them within seconds.
> 
>> A common bucketing approach is along the lines of year, month, day, hour, minute,
>> etc. and then use a timeuuid as a clustering column.
> 
> I mean that is acceptable.. but that means for that 1 minute interval, all writes are
> going to that one node (and its replicas).
> 
> So that means the total cluster throughput is bottlenecked on the max disk throughput.
> 
> Same thing for reads… unless our customers are lagged, they are all going to stampede
> and ALL of them are going to read data from one node, in a one minute timeframe.
> 
> That's no fun..  that will easily DoS our cluster.
>  
>> Depending upon the semantics of the transport protocol you plan on utilizing, either
>> the client code could keep track of pagination, or the app server could, if you utilized some type
>> of request/reply/ack flow.  You could keep sequence numbers for each client, and begin streaming
>> data to them or allowing query upon reconnect, etc.
>> 
>> But again, more details of the use case might prove useful.
> 
> I think if we were to just use 100 buckets it would probably work just fine.  We're probably
> not going to be more than 100 nodes in the next year and if we are that's still reasonable
> performance.
> 
> I mean if each box has a 400GB SSD that's 40TB of VERY fast data. 
> 
> Kevin
> 
> -- 
> Founder/CEO Spinn3r.com
> Location: San Francisco, CA
> Skype: burtonator
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
