cassandra-user mailing list archives

From Franc Carter <franc.car...@sirca.org.au>
Subject Re: Largest 'sensible' value
Date Mon, 02 Apr 2012 21:38:45 GMT
On Tue, Apr 3, 2012 at 4:18 AM, Ben Coverston <ben.coverston@datastax.com> wrote:

> This is a difficult question to answer for a variety of reasons, but I'll
> give it a try, maybe it will be helpful, maybe not.
>
> The most obvious problem with this is that Thrift is buffer based, not
> streaming. That means that whatever the size of your chunk it needs to
> be received, deserialized, and processed by cassandra within a timeframe
> that we call the rpc_timeout (by default this is 10 seconds).
>
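A back-of-envelope reading of that bound (the throughput figure below is an assumption for illustration, not a measured number):

```python
# Back-of-envelope: the largest value that could clear the default
# rpc_timeout, under an assumed end-to-end processing rate. Both
# numbers are illustrative assumptions, not measured figures.
RPC_TIMEOUT_S = 10        # Cassandra's default rpc_timeout (10 seconds)
ASSUMED_RATE_MB_S = 50    # assumed receive + deserialize + process rate

ceiling_mb = RPC_TIMEOUT_S * ASSUMED_RATE_MB_S
print(ceiling_mb)  # theoretical ceiling in MB; stay well below it in practice
```

In practice GC pressure and concurrent load would push the workable size far below any such theoretical ceiling.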

Thanks.

 I suspect that 'not streaming' is the key, and not just from the Cassandra
side - our use case has a subtle assumption of streaming on the client
side. We could chop it up into buckets and put each one in a time-ordered
column, but that defeats the purpose of why I was considering Cassandra
- to avoid the latency of seeks in HDFS.
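For what it's worth, the bucketing idea can be sketched roughly like this (the chunk size and column-naming scheme are arbitrary illustrative choices, not anything Cassandra prescribes):

```python
# A minimal sketch of bucketing a large blob into fixed-size chunks,
# each stored under an ordered column name so the pieces can be read
# back in sequence. 64 KB is an arbitrary illustrative chunk size.
CHUNK_SIZE = 64 * 1024

def split_into_columns(blob: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (column_name, chunk) pairs; zero-padded sequence numbers
    keep lexicographic column order equal to write order."""
    for seq, offset in enumerate(range(0, len(blob), chunk_size)):
        yield (f"chunk-{seq:08d}", blob[offset:offset + chunk_size])

def reassemble(columns):
    """Concatenate chunks back into the original blob, sorted by name."""
    return b"".join(chunk for _, chunk in sorted(columns))
```

Each chunk then fits comfortably inside an rpc_timeout window, but every read of the full value becomes a column slice rather than a single fetch - which is exactly the extra round-trip cost being weighed here.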

cheers


>
> Bigger buffers mean larger allocations, larger allocations mean that the
> JVM is working harder and is more prone to fragmentation on the heap.
>
> With mixed workloads (lots of high-latency, large requests and many very
> small low-latency requests), larger buffers can also, over time, clog up the
> thread pool in a way that causes your shorter queries to wait
> for your longer-running queries to complete (to free up worker threads),
> making everything slow. This isn't a problem unique to Cassandra,
> everything that uses worker queues runs into some variant of this problem.
>
> As with everything else, you'll probably need to test your specific use
> case to see what 'too big' is for you.
>
> On Mon, Apr 2, 2012 at 9:23 AM, Franc Carter <franc.carter@sirca.org.au> wrote:
>
>>
>> Hi,
>>
>> We are in the early stages of thinking about a project that needs to
>> store data that will be accessed by Hadoop. One of the concerns we have is
>> around the latency of HDFS, as our use case is not for reading all the
>> data, and hence we will need custom RecordReaders etc.
>>
>> I've seen a couple of comments that you shouldn't put large chunks into
>> a value - however 'large' is not well defined for the range of people using
>> these solutions ;-)
>>
>> Does anyone have a rough rule of thumb for how big a single value can be
>> before we are outside sanity?
>>
>> thanks
>>
>> --
>>
>> *Franc Carter* | Systems architect | Sirca Ltd
>>
>> franc.carter@sirca.org.au | www.sirca.org.au
>>
>> Tel: +61 2 9236 9118
>>
>> Level 9, 80 Clarence St, Sydney NSW 2000
>>
>> PO Box H58, Australia Square, Sydney NSW 1215
>>
>>
>
>
> --
> Ben Coverston
> DataStax -- The Apache Cassandra Company
>
>


-- 

*Franc Carter* | Systems architect | Sirca Ltd

franc.carter@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215
