cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6665) Batching in CqlRecordWriter
Date Thu, 06 Feb 2014 15:54:10 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13893463#comment-13893463 ]

Jonathan Ellis commented on CASSANDRA-6665:
-------------------------------------------

The problem with batching is that it's *really* easy to swamp your cluster with the spikes
in workload caused by batch arrivals.  Better to smooth it out even if it's not quite as efficient.

The change from pooling by range to pooling by address would be good, though.
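
To illustrate the difference between pooling by range and pooling by address, here is a minimal sketch in Python. It is not the CqlRecordWriter code: the ring layout, the addresses, and the helper names (build_ring, pool_by_range, pool_by_address) are all hypothetical, and it assumes one client per pool entry. It only counts how many pools each strategy yields for the 12-node cluster in the report, using the lower bound of 256 vnodes per node.

```python
from collections import defaultdict

def build_ring(num_nodes, vnodes_per_node):
    """Build a hypothetical ring as (range_id, owner_address) pairs."""
    ring = []
    for node in range(num_nodes):
        address = f"10.0.0.{node + 1}"  # made-up address, for illustration only
        for v in range(vnodes_per_node):
            ring.append((node * vnodes_per_node + v, address))
    return ring

def pool_by_range(ring):
    """One pool entry per token range."""
    return {range_id: address for range_id, address in ring}

def pool_by_address(ring):
    """One pool entry per node address, shared by every range that node owns."""
    pools = defaultdict(list)
    for range_id, address in ring:
        pools[address].append(range_id)
    return dict(pools)

ring = build_ring(num_nodes=12, vnodes_per_node=256)
print(len(pool_by_range(ring)))    # 3072
print(len(pool_by_address(ring)))  # 12
```

With 384 vnodes per node the per-range count becomes 4608, so the "around 4000 threads per mapper" in the report is in line with one client per range. Pooling by address keeps the count at the number of nodes regardless of the vnode setting.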

> Batching in CqlRecordWriter
> ---------------------------
>
>                 Key: CASSANDRA-6665
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6665
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>         Environment: Cluster of 12 nodes, each node with 256-384 vnodes. RPC threads capped at 2048.
>            Reporter: Christian Rolf
>            Priority: Minor
>         Attachments: batchWrite.txt
>
>
> We're writing from Pig map tasks, about 20 million records of one integer each.
> For the case of 12 nodes, with 256-384 vnodes per node, we get around 4000 threads per mapper. This obviously overloads the nodes, since the number of RPC threads is capped, and the write fails.
> Also, each transfer is only on the order of a few bytes of payload. Clearly batching is a good solution.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
