cassandra-user mailing list archives

From Benjamin Roth <benjamin.r...@jaumo.com>
Subject Re: parallel processing - splitting data
Date Thu, 19 Jan 2017 12:19:37 GMT
If you have 4 nodes with RF 4, then every node holds all of the data. So you
can just slice the whole token range into 4 pieces and let each node process
one slice.
Determining the local ranges also only helps if you read with consistency
level ONE; at higher levels the coordinator has to contact other replicas
anyway.
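
A rough sketch of the slicing in Java, in case it helps. It assumes the
cluster uses Murmur3Partitioner (tokens are signed 64-bit values) and only
computes the slice bounds and the CQL text. The table ks.events, the partition
key id, and the idea of passing each node its index on the command line are
made-up placeholders, and binding the values with the Java driver is left out.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class TokenSlices {
    // Token bounds of Murmur3Partitioner (assumed; other partitioners use other ranges).
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    /** Splits [MIN, MAX] into `parts` contiguous slices, each returned as {start, end}. */
    static List<long[]> split(int parts) {
        BigInteger width = MAX.subtract(MIN); // BigInteger avoids long overflow
        BigInteger n = BigInteger.valueOf(parts);
        List<long[]> slices = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            long start = MIN.add(width.multiply(BigInteger.valueOf(i)).divide(n)).longValueExact();
            long end = MIN.add(width.multiply(BigInteger.valueOf(i + 1)).divide(n)).longValueExact();
            slices.add(new long[] {start, end});
        }
        return slices;
    }

    /** CQL for one slice; ks.events and id are hypothetical names. */
    static String query(int index) {
        // The first slice includes its start so the minimum token is not skipped.
        String lower = (index == 0) ? ">=" : ">";
        return "SELECT * FROM ks.events WHERE token(id) " + lower + " ? AND token(id) <= ?";
    }

    public static void main(String[] args) {
        // Hypothetical assignment: each node is started with its own index 0..3.
        int nodeIndex = args.length > 0 ? Integer.parseInt(args[0]) : 0;
        long[] slice = split(4).get(nodeIndex);
        System.out.println(query(nodeIndex));
        System.out.println("bind: " + slice[0] + ", " + slice[1]);
    }
}
```

Each process would then page through its own slice with the two bound values,
reading at consistency level ONE so the local replica can answer.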

2017-01-19 13:05 GMT+01:00 Frank Hughes <frankhughes782@gmail.com>:

> Hello there,
>
> I'm running a 4 node cluster of Cassandra 3.9 with a replication factor of
> 4.
>
> I want to run a Java process on each node that selects only 25% of the
> data on that node,
> so I can process all of the data in parallel across the nodes.
>
> What is the best way to do this with the Java driver?
>
> I was assuming I could retrieve the token ranges for each node and page
> through the data using these ranges, but this includes the replicated data.
> I was hoping there was a way of selecting only the data that a node is
> responsible for and avoiding the replicated data.
>
> Many thanks for any help and guidance,
>
> Frank Hughes
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
