cassandra-commits mailing list archives

From "Johan Oskarsson (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-919) Add thrift request retries to Hadoop input format
Date Fri, 26 Mar 2010 09:00:28 GMT


Johan Oskarsson commented on CASSANDRA-919:

There is a big difference between retrying one rpc request and retrying a whole map task.
A map task has a large overhead: in the common case a whole new JVM has to be started per task.
A retried task would also have to refetch a significant chunk of data, while one rpc retry is only a
few thousand rows.
So if there is a short period where a few rpcs fail (due to garbage collection or similar), retrying them
instead of the task will speed up the overall job noticeably. It's also
worth noting that the hdfs client used by standard MapReduce jobs has a retry mechanism.
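
A minimal sketch of what per-request retries could look like, to make the comparison concrete. It is not the code from CASSANDRA-919.patch: the class RpcRetry, the method callWithRetries and the parameters maxAttempts and backoffMillis are hypothetical names chosen for illustration, and the linear backoff is an assumption.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration only; not taken from CASSANDRA-919.patch.
final class RpcRetry {
    private RpcRetry() {}

    // Runs the given rpc up to maxAttempts times, sleeping between failed attempts.
    // Returns the first successful result, or throws once all attempts have failed.
    static <T> T callWithRetries(Callable<T> rpc, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return rpc.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no attempts left, give up
                }
                try {
                    // Assumed linear backoff, so a short GC pause has time to pass.
                    Thread.sleep(backoffMillis * attempt);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException("interrupted while waiting to retry", ie);
                }
            }
        }
        // Only now does the failure reach the caller, which would fail the map task.
        throw new RuntimeException("rpc failed after " + maxAttempts + " attempts", last);
    }
}
```

In a record reader, the get_range_slices call would be the Callable passed in, so a transient failure costs one repeated request of a few thousand rows rather than a restarted task.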

> Add thrift request retries to Hadoop input format
> -------------------------------------------------
>                 Key: CASSANDRA-919
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib
>            Reporter: Johan Oskarsson
>            Assignee: Johan Oskarsson
>            Priority: Trivial
>             Fix For: 0.7
>         Attachments: CASSANDRA-919.patch
> In order to decrease overhead of restarting a map task and increase reliability of the
> record reader we should retry the get_range_slices requests if they fail.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
