cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-7776) Allow multiple MR jobs to concurrently write to the same column family from the same node using CqlBulkOutputFormat
Date Wed, 01 Oct 2014 17:27:35 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-7776:
--------------------------------------
    Reviewer: Piotr Kołaczkowski

> Allow multiple MR jobs to concurrently write to the same column family from the same node using CqlBulkOutputFormat
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7776
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7776
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Paul Pak
>            Assignee: Paul Pak
>            Priority: Minor
>              Labels: cql3, hadoop
>         Attachments: trunk-7776-v1.txt
>
>
> After the sstable files are written, all files in the specified output directory are loaded
> (transferred) to the remote Cassandra cluster. If multiple writes to the same table (i.e. the
> same directory) occur on a node, the concurrent load processes end up transferring the same
> sstable files multiple times. Furthermore, if directory cleanup of successful outputs is enabled
> ([CASSANDRA-7777|https://issues.apache.org/jira/browse/CASSANDRA-7777]), write/load contention
> could cause errors.
> This can be simply remedied by using unique output directories for each MR job.
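
A minimal sketch of the remedy described above, not the code in the attached patch. It gives each job its own output directory by placing a random UUID component under the configured base location. The class name UniqueOutputDirs, the method createJobOutputDir, and the <base>/<uuid>/<keyspace>/<table> layout are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical helper, not part of Cassandra: illustrates per-job output directories.
final class UniqueOutputDirs {

    // Creates and returns <base>/<random-uuid>/<keyspace>/<table>.
    // The layout is an assumption chosen for this sketch; the point is only that
    // two jobs writing the same table on the same node never share a directory.
    static Path createJobOutputDir(Path base, String keyspace, String table) throws IOException {
        Path dir = base.resolve(UUID.randomUUID().toString())
                       .resolve(keyspace)
                       .resolve(table);
        return Files.createDirectories(dir);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("bulk-output");
        // Two "jobs" targeting the same keyspace and table get distinct directories.
        Path first = createJobOutputDir(base, "ks", "events");
        Path second = createJobOutputDir(base, "ks", "events");
        System.out.println(first);
        System.out.println(second);
    }
}
```

With separate directories, each load step only sees the sstable files its own job wrote, and cleanup of one job's output cannot remove files that another job is still writing or loading.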



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
