Date: Mon, 28 Nov 2011 18:59:40 +0000 (UTC)
From: "Brandon Williams (Issue Comment Edited) (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Message-ID: <1852705778.18903.1322506780343.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <61025347.43761.1313545287173.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Issue Comment Edited] (CASSANDRA-3045) Update ColumnFamilyOutputFormat to use new bulkload API

[
https://issues.apache.org/jira/browse/CASSANDRA-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158635#comment-13158635 ]

Brandon Williams edited comment on CASSANDRA-3045 at 11/28/11 6:58 PM:
-----------------------------------------------------------------------

bq. are there any benchmarks or is there anything anecdotal about performance?

Using the simplest job possible (copying a CF, map-only), I see a 20-25% gain. I suspect this is read-limited, though, and if you're generating the output on a Hadoop cluster and loading it into a Cassandra cluster (i.e., not colocated), this will be even faster, but creating such a workload is a bit too much work for me to test. If anyone has an existing case like this, I'd love for them to test and chime in.

One other thing, though: there are far fewer failures using BOF on a workload that generates a lot of GC (inserting a couple of hundred columns creates quite a bit and causes failures due to UE while the nodes are CMSing). So BOF is much 'nicer' to the cluster.

      was (Author: brandon.williams):
    bq. are there any benchmarks or is there anything anecdotal about performance?

Using the simplest job possible (copying a CF, map-only), I see a 20-25% gain. I suspect this is read-limited, though, and if you're generating the output on a Hadoop cluster and loading it into a Cassandra cluster (i.e., not colocated), this will be even faster, but creating such a workload is a bit too much work for me to test. If anyone has an existing case like this, I'd love for them to test and chime in.

> Update ColumnFamilyOutputFormat to use new bulkload API
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3045
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3045
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Jonathan Ellis
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Remove-gossip-SS-requirement-from-BulkLoader.txt, 0002-Allow-DD-loading-without-yaml.txt, 0003-hadoop-output-support-for-bulk-loading.txt
>
>
> The bulk loading interface added in CASSANDRA-1278 is a great fit for Hadoop jobs.