From: aaron morton
To: user@cassandra.apache.org
Subject: Re: Bulk Loading Recommendations: Files over 25GBs
Date: Wed, 19 Oct 2011 09:56:36 +1300

Given that scale of data, and the fact that it's a batch job, I would go with the bulk loading tool.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/10/2011, at 3:32 AM, Mike Rapuano wrote:

We are not currently live but are testing with Cassandra. I'm looking for recommendations on the most efficient way to load text files over 25 GB in size into Cassandra (version 0.8.6). Our application may require us to load 2-3 text files of 25-40 GB each a few times a week into our 3-node cluster. I was reading this article on DataStax: http://www.datastax.com/dev/blog/bulk-loading

Is it most efficient to create the sstables and then use sstableloader, or does anyone have other recommendations for how to "bulk load data"? We are new to Cassandra and trying to work within generally accepted practices.

Thanks
Mike
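
For a rough sense of the scale being discussed, the per-load volume can be worked out from the figures in the question (2-3 files of 25-40 GB each). The sketch below is only back-of-the-envelope arithmetic on raw input size. The helper name `load_volume_gb` is made up for illustration, and the figure of three loads per week is an assumption, since the question only says "a few times a week".

```python
def load_volume_gb(files_per_load, gb_per_file):
    """Return the (min, max) raw input size in GB for one load run.

    Both arguments are (min, max) tuples.
    """
    min_files, max_files = files_per_load
    min_gb, max_gb = gb_per_file
    return min_files * min_gb, max_files * max_gb


# Figures taken from the question: 2-3 files of 25-40 GB each per load.
low, high = load_volume_gb((2, 3), (25, 40))
print(f"per load: {low}-{high} GB")

# Assumption: "a few times a week" is taken as 3 loads; adjust as needed.
LOADS_PER_WEEK = 3
print(f"per week: {low * LOADS_PER_WEEK}-{high * LOADS_PER_WEEK} GB")
```

These numbers cover only the source text files; the size on disk in the cluster will differ.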



