Subject: ETL Tools to transfer data from Cassandra into other relational databases
From: "cko2223@gmail.com"
Date: Fri, 14 Dec 2012 14:19:15 +1100
To: "user@cassandra.apache.org"

We will use Cassandra as the logging store in one of our web applications. The application only inserts rows into Cassandra; it never updates or deletes any rows. The CF is expected to grow by about 0.5 million rows per day.

We need to transfer the data in Cassandra to another relational database daily. Because of the size of the CF, instead of truncating the relational table and reloading all rows into it each time, we plan to run a job that selects the "delta" rows added since the last run and inserts them into the relational database.

We know we can use Java, Pig or Hive to extract the delta rows to a flat file and load the data into the target relational table. We are particularly interested in a process that can extract delta rows without scanning the entire CF.

Has anyone used any other ETL tools to do this kind of delta extraction from Cassandra? We would appreciate any comments and experience.

Thanks,
Chin
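To illustrate the delta selection described above, here is a minimal sketch under stated assumptions. It assumes, hypothetically, that log rows are bucketed by UTC day and that the daily job remembers the insert timestamp of its last run as a watermark. An in-memory `TreeMap` stands in for the CF; no real Cassandra client API is used, and the names `DeltaExtractor`, `LogRow`, `insert` and `rowsSince` are illustrative only. Only buckets on or after the watermark's day are visited, so older data is never scanned.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical in-memory stand-in for a log CF whose rows are bucketed by UTC day.
 * The bucketing scheme and all names here are assumptions for illustration.
 */
public class DeltaExtractor {

    /** One inserted log entry; entries are only ever appended, never updated or deleted. */
    public record LogRow(Instant insertedAt, String payload) {}

    // Day bucket -> entries written that day; TreeMap keeps the buckets in date order.
    private final NavigableMap<LocalDate, List<LogRow>> buckets = new TreeMap<>();

    /** Appends a row to the bucket for its UTC insert day. */
    public void insert(LogRow row) {
        LocalDate day = LocalDate.ofInstant(row.insertedAt(), ZoneOffset.UTC);
        buckets.computeIfAbsent(day, d -> new ArrayList<>()).add(row);
    }

    /**
     * Returns rows strictly newer than the watermark, visiting only the buckets
     * on or after the watermark's day; earlier buckets are skipped entirely.
     */
    public List<LogRow> rowsSince(Instant watermark) {
        LocalDate startDay = LocalDate.ofInstant(watermark, ZoneOffset.UTC);
        List<LogRow> delta = new ArrayList<>();
        for (List<LogRow> rows : buckets.tailMap(startDay, true).values()) {
            for (LogRow r : rows) {
                if (r.insertedAt().isAfter(watermark)) {
                    delta.add(r);
                }
            }
        }
        return delta;
    }
}
```

For example, with rows inserted at 2012-12-12T10:00Z, 2012-12-13T09:00Z and 2012-12-13T18:00Z, calling `rowsSince(Instant.parse("2012-12-13T12:00:00Z"))` skips the 12 December bucket and returns only the 18:00 row. In Cassandra terms this resembles the commonly used layout of one row per day bucket with time-ordered columns, so a delta run reads only the newest bucket(s); whether it fits depends on the actual schema.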