Mailing-List: user@cassandra.apache.org
From: Brian O'Neill <boneill42@gmail.com>
To: user@cassandra.apache.org
Date: Wed, 02 Nov 2011 03:47:02 -0400
Subject: Re: Tool for SQL -> Cassandra data movement

COTS/open-source ETL tools (Talend, Pentaho, CloverETL, etc.) exist to do this. With those, you should be able to do the migration without writing any code.

All of these tools can read from a SQL database; then you just need to push the data into Cassandra. Many of the ETL tools support web services, which is why I suggested that a REST layer for Cassandra might be handy. Using the ETL tool, you could push the data into Cassandra as JSON over REST. (If you want, give Virgil a try.)

I haven't tried it, but you might also be able to coax the ETL tools into using CQL.

Some of the ETL tools are Map/Reduce friendly (more or less) and can distribute the job over a cluster. But if you have a lot of data, you may also want to look at Pig and/or Map/Reduce directly. If you stage the CSV/JSON file on HDFS, a simple Map/Reduce job (using ColumnFamilyOutputFormat) can load the data directly into Cassandra.

We are solving this problem right now, so I'll report back.
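[Editor's note: the JSON-over-REST push described above can be sketched as below. This is a minimal illustration, using sqlite3 as a stand-in for the real SQL source; the `/<rowkey>` URL layout and the PUT verb are assumptions about a Virgil-style REST facade, not a documented API.]

```python
import json
import sqlite3
import urllib.request

# Stand-in SQL source: a real migration would read from MySQL/Oracle/etc.
# via an ETL tool or JDBC instead of an in-memory sqlite database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
db.execute("INSERT INTO users VALUES (1, 'alice', 'alice@example.com')")

def row_to_payload(cursor, row):
    """Turn one SQL row into a JSON-ready dict of column name -> string value."""
    cols = [d[0] for d in cursor.description]
    return {c: str(v) for c, v in zip(cols, row)}

def push_table(base_url):
    """PUT every row to <base_url>/<rowkey> as a JSON document.

    The URL layout and verb are assumptions about a Virgil-style REST
    layer; adjust them to whatever REST facade you actually run.
    """
    cur = db.execute("SELECT * FROM users")
    for row in cur.fetchall():
        payload = row_to_payload(cur, row)
        req = urllib.request.Request(
            "%s/%s" % (base_url, payload["id"]),
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        # urllib.request.urlopen(req)  # enable against a live endpoint
```

The row-to-JSON mapping is the part the ETL tool would otherwise do for you; the HTTP call is left commented out so the sketch runs without a server.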
-brian

----
Brian O'Neill
Lead Architect, Software Development
Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
p: 215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/

From: Maxim Potekhin <potekhin@bnl.gov>
Organization: Brookhaven National Laboratory
Reply-To: user@cassandra.apache.org
Date: Tue, 01 Nov 2011 14:18:00 -0400
Subject: Re: Tool for SQL -> Cassandra data movement

Just a short comment -- we are going the CSV route as well, because of its compactness and extreme portability. The CSV files are kept in the cloud as backup, and they can also find other uses. JSON would work as well, but it would be at least twice as large.

Maxim

On 9/22/2011 1:25 PM, Nehal Mehta wrote:
> We are trying to carry out the same task, but instead of migrating to JSON,
> we are exporting into CSV and then importing the CSV into Cassandra. Which
> DB are you currently using?
>
> Thanks,
> Nehal Mehta
>
> 2011/9/22 Radim Kolar <hsn@sendmail.cz>
>
>> I need a tool that can dump tables via JDBC into JSON format for
>> Cassandra import. I am pretty sure that somebody already wrote that.
>>
>> Are there tools that can do a direct JDBC -> Cassandra import?
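[Editor's note: Maxim's size observation is easy to reproduce. The sketch below is illustrative only, using sqlite3 as a stand-in SQL source with made-up rows; it compares a CSV export against a JSON export of the same table. The gap comes from JSON repeating every column name in every record, while CSV writes the names once in the header.]

```python
import csv
import io
import json
import sqlite3

# Stand-in table; the point is only to compare export sizes, not realism.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER, ts TEXT, payload TEXT)")
db.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(i, "2011-11-02T07:47:02", "value-%d" % i) for i in range(1000)],
)

cur = db.execute("SELECT * FROM events")
cols = [d[0] for d in cur.description]
rows = cur.fetchall()

# CSV export: column names are written once, in the header row.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(cols)
writer.writerows(rows)
csv_bytes = len(buf.getvalue().encode("utf-8"))

# JSON export: column names are repeated inside every record, hence the bloat.
json_bytes = len(json.dumps([dict(zip(cols, r)) for r in rows]).encode("utf-8"))

print("CSV: %d bytes, JSON: %d bytes" % (csv_bytes, json_bytes))
```

The exact ratio depends on how long the values are relative to the column names; short values with verbose keys push JSON well past twice the CSV size.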