From: Jack Krupansky
To: user@cassandra.apache.org
Date: Fri, 25 Jul 2014 10:57:18 -0400
Subject: Re: read huge data from CSV and write into Cassandra

Read the CSV file using a Java app, and then write the rows to Cassandra using the Cassandra Java driver with multiple parallel input streams.
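
A minimal sketch of that approach follows. It assumes a simple CSV with no quoted fields, embedded commas or embedded newlines; for real-world CSV you would swap in a proper parser such as Apache Commons CSV. `ParallelCsvLoader`, `RowWriter`, the batch size and the in-flight limit are hypothetical names and choices for illustration, not part of any library. In a real loader, `RowWriter.write` would bind the columns to a prepared INSERT and run it through the DataStax Java driver (for example `session.execute(insert.bind(...))` in the 2.x/3.x API). That part is left out so the sketch runs without a cluster.

```java
import java.io.BufferedReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelCsvLoader {

    /** Hypothetical sink for one parsed row; a real one would wrap the Cassandra Java driver. */
    public interface RowWriter {
        void write(String[] columns) throws Exception;
    }

    /**
     * Reads CSV lines and writes them in batches on `threads` worker threads.
     * At most 2 * threads batches are submitted at once, so memory stays bounded
     * even for millions of rows. Returns the number of rows written.
     * Assumes plain CSV: no quoted fields, embedded commas, or embedded newlines.
     */
    public static long load(BufferedReader reader, RowWriter writer,
                            int threads, int batchSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Semaphore inFlight = new Semaphore(2 * threads);
        AtomicLong written = new AtomicLong();
        List<Future<Void>> pending = new ArrayList<>();
        try {
            List<String[]> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                batch.add(line.split(",", -1)); // -1 keeps trailing empty columns
                if (batch.size() == batchSize) {
                    pending.add(submit(pool, inFlight, batch, writer, written));
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                pending.add(submit(pool, inFlight, batch, writer, written));
            }
            for (Future<Void> f : pending) {
                f.get(); // rethrows the first write failure, if any
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        return written.get();
    }

    private static Future<Void> submit(ExecutorService pool, Semaphore inFlight,
                                       List<String[]> rows, RowWriter writer,
                                       AtomicLong written) throws InterruptedException {
        inFlight.acquire(); // block the reader while too many batches are outstanding
        return pool.submit(() -> {
            try {
                for (String[] row : rows) {
                    writer.write(row);
                    written.incrementAndGet();
                }
            } finally {
                inFlight.release();
            }
            return null;
        });
    }
}
```

For example, `ParallelCsvLoader.load(new BufferedReader(new FileReader(path)), writer, 8, 500)` reads the file once on the calling thread, keeps at most 16 submitted batches of 500 rows outstanding, and lets eight threads write in parallel. The semaphore is the design choice that matters here: without it, a fast reader would queue the entire file in memory ahead of the slower writers.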
 
Oh, and make sure to provision your cluster with enough nodes to handle your desired ingestion and query rates. Do a proof of concept with a six-node cluster with RF=2 to see what ingestion and query rates you can get for a fraction of your data, and then scale from there. A 12-node cluster with RF=3 would be more realistic, though. RF=2 is not for production: it doesn't tolerate any node failure for QUORUM operations (a quorum of two replicas is both of them), while RF=3 permits quorum operations with a single node failure. But RF=2 at least lets you test with a more realistic scenario of coordinator nodes and inter-node traffic.
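
The RF=2 versus RF=3 trade-off follows from the quorum size, floor(RF / 2) + 1 replicas. A small Java sketch of that arithmetic (the class and method names are illustrative only):

```java
public class ReplicationMath {

    /** Replicas that must respond for a QUORUM read or write: floor(RF / 2) + 1. */
    public static int quorum(int rf) {
        return rf / 2 + 1; // integer division floors for positive rf
    }

    /** Replica failures (among the replicas of one partition) that QUORUM can survive. */
    public static int toleratedFailuresAtQuorum(int rf) {
        return rf - quorum(rf);
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 3; rf++) {
            System.out.printf("RF=%d: quorum=%d, tolerated failures=%d%n",
                    rf, quorum(rf), toleratedFailuresAtQuorum(rf));
        }
    }
}
```

Running it prints:

    RF=1: quorum=1, tolerated failures=0
    RF=2: quorum=2, tolerated failures=0
    RF=3: quorum=2, tolerated failures=1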
 
And if your total row count does manage to fit on one machine (or on three nodes with RF=3), at least make sure you have enough CPU cores and I/O bandwidth to handle your desired ingestion and query rates.
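
To put "enough CPU cores and I/O bandwidth" into numbers, here is a back-of-the-envelope sketch: every row is written RF times, so the cluster must absorb rows per second times RF replica writes, shared across the nodes. The 3,000,000 rows-per-minute figure is a hypothetical load, not a measurement, and the even spread of writes across nodes is a simplifying assumption.

```java
public class IngestSizing {

    /**
     * Rough replica-write rate each node must sustain, assuming writes are
     * spread evenly across nodes (real token distribution will vary).
     */
    public static double replicaWritesPerNodePerSecond(long rowsPerMinute, int rf, int nodes) {
        double rowsPerSecond = rowsPerMinute / 60.0;
        return rowsPerSecond * rf / nodes;
    }

    public static void main(String[] args) {
        // Hypothetical load: 3,000,000 CSV rows arriving every minute.
        long rowsPerMinute = 3_000_000L;
        System.out.printf("3 nodes, RF=3:  %.0f replica writes/s per node%n",
                replicaWritesPerNodePerSecond(rowsPerMinute, 3, 3));
        System.out.printf("12 nodes, RF=3: %.0f replica writes/s per node%n",
                replicaWritesPerNodePerSecond(rowsPerMinute, 3, 12));
    }
}
```

With RF=3, three nodes each take every replica write (50,000 per second in this example), while twelve nodes split the same load to 12,500 per second each. Rows with 100+ columns make each of those writes heavier, which is why a proof of concept on real data is still needed.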
 
-- Jack Krupansky
 
From: Akshay Ballarpure
Sent: Friday, July 25, 2014 5:26 AM
To: user@cassandra.apache.org
Subject: read huge data from CSV and write into Cassandra
 
How can I read data from a large CSV file that has 100+ columns and millions of rows, and insert it into Cassandra every minute?

Thanks & Regards
Akshay Ghanshyam Ballarpure
Tata Consultancy Services
Cell:- 9985084075
Mailto: akshay.ballarpure@tcs.com
Website: http://www.tcs.com