From: cscetbon.ext@orange.com
To: "user@cassandra.apache.org"
Date: Mon, 6 May 2013 18:29:21 +0200
Subject: Re: Hadoop jobs and data locality

Unfortunately, I've just tried with a new cluster with RandomPartitioner and it doesn't work any better:
it may come from the hadoop/pig modifications:
18:02:53|elia:hadoop cyril$ git diff --stat cassandra-1.1.5..cassandra-1.2.1 .
 .../apache/cassandra/hadoop/BulkOutputFormat.java  |   27 +--
 .../apache/cassandra/hadoop/BulkRecordWriter.java  |   55 +++---
 .../cassandra/hadoop/ColumnFamilyInputFormat.java  |  102 ++++++----
 .../cassandra/hadoop/ColumnFamilyOutputFormat.java |   31 ++--
 .../cassandra/hadoop/ColumnFamilyRecordReader.java |   76 ++++----
 .../cassandra/hadoop/ColumnFamilyRecordWriter.java |   24 +--
 .../apache/cassandra/hadoop/ColumnFamilySplit.java |   32 ++--
 .../org/apache/cassandra/hadoop/ConfigHelper.java  |   73 ++++++--
 .../cassandra/hadoop/pig/CassandraStorage.java     |  214 +++++++++++++-------
 9 files changed, 380 insertions(+), 254 deletions(-)

Can anyone help with getting more mappers running? Maybe we should open a bug report?
-- 
Cyril SCETBON

On May 5, 2013, at 8:45 AM, Shamim <srecon@yandex.ru> wrote:

Hello,
We also came across this issue in our dev environment when we upgraded Cassandra from version 1.1.5 to 1.2.1. I have mentioned this issue a few times on this list but haven't got any answer yet. As a quick workaround, you can set pig.splitCombination to false in your Pig script to avoid this issue, but one of your tasks will then end up with a very large amount of data. I can't figure out why this is happening in the newer version of Cassandra; my strong guess is that something goes wrong in Cassandra's implementation of LoadFunc or in Murmur3Partitioner.
Here are my earlier posts:
http://www.mail-archive.com/user@cassandra.apache.org/msg28016.html
http://www.mail-archive.com/user@cassandra.apache.org/msg29425.html
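
To make the effect of the pig.splitCombination setting easier to picture, here is a toy Python model. It is only a sketch, not Pig's or Cassandra's actual code: it assumes that, with combination enabled, input splits are packed greedily into one map task until a size limit is reached. The function name plan_map_tasks, the split sizes, and the 128 MB limit are all hypothetical and chosen for illustration.

```python
from typing import List


def plan_map_tasks(split_sizes: List[int], combine: bool,
                   max_combined_size: int) -> List[List[int]]:
    """Return one list of split sizes per map task (toy model, not Pig's real algorithm)."""
    if not combine:
        # Combination disabled: every input split becomes its own map task.
        return [[size] for size in split_sizes]

    tasks: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for size in split_sizes:
        # Start a new task when adding this split would exceed the limit.
        if current and current_total + size > max_combined_size:
            tasks.append(current)
            current, current_total = [], 0
        current.append(size)
        current_total += size
    if current:
        tasks.append(current)
    return tasks


if __name__ == "__main__":
    # Hypothetical numbers: 8 splits that each report a small size,
    # and a 128 MB combination limit.
    reported_sizes = [64 * 1024] * 8
    limit = 128 * 1024 * 1024

    combined = plan_map_tasks(reported_sizes, combine=True, max_combined_size=limit)
    separate = plan_map_tasks(reported_sizes, combine=False, max_combined_size=limit)

    print(len(combined))  # 1 map task holding all 8 splits
    print(len(separate))  # 8 map tasks, one per split
```

In this model, many splits that look small collapse into a single map task when combination is on, while turning it off gives one task per split regardless of how much data each split really covers. Whether this is what actually happens between Cassandra 1.1.5 and 1.2.1 is not established in this thread.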

Any comment from the authors would be highly appreciated.
P.S. Please keep me posted on any solution or hints.

--
Best regards
  Shamim A.



03.05.2013, 19:25, "cscetbon.ext@orange.com" <cscetbon.ext@orange.com>:
Hi,
I'm using Pig to calculate the sum of a column from a column family (a scan of all rows), and I've read at http://wiki.apache.org/cassandra/HadoopSupport that input data locality is supported.
However, when I execute my Pig script, Hadoop assigns only one mapper to the task instead of one mapper on each node (replication factor = 1). FYI, I have 8 mappers available (2 per node).
Is there anything that can disable the data locality feature?

Thanks
--
Cyril SCETBON

_________________________________________________________________________________________________________________________

This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, France Telecom - Orange is not liable for messages that have been modified, changed or falsified. Thank you.
