From: "Naganarasimha G R (Naga)" <garlanaganarasimha@huawei.com>
To: user@hadoop.apache.org; common-dev@hadoop.apache.org; bewang.tech@gmail.com
Date: Wed, 17 Sep 2014 01:35:18 +0000
Subject: RE: Is it a bug in CombineFileSplit?

Hi Wang,

Seems like it's a defect. Are you planning to raise a defect? If not, I can raise and fix it.

Regards,
Naga

Huawei Technologies Co., Ltd.
Mobile: +91 9980040283
Email: naganarasimhagr@huawei.com
Bantian, Longgang District, Shenzhen 518129, P.R. China
http://www.huawei.com

________________________________
From: Benyi Wang [bewang.tech@gmail.com]
Sent: Wednesday, September 17, 2014 06:37
To: user@hadoop.apache.org; common-dev@hadoop.apache.org
Subject: Is it a bug in CombineFileSplit?

I use Spark's SerializableWritable to wrap CombineFileSplit so I can pass around the splits, but I ran into serialization issues. In researching why my code fails, I found what might be a bug in CombineFileSplit:

CombineFileSplit doesn't serialize locations in write(DataOutput out), and doesn't deserialize locations in readFields(DataInput in).

When I create a split in CombineFileInputFormat, locations is an empty String[0] array, but after deserialization (default constructor, then readFields), locations will be null. This will lead to an NPE.
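To illustrate the round trip described above, here is a minimal, self-contained sketch. It uses a simplified stand-in class (MiniCombineSplit, a hypothetical name) that mirrors the reported behavior, rather than Hadoop's actual CombineFileSplit, so it runs on the plain JDK: write() and readFields() carry the paths and lengths but never touch locations, so the field comes back null after deserialization.

```java
import java.io.*;

// Simplified stand-in mirroring the reported CombineFileSplit behavior:
// the write()/readFields() round trip carries paths and lengths but
// skips locations entirely.
class MiniCombineSplit {
    String[] paths = new String[0];
    long[] lengths = new long[0];
    String[] locations;   // never written or read back

    MiniCombineSplit() {} // default constructor leaves locations == null

    MiniCombineSplit(String[] paths, long[] lengths, String[] locations) {
        this.paths = paths;
        this.lengths = lengths;
        this.locations = locations;
    }

    void write(DataOutput out) throws IOException {
        out.writeInt(paths.length);
        for (int i = 0; i < paths.length; i++) {
            out.writeUTF(paths[i]);
            out.writeLong(lengths[i]);
        }
        // note: locations is deliberately omitted, as in the report
    }

    void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        paths = new String[n];
        lengths = new long[n];
        for (int i = 0; i < n; i++) {
            paths[i] = in.readUTF();
            lengths[i] = in.readLong();
        }
        // locations is never assigned here, so it stays null
    }
}

public class SplitRoundTrip {
    public static void main(String[] args) throws IOException {
        // Created the way CombineFileInputFormat does: locations is an
        // empty array, not null.
        MiniCombineSplit split = new MiniCombineSplit(
                new String[] { "/data/part-00000" },
                new long[] { 128L },
                new String[0]);

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        split.write(new DataOutputStream(bos));

        // Default constructor, then readFields: locations is now null,
        // so any later dereference (e.g. iterating over it) throws NPE.
        MiniCombineSplit copy = new MiniCombineSplit();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bos.toByteArray())));

        System.out.println("before: " + split.locations.length); // 0
        System.out.println("after:  " + copy.locations);         // null
    }
}
```

This is only a sketch of the failure mode under the assumptions above; the actual fix would be for CombineFileSplit itself to serialize locations in write() and restore it in readFields() (or at least reset it to an empty array).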