Return-Path: Delivered-To: apmail-hadoop-pig-dev-archive@www.apache.org Received: (qmail 65215 invoked from network); 14 Jan 2010 19:30:18 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 14 Jan 2010 19:30:18 -0000 Received: (qmail 84559 invoked by uid 500); 14 Jan 2010 19:30:18 -0000 Delivered-To: apmail-hadoop-pig-dev-archive@hadoop.apache.org Received: (qmail 84519 invoked by uid 500); 14 Jan 2010 19:30:18 -0000 Mailing-List: contact pig-dev-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: pig-dev@hadoop.apache.org Delivered-To: mailing list pig-dev@hadoop.apache.org Received: (qmail 84509 invoked by uid 99); 14 Jan 2010 19:30:18 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 14 Jan 2010 19:30:18 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 14 Jan 2010 19:30:16 +0000 Received: from brutus.apache.org (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id E4BD1234C1EE for ; Thu, 14 Jan 2010 11:29:54 -0800 (PST) Message-ID: <1961514589.244171263497394935.JavaMail.jira@brutus.apache.org> Date: Thu, 14 Jan 2010 19:29:54 +0000 (UTC) From: "Viraj Bhat (JIRA)" To: pig-dev@hadoop.apache.org Subject: [jira] Commented: (PIG-1187) UTF-8 (international code) breaks with loader when load with schema is specified In-Reply-To: <1945320210.231261263436974433.JavaMail.jira@brutus.apache.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/PIG-1187?page=3Dcom.atlassian.j= 
ira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D128003= 15#action_12800315 ]=20 Viraj Bhat commented on PIG-1187: --------------------------------- Hi Jeff, This is specific to the data we are using and it looks like parser failed = when it is trying to interpret some characters. As such we have tested this= with Chinese characters and it works. Viraj > UTF-8 (international code) breaks with loader when load with schema is sp= ecified > -------------------------------------------------------------------------= ------- > > Key: PIG-1187 > URL: https://issues.apache.org/jira/browse/PIG-1187 > Project: Pig > Issue Type: Bug > Affects Versions: 0.6.0 > Reporter: Viraj Bhat > Fix For: 0.6.0 > > > I have a set of Pig statements which dump an international dataset. > {code} > INPUT_OBJECT =3D load 'internationalcode'; > describe INPUT_OBJECT; > dump INPUT_OBJECT; > {code} > Sample output > (756a6196-ebcd-4789-ad2f-175e5df65d55,{(labelAa=C3=82=C3=A2=C3=80),(label= =E3=81=82=E3=81=84=E3=81=86=E3=81=88=E3=81=8A1),(label=E0=AE=9C=E0=AE=BE=E0= =AE=B0=E0=AF=8D=E0=AE=952),(labeladfadf)}) > It works and dumps results but when I use a schema for loading it fails. > {code} > INPUT_OBJECT =3D load 'internationalcode' AS (object_id:chararray, labels= : bag {T: tuple(label:chararray)}); > describe INPUT_OBJECT; > {code} > The error message is as follows:2010-01-14 02:23:27,320 FATAL org.apache.= hadoop.mapred.Child: Error running child : org.apache.pig.data.parser.Token= MgrError: Error: Bailing out of infinite loop caused by repeated empty stri= ng matches at line 1, column 21. 
> 	at org.apache.pig.data.parser.TextDataParserTokenManager.TokenLexicalActions(TextDataParserTokenManager.java:620)
> 	at org.apache.pig.data.parser.TextDataParserTokenManager.getNextToken(TextDataParserTokenManager.java:569)
> 	at org.apache.pig.data.parser.TextDataParser.jj_ntk(TextDataParser.java:651)
> 	at org.apache.pig.data.parser.TextDataParser.Tuple(TextDataParser.java:152)
> 	at org.apache.pig.data.parser.TextDataParser.Bag(TextDataParser.java:100)
> 	at org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:382)
> 	at org.apache.pig.data.parser.TextDataParser.Parse(TextDataParser.java:42)
> 	at org.apache.pig.builtin.Utf8StorageConverter.parseFromBytes(Utf8StorageConverter.java:68)
> 	at org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:76)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:845)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:250)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.