Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Wed, 27 May 2015 05:04:17 +0000 (UTC)
From: "lovekesh bansal (JIRA)" <jira@apache.org>
To: dev@hive.apache.org
Message-ID: <JIRA.12832959.1432703025000.35398.1432703057075@Atlassian.JIRA>
In-Reply-To: <JIRA.12832959.1432703025000@Atlassian.JIRA>
References: <JIRA.12832959.1432703025000@Atlassian.JIRA>
 <JIRA.12832959.1432703025792@arcas>
Subject: [jira] [Created] (HIVE-10830) First column of a Hive table created
 with LazyBinaryColumnarSerDe is not read properly
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

lovekesh bansal created HIVE-10830:
--------------------------------------

             Summary: First column of a Hive table created with LazyBinaryColumnarSerDe is not read properly
                 Key: HIVE-10830
                 URL: https://issues.apache.org/jira/browse/HIVE-10830
             Project: Hive
          Issue Type: Bug
            Reporter: lovekesh bansal


1. create external table platdev.table_target (id INT, message String, state string, date string)
   partitioned by (country string)
   row format delimited fields terminated by ','
   stored as RCFILE
   location '/user/nikgupta/table_target';

2. insert overwrite table platdev.table_target partition(country)
   select case when id=13 then 15 else id end, message, state, date, country
   from platdev.table_base2
   where id between 13 and 16;

Say the table now holds the following data:
15    thirteen    delhi      2-12-2014    india
14    fourteen    delhi      1-1-2014     india
15    fifteen     florida    1-1-2014     us
16    sixteen     florida    2-12-2014    us

Now I read the data with a MapReduce program, using the map function given below:

public void map(LongWritable key, BytesRefArrayWritable val, Context context)
    throws IOException, InterruptedException {

    for (int i = 0; i < val.size(); i++) {
        BytesRefWritable bytesRefread = val.get(i);
        byte[] currentCell = Arrays.copyOfRange(bytesRefread.getData(),
            bytesRefread.getStart(), bytesRefread.getStart() + bytesRefread.getLength());
        Text currentCellStr = new Text(currentCell);
        System.out.println("rowText=" + currentCellStr);
    }
    context.write(NullWritable.get(), bytes);
}


and set the following job configuration parameters:

job.setInputFormatClass(RCFileMapReduceInputFormat.class);
job.setOutputFormatClass(RCFileMapReduceOutputFormat.class);
jobConf.setInt(RCFile.COLUMN_NUMBER_CONF_STR, 5);

The output shown is as follows:
rowText=\x0F
rowText=fifteen
rowText=goa
rowText=2-2-2222
rowText=us

But exactly the same case, with ColumnarSerDe specified explicitly in the table definition, gives the following output:
rowText=\x0F1
rowText=fifteen
rowText=goa
rowText=2-2-2222
rowText=us

The point is that the first column's value is missing.
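
One possible reading of the first-column output, offered as an assumption rather than a confirmed diagnosis: the single byte 0x0F is the number 15 if the cell holds a binary variable-length integer in the zero-compressed format used by Hadoop's WritableUtils, rather than the text "15". The sketch below is self-contained Java with no Hive or Hadoop dependency. The class VIntSketch and its method names are hypothetical, chosen only for illustration, and the decoding follows my understanding of that format.

```java
public class VIntSketch {

    // Total encoded length in bytes, as implied by the first byte.
    static int encodedSize(byte first) {
        if (first >= -112) {
            return 1;             // the value is stored directly in this byte
        } else if (first < -120) {
            return -119 - first;  // negative value, 1 to 8 payload bytes follow
        }
        return -111 - first;      // positive value, 1 to 8 payload bytes follow
    }

    // True when the first byte marks a negative value.
    static boolean isNegative(byte first) {
        return first < -120 || (first >= -112 && first < 0);
    }

    // Decode one zero-compressed integer starting at data[start].
    static long decodeVLong(byte[] data, int start) {
        byte first = data[start];
        int size = encodedSize(first);
        if (size == 1) {
            return first;
        }
        long value = 0;
        for (int i = 1; i < size; i++) {
            value = (value << 8) | (data[start + i] & 0xFF);
        }
        // Negative values are stored as their bitwise complement.
        return isNegative(first) ? ~value : value;
    }

    // Hex view of a cell, so non-printable bytes stay visible.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] cell = {0x0F};  // the byte shown for the first column
        System.out.println("hex=" + toHex(cell) + " decoded=" + decodeVLong(cell, 0));
        // prints: hex=0F decoded=15
    }
}
```

If this reading is right, wrapping such a cell in Text would print an unprintable control character instead of "15", so a hex dump of each cell may be a more reliable way to inspect what the reader returns.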


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)