hive-dev mailing list archives

From "lovekesh bansal (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-10830) First column of a Hive table created with LazyBinaryColumnarSerDe is not read properly
Date Wed, 27 May 2015 05:04:17 GMT
lovekesh bansal created HIVE-10830:
--------------------------------------

             Summary: First column of a Hive table created with LazyBinaryColumnarSerDe is
not read properly
                 Key: HIVE-10830
                 URL: https://issues.apache.org/jira/browse/HIVE-10830
             Project: Hive
          Issue Type: Bug
            Reporter: lovekesh bansal


1. create external table platdev.table_target ( id INT, message String, state string, date
string ) partitioned by (country string) row format delimited fields terminated by ',' stored
as RCFILE location '/user/nikgupta/table_target' ;

2. insert overwrite table platdev.table_target partition(country) select case when id=13 then
15 else id end,message,state,date,country from platdev.table_base2 where id between 13 and
16;

Say my table now has the following data:
15    thirteen    delhi      2-12-2014    india
14    fourteen    delhi      1-1-2014     india
15    fifteen     florida    1-1-2014     us
16    sixteen     florida    2-12-2014    us

Now, if I try to read the data with a MapReduce program whose map function is as given below:

public void map(LongWritable key, BytesRefArrayWritable val, Context context)
    throws IOException, InterruptedException {
    // Print every column of the current row as text.
    for (int i = 0; i < val.size(); i++) {
        BytesRefWritable bytesRefread = val.get(i);
        // Copy only this cell's slice out of the backing buffer.
        byte[] currentCell = Arrays.copyOfRange(bytesRefread.getData(), bytesRefread.getStart(),
            bytesRefread.getStart() + bytesRefread.getLength());
        Text currentCellStr = new Text(currentCell);
        System.out.println("rowText=" + currentCellStr);
    }
    // The snippet as posted passed an undefined variable `bytes` here; `val` is assumed.
    context.write(NullWritable.get(), val);
}


and set the following job configuration parameters:

job.setInputFormatClass(RCFileMapReduceInputFormat.class);
job.setOutputFormatClass(RCFileMapReduceOutputFormat.class);
jobConf.setInt(RCFile.COLUMN_NUMBER_CONF_STR, 5);

The output shown is as follows:
rowText=
rowText=fifteen
rowText=goa
rowText=2-2-2222
rowText=us

But with ColumnarSerDe specified explicitly in the table definition, exactly the same case
gives the following output:
rowText=1
rowText=fifteen
rowText=goa
rowText=2-2-2222
rowText=us

The point is that the value of the first column is missing.
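
One possible explanation, not confirmed anywhere in this report, is that the INT column is
stored in a binary form while the map function prints every cell as UTF-8 text. The sketch
below is only an illustration under that assumption: it supposes the value 15 is written as a
single byte 0x0F, following Hadoop's variable-length integer encoding (values from -112 to 127
occupy one byte), and compares that with the text form "15". The class CellDecodingSketch and
its helper methods are hypothetical names and are not part of Hive or Hadoop.

```java
import java.nio.charset.StandardCharsets;

class CellDecodingSketch {

    // Decode a cell the way the map function in the report does: as UTF-8 text.
    static String asText(byte[] cell) {
        return new String(cell, StandardCharsets.UTF_8);
    }

    // Hadoop's variable-length int encoding stores values from -112 to 127
    // in one byte equal to the value itself. Only that case is handled here.
    static int decodeSingleByteVInt(byte[] cell) {
        if (cell.length != 1 || cell[0] < -112) {
            throw new IllegalArgumentException("multi-byte VInt not handled in this sketch");
        }
        return cell[0];
    }

    // True if the string has at least one character that is not a control character.
    static boolean hasVisibleChars(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isISOControl(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumption: a text-based layout stores 15 as the characters '1' and '5'.
        byte[] textCell = "15".getBytes(StandardCharsets.UTF_8);
        // Assumption: a binary layout stores 15 as the single byte 0x0F.
        byte[] binaryCell = { 0x0F };

        System.out.println("text cell visible when read as text:   "
            + hasVisibleChars(asText(textCell)));
        System.out.println("binary cell visible when read as text: "
            + hasVisibleChars(asText(binaryCell)));
        System.out.println("binary cell decoded as an integer:     "
            + decodeSingleByteVInt(binaryCell));
    }
}
```

If this assumption holds, the first cell is present in the row but contains a control character,
so it prints as an apparently empty "rowText=" line. Decoding that column with the table's SerDe
instead of wrapping the raw bytes in Text would be one way to check.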



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
