hive-dev mailing list archives

From "Puneet Gupta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5994) ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits )
Date Wed, 19 Feb 2014 02:49:19 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905058#comment-13905058 ]

Puneet Gupta commented on HIVE-5994:
------------------------------------

Hi Prasanth

This is the code I used to reproduce the issue.
1. I am using the Hive binary from "hive-0.12.0.tar.gz".
2. I am using an old Hadoop version, "hadoop-core-1.0.0.jar" (http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core).
3. In the code below, the problem does not occur if ROWS_TO_TEST is set to 1 or to a value greater than 10.

---------------------------
package hive;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.CompressionKind;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Reader;
import org.apache.hadoop.hive.ql.io.orc.RecordReader;
import org.apache.hadoop.hive.ql.io.orc.Writer;
import org.apache.hadoop.hive.ql.io.orc.OrcFile.WriterOptions;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;

public class TestLong {

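	/**
	 * Writes ROWS_TO_TEST identical rows to an ORC file, then reads them
	 * back and prints each row.
	 *
	 * @param args unused
	 * @throws IOException if the ORC file cannot be written or read
	 */
	public static void main(String[] args) throws IOException
	{
		int ROWS_TO_TEST = 10;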
		Path path = new Path("E:/Test/file.orc");
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.getLocal(conf);
		if(fs.exists(path))
			fs.delete(path,true);
		
		ObjectInspector inspector = ObjectInspectorFactory
				.getReflectionObjectInspector(MyData.class,
						ObjectInspectorFactory.ObjectInspectorOptions.JAVA);

		WriterOptions options = OrcFile.writerOptions(conf)
				.inspector(inspector).compress(CompressionKind.SNAPPY);

		Writer writer = OrcFile.createWriter(path, options);

		for (int i = 0; i < ROWS_TO_TEST; i++) {
			writer.addRow(new MyData());
		}
		writer.close();

		Reader reader = OrcFile.createReader(fs, path);
		RecordReader rows = reader.rows(null);
		Object row = null;
		while (rows.hasNext()) {
			row = rows.next(row);
			System.out.println(row);
		}
	}
	
	
	private static class MyData
	{
		long data = 4703275633953830000L ;
	}
}
-----------
OUTPUT
{112}
{112}
{112}
{112}
{112}
{112}
{112}
{112}
{112}
{112}


> ORC RLEv2 encodes wrongly for large negative BIGINTs  (64 bits )
> ----------------------------------------------------------------
>
>                 Key: HIVE-5994
>                 URL: https://issues.apache.org/jira/browse/HIVE-5994
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5994.1.patch
>
>
> For large negative BIGINTs, zigzag encoding yields a large 64-bit value with the MSB set
> to 1. The SerializationUtils.findClosestNumBits(long value) function interprets this value
> as negative, which leads to a wrong computation of the total number of bits required and,
> in turn, to wrong encoding/decoding of values.
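
The effect described above can be illustrated with a small standalone sketch. This is not Hive's code: the class ZigZagBits and the helpers zigzagEncode, bitsSigned and bitsUnsigned are hypothetical names, and bitsSigned is only an assumed example of a bit counter that treats its input as signed. The sketch also suggests why the positive value in the reproducer is affected: it is larger than 2^62, so its zigzag encoding (the value shifted left by one) already has the top bit set.

```java
// Illustrative sketch only; none of these names come from Hive.
class ZigZagBits {

    // Standard zigzag encoding: small magnitudes (positive or negative)
    // map to small non-negative codes.
    static long zigzagEncode(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Assumed example of a bit counter that treats the value as signed.
    // A value with the top bit set compares as negative, so the loop
    // never runs and the result is 0.
    static int bitsSigned(long value) {
        int count = 0;
        while (value > 0) {
            count++;
            value >>>= 1;
        }
        return count;
    }

    // Bit counter that treats the value as an unsigned 64-bit pattern.
    static int bitsUnsigned(long value) {
        return 64 - Long.numberOfLeadingZeros(value);
    }

    public static void main(String[] args) {
        long[] samples = {
            4703275633953830000L, // value from the reproducer (> 2^62)
            Long.MIN_VALUE + 1,   // large negative BIGINT
            -1L                   // small negative value, encodes to 1
        };
        for (long n : samples) {
            long z = zigzagEncode(n);
            System.out.println(n
                + " -> zigzag " + Long.toUnsignedString(z)
                + ", signed count = " + bitsSigned(z)
                + ", unsigned count = " + bitsUnsigned(z));
        }
    }
}
```

For the first two samples the signed counter reports 0 bits while the unsigned one reports 64, which is the kind of mismatch the description attributes to findClosestNumBits.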



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
