hive-issues mailing list archives

From "Damien Carol (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-12537) RLEv2 doesn't seem to work
Date Sat, 28 Nov 2015 11:05:10 GMT

     [ https://issues.apache.org/jira/browse/HIVE-12537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Damien Carol updated HIVE-12537:
--------------------------------
    Description: 
Perhaps I'm doing something wrong, or perhaps this is actually working as expected.

Writing 1 million constant int32 values produces an ORC file of 1 MB. Surprisingly, writing
1 million consecutive ints produces a much smaller file.
Code and FileDump attached.

{code}
// Reflection-based inspector for a single Integer column.
ObjectInspector inspector = ObjectInspectorFactory.getReflectionObjectInspector(
        Integer.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA);

// No generic compression, so the file size reflects the integer encoding alone.
Writer w = OrcFile.createWriter(new Path("/tmp/my.orc"),
        OrcFile.writerOptions(new Configuration())
                .compress(CompressionKind.NONE)
                .inspector(inspector)
                .encodingStrategy(OrcFile.EncodingStrategy.COMPRESSION)
                .version(OrcFile.Version.V_0_12));

// Write the same constant value 1 million times.
for (int i = 0; i < 1000000; ++i) {
    w.addRow(123);
}
w.close();
{code}
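
For scale, the figures in the description can be put into a rough calculation. This is only a back-of-the-envelope sketch, not a measurement: the class name RleSizeEstimate is hypothetical, and the run limit of 512 values and the cost of 5 bytes per run are assumptions taken from my reading of the RLEv2 delta layout in the ORC specification, not from the attached FileDump. "1MB" is taken as 2^20 bytes.

```java
// Rough size estimate for 1 million constant ints under a run-length scheme.
// The run length limit and per-run cost are assumptions, not measured values.
final class RleSizeEstimate {

    // Number of runs needed when one run holds at most maxRunLength values.
    public static long runsNeeded(long valueCount, int maxRunLength) {
        return (valueCount + maxRunLength - 1) / maxRunLength; // ceiling division
    }

    // Estimated encoded size if every run costs bytesPerRun bytes.
    public static long estimatedBytes(long valueCount, int maxRunLength, int bytesPerRun) {
        return runsNeeded(valueCount, maxRunLength) * bytesPerRun;
    }

    // Average number of bytes the file spends on each value.
    public static double bytesPerValue(long fileBytes, long valueCount) {
        return (double) fileBytes / valueCount;
    }

    public static void main(String[] args) {
        long values = 1_000_000L;
        long reportedFileBytes = 1L << 20; // "1MB" from the description, assumed to mean 2^20 bytes
        int maxRunLength = 512;            // assumption: longest run a single RLEv2 header can describe
        int bytesPerRun = 5;               // assumption: 2-byte header + 2-byte base + 1-byte delta

        System.out.println("Raw int32 bytes:          " + values * 4);
        System.out.println("Reported bytes per value: " + bytesPerValue(reportedFileBytes, values));
        System.out.println("Runs needed:              " + runsNeeded(values, maxRunLength));
        System.out.println("Estimated encoded bytes:  "
                + estimatedBytes(values, maxRunLength, bytesPerRun));
    }
}
```

If those assumptions hold, a constant column of this size should need on the order of kilobytes, whereas the reported file spends roughly one byte per value.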

  was:
Perhaps I'm doing something wrong, or perhaps this is actually working as expected.

Writing 1 million constant int32 values produces an ORC file of 1 MB. Surprisingly, writing
1 million consecutive ints produces a much smaller file.
Code and FileDump attached.

> RLEv2 doesn't seem to work
> --------------------------
>
>                 Key: HIVE-12537
>                 URL: https://issues.apache.org/jira/browse/HIVE-12537
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats, ORC
>    Affects Versions: 1.2.1
>            Reporter: Bogdan Raducanu
>              Labels: orc, orcfile
>         Attachments: Main.java, orcdump.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
