hive-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10114) Split strategies for ORC
Date Mon, 06 Apr 2015 23:57:13 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482232#comment-14482232 ]

Gopal V commented on HIVE-10114:
--------------------------------

Patch LGTM  - +1.

Tested this against 10 TB of data; it handles an exit in the middle of split generation
cleanly, so the system does not get stuck when a query is cancelled.

{code}
2015-04-06 16:51:09,536 WARN [ORC_GET_SPLITS #8] hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 743.0372945716757 msec.
2015-04-06 16:51:09,538 WARN [ORC_GET_SPLITS #1] ipc.Client: interrupted waiting to send rpc request to server
java.lang.InterruptedException
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
	at java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1046)
	at org.apache.hadoop.ipc.Client.call(Client.java:1441)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
	at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
	at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
	at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:360)
	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:316)
	at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:237)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:924)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:836)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:702)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}
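
For readers unfamiliar with the pattern, the trace above is what an interrupted worker in a thread pool looks like when it is blocked on I/O. Below is a minimal sketch of that cancellation behaviour, not the code from the patch: the class `CancellableSplitSketch`, the method names, and the `Thread.sleep` standing in for a blocking footer read are all assumptions made for illustration. It only shows that `shutdownNow()` interrupts busy workers and that the pool then terminates instead of hanging.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from Hive.
class CancellableSplitSketch {

    // Stand-in for per-file split generation. The sleep simulates a blocking
    // call (such as a remote footer read) and throws InterruptedException
    // when the worker thread is interrupted.
    static Callable<String> splitTask(String file, CountDownLatch started) {
        return () -> {
            started.countDown();
            Thread.sleep(60_000);
            return "split:" + file;
        };
    }

    // Submits one task per file, waits until the workers are busy, then
    // cancels. Returns true if the pool terminated within five seconds.
    static boolean submitThenCancel(List<String> files, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch started =
            new CountDownLatch(Math.min(threads, files.size()));
        try {
            for (String f : files) {
                pool.submit(splitTask(f, started));
            }
            started.await();     // workers are now blocked in the "read"
            pool.shutdownNow();  // interrupt running tasks, drop queued ones
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            // The caller itself was interrupted: clean up and report failure.
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean clean = submitThenCancel(
            Arrays.asList("a.orc", "b.orc", "c.orc", "d.orc"), 2);
        System.out.println("terminated cleanly: " + clean);
    }
}
```

The point of the sketch is the last two calls in `submitThenCancel`: if the workers ignored the interrupt, `awaitTermination` would time out and return false, which is the "stuck" case the test above was checking for.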

> Split strategies for ORC
> ------------------------
>
>                 Key: HIVE-10114
>                 URL: https://issues.apache.org/jira/browse/HIVE-10114
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 1.2.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-10114.1.patch, HIVE-10114.2.patch, HIVE-10114.3.patch, HIVE-10114.4.patch, HIVE-10114.5.patch
>
>
> ORC split generation does not have clearly defined strategies for different scenarios (many small ORC files, few small ORC files, many large files, etc.). A few strategies, such as storing the file footer in the ORC split or making an entire file a single ORC split, already exist. This JIRA aims to make split generation simpler, support different strategies for various use cases (BI, ETL, ACID, etc.), and lay the foundation for HIVE-7428.
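
To make the idea of per-scenario strategies concrete, here is a rough sketch of how a chooser might look. It is an assumption-laden illustration, not the logic in the attached patches: the class `SplitStrategySketch`, the `choose` method, both thresholds, and the rule mapping file counts and sizes to BI or ETL are hypothetical. Only the three strategy names (BI, ETL, ACID) come from the description above.

```java
// Hypothetical illustration only; names and thresholds are not from the patch.
class SplitStrategySketch {

    enum Kind { BI, ETL, ACID }

    // Assumed cut-offs, chosen arbitrarily for the example.
    static final long SMALL_FILE_BYTES = 256L * 1024 * 1024;
    static final int MANY_FILES = 1000;

    // Picks a strategy from coarse properties of the input directory.
    static Kind choose(boolean isAcid, int fileCount, long totalBytes) {
        if (isAcid) {
            return Kind.ACID;
        }
        long avgBytes = fileCount == 0 ? 0 : totalBytes / fileCount;
        // Assumption: with many or small files, per-file metadata work
        // dominates, so a cheaper whole-file split is preferred.
        if (fileCount > MANY_FILES || avgBytes < SMALL_FILE_BYTES) {
            return Kind.BI;
        }
        // Assumption: fewer, larger files justify finer-grained splitting.
        return Kind.ETL;
    }

    public static void main(String[] args) {
        System.out.println(choose(false, 5000, 5000L * 1024));
        System.out.println(choose(false, 10, 10L * 1024 * 1024 * 1024));
        System.out.println(choose(true, 10, 100));
    }
}
```

Keeping the decision in one small function is the design point: each scenario in the description maps to one branch, so a new use case means adding a branch rather than threading more flags through the split generator.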



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
