hive-user mailing list archives

From Amit Tewari <amittew...@gmail.com>
Subject Re: Error using ORC Format with Hive
Date Sat, 05 Apr 2014 07:48:14 GMT
Thanks for the reply. I did solve the protobuf issue by upgrading to 2.5, but then Hive 0.12
also started showing the same issue as 0.13 and 0.14.
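
For anyone searching the archives later, the protobuf fix amounted to rebuilding against
protobuf 2.5, roughly as below; the -Phadoop-2 profile and protobuf.version property are from
the Hive 0.13 maven build, so treat them as assumptions for other branches:

    # rebuild Hive against protobuf 2.5 (sketch; profile and property
    # names are taken from the Hive 0.13 pom)
    mvn clean package -DskipTests -Phadoop-2 -Dprotobuf.version=2.5.0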

I was working through the CLI.

It turns out the issue was due to the space available (or rather, not available) to the data
node. Let me elaborate for others on the list.

I had about 2GB available on the partition where the data node directory was configured (the
name node and data node directories were on the same partition, in different directories, of
course). I inserted kv1.txt (a few KB) into table#1 (stored as textfile) and then tried
"insert into table#2 select * from table#1", where table#2 was stored as ORC. It was hard to
guess that the converted ORC data would be too big to fit in 2GB, especially since the data
node logs showed no error and no reserve (dfs.datanode.du.reserved) was configured for HDFS.
I still don't know why it needs so much space, but I could reproduce the error simply by
pushing a 300MB file to HDFS with "hdfs dfs -put", which confirmed it was a space issue.
I migrated the data node to a bigger partition and everything is fine now.
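
For anyone else who hits this, the quickest check I found is HDFS's own accounting rather
than the datanode logs. A minimal sketch, using standard HDFS CLI commands:

    # free space as HDFS sees it, per filesystem
    hdfs dfs -df -h /

    # capacity, remaining space, and per-datanode detail
    hdfs dfsadmin -report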

On a separate note, I am not seeing any significant query-time improvement from pushing data
into ORC. About 25%, yes, but nowhere close to the multiples I was hoping for. I changed the
stripe size to 4MB and tried creating an index entry every 10k rows. I inserted 6 million rows
and ran many different types of queries. Any ideas what I might be missing? Roughly what I
have is sketched below.
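
The property names are the ones documented for the 0.13-era ORC writer; treat the values as
my experiment, not a recommendation:

    -- 4MB stripes (the default is far larger: 64-256MB depending on version),
    -- a row index entry every 10k rows, ZLIB is the default codec
    CREATE TABLE pokes_orc (foo INT, bar STRING)
    STORED AS ORC
    TBLPROPERTIES (
      "orc.compress" = "ZLIB",
      "orc.create.index" = "true",
      "orc.stripe.size" = "4194304",
      "orc.row.index.stride" = "10000"
    );

    -- without this, ORC row-group indexes are not used to skip rows at read time
    SET hive.optimize.index.filter=true;

One thing I do wonder about: 4MB stripes are far below the default, so the small stripes may
themselves be hurting scan speed.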

Amit 

Sent from my mobile device, please excuse the typos

> On Apr 4, 2014, at 8:21 PM, Bryan Jeffrey <bryan.jeffrey@gmail.com> wrote:
> 
> Amit,
> 
> Are you executing your select for conversion to ORC via beeline or the hive cli? From looking
> at your logs, it appears that you do not have permission in HDFS to write the resultant ORC
> data. Check permissions in HDFS to ensure that your user can write to the hive warehouse.
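> For example, something like the following (assuming the default warehouse location of
> /user/hive/warehouse):
> 
>     # check ownership and permissions on the warehouse tree
>     hdfs dfs -ls /user/hive/warehouse
> 
>     # one possible fix, if group write is what's missing
>     hdfs dfs -chmod -R g+w /user/hive/warehouse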
> 
> I forwarded you a previous thread regarding Hive 0.12 protobuf issues.
> 
> Regards,
> 
> Bryan Jeffrey
> 
> On Apr 4, 2014 8:14 PM, "Amit Tewari" <amittewari@gmail.com> wrote:
> I checked out and built Hive 0.13 and tried it, with the same results, i.e.:
> File /tmp/hive-hduser/hive_2014-04-04_20-34-43_550_7470522328893486504-1/_task_tmp.-ext-10002/_tmp.000000_3
> could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s)
> running and no node(s) are excluded in this operation.
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
> 
> 
> 
> I also tried it with the release version of Hive 0.12, and that gave me a different error,
> related to protobuf incompatibility (pasted below).
> 
> So at this point I can't run even the basic use case with ORC storage.
> 
> Any pointers would be very helpful.
> 
> Amit
> 
> Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:240)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
> 
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> Caused by: java.lang.UnsupportedOperationException: This is supposed to be overridden by subclasses.
>     at com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.getSerializedSize(OrcProto.java:3046)
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndexEntry.getSerializedSize(OrcProto.java:4129)
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndex.getSerializedSize(OrcProto.java:4641)
>     at com.google.protobuf.AbstractMessageLite.writeTo(AbstractMessageLite.java:75)
>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:548)
>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1328)
>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1699)
>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:1868)
>     at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:95)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:181)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:866)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:596)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
> 
> Amit
> 
> 
>> On 4/4/14 2:28 PM, Amit Tewari wrote:
>> Hi All,
>> 
>> I am just trying to run some simple tests to see the speedup in Hive queries with Hive 0.14
>> (this morning's trunk), starting from the sample test case. First I wanted to see how much
>> I can speed things up using the ORC format.
>> 
>> However, for some reason I can't insert data into a table stored as ORC. It fails with the
>> exception "File <filename> could only be replicated to 0 nodes instead of minReplication
>> (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation."
>> 
>> I can, however, insert data into a text table without any issue.
>> 
>> I have included the steps below.
>> 
>> Any pointers would be appreciated. 
>> 
>> Amit
>> 
>> 
>> 
>> I have a single-node setup with minimal settings. jps output is as follows:
>> $ jps
>> 9823 NameNode
>> 12172 JobHistoryServer
>> 9903 DataNode
>> 14895 Jps
>> 11796 ResourceManager
>> 12034 NodeManager
>> Running Hadoop 2.2 with YARN.
>> 
>> 
>> 
>> Step 1
>> 
>> CREATE TABLE pokes (foo INT, bar STRING);
>> 
>> Step 2
>> 
>> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
>> 
>> Step 3
>> CREATE TABLE pokes_1 (foo INT, bar STRING);
>> 
>> Step 4
>> 
>> Insert into table pokes_1 select * from pokes;
>> 
>> Step 5
>> 
>> CREATE TABLE pokes_orc (foo INT, bar STRING) stored as orc;
>> 
>> Step 6
>> 
>> insert into pokes_orc select * from pokes; <FAILED with the exception below>
>> 
>> File /tmp/hive-hduser/hive_2014-04-04_20-34-43_550_7470522328893486504-1/_task_tmp.-ext-10002/_tmp.000000_3
>> could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s)
>> running and no node(s) are excluded in this operation.
>>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>> 
>>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:168)
>>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:843)
>>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
>>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
>>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
>>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
>>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
>>     ... 8 more
>> 
>> 
>> Step 7
>> 
>> Insert overwrite table pokes_1 select * from pokes; <Success>
> 
