hive-dev mailing list archives

From John Zeng <john.z...@dataguise.com>
Subject ORC file block size
Date Fri, 25 Jul 2014 22:53:33 GMT
Hi, owner of org.apache.hadoop.hive.ql.io.orc.WriterImpl.java:

When writing an ORC file using the following code:

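               import java.sql.Timestamp;
               import org.apache.hadoop.fs.Path;
               import org.apache.hadoop.hive.ql.io.orc.OrcFile;
               import org.apache.hadoop.hive.ql.io.orc.Writer;

               // conf, inspector, my_stripe_size and my_buffer_size are defined elsewhere
               Writer writer = OrcFile.createWriter(new Path("/my_file_path"),
                   OrcFile.writerOptions(conf).inspector(inspector).stripeSize(my_stripe_size).bufferSize(my_buffer_size)
                       .version(OrcFile.Version.V_0_12));
               /** code to prepare tslist **/

               for (Timestamp ts : tslist) {
                 writer.addRow(ts);
               }

               writer.close();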

I got the following error:

org.apache.hadoop.ipc.RemoteException(java.io.IOException): Specified block size is less than
configured minimum value (dfs.namenode.fs-limits.min-block-size): 200000 < 1048576
               at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2215)
               at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2180)
               at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:505)
               at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354)
               at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
               at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
               at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)

After debugging, I found this line in WriterImpl.java (line 169):

    this.blockSize = Math.min(MAX_BLOCK_SIZE, 2 * stripeSize);

In other words, the block size is always twice the stripe size (capped at MAX_BLOCK_SIZE). In my
case a stripe size of 100000 gives the block size of 200000 seen in the error above. Unless
stripeSize is at least half of the minimum block size enforced by the NameNode
(dfs.namenode.fs-limits.min-block-size, 1048576 by default), this exception is inevitable.
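To make the arithmetic concrete, here is a small standalone sketch that mirrors the formula from line 169 and checks it against the NameNode minimum. The class and method names are hypothetical, not part of Hive, and the writer's real MAX_BLOCK_SIZE is passed in as a parameter rather than assumed; Long.MAX_VALUE is used below only as a stand-in for "much larger than 1 MB".

```java
// Hypothetical helper mirroring the WriterImpl computation quoted above; not a Hive API.
public class OrcBlockSizeCheck {
    // Default of dfs.namenode.fs-limits.min-block-size, as shown in the error message.
    static final long DEFAULT_MIN_BLOCK_SIZE = 1048576L;

    // Same formula as WriterImpl line 169: twice the stripe size, capped at maxBlockSize.
    static long derivedBlockSize(long stripeSize, long maxBlockSize) {
        return Math.min(maxBlockSize, 2 * stripeSize);
    }

    // True if the NameNode would reject the create() call for this stripe size.
    static boolean rejectedByNameNode(long stripeSize, long maxBlockSize, long minBlockSize) {
        return derivedBlockSize(stripeSize, maxBlockSize) < minBlockSize;
    }

    // Smallest stripe size whose doubled value is not below minBlockSize
    // (assumes maxBlockSize >= minBlockSize). Integer ceiling of minBlockSize / 2.
    static long minSafeStripeSize(long minBlockSize) {
        return (minBlockSize + 1) / 2;
    }

    public static void main(String[] args) {
        long max = Long.MAX_VALUE; // stand-in: the real cap is assumed to be far above 1 MB
        System.out.println(derivedBlockSize(100000L, max));                            // 200000
        System.out.println(rejectedByNameNode(100000L, max, DEFAULT_MIN_BLOCK_SIZE));  // true
        System.out.println(minSafeStripeSize(DEFAULT_MIN_BLOCK_SIZE));                 // 524288
    }
}
```

With the default minimum, any stripe size below 524288 bytes (512 KB) triggers the error, so raising stripeSize to at least that value is a possible workaround until the writer changes.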

Should we change the code so that blockSize is at least the value of
dfs.namenode.fs-limits.min-block-size? Better still, it would be nice to have a separate option
for blockSize instead of always using twice the stripeSize.
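A minimal sketch of what the proposal could look like, assuming a hypothetical optional block size setting (null when unset). None of these names exist in OrcFile.WriterOptions; they only illustrate the policy: use the explicit value if given, otherwise fall back to the current 2 * stripeSize rule, and never go below the HDFS minimum.

```java
// Hypothetical policy for the proposed blockSize option; not an existing ORC API.
public class OrcBlockSizePolicy {

    /**
     * @param explicitBlockSize user-supplied block size, or null if unset (hypothetical option)
     * @param stripeSize        ORC stripe size in bytes
     * @param maxBlockSize      writer's upper cap (stands in for WriterImpl's MAX_BLOCK_SIZE)
     * @param minBlockSize      value of dfs.namenode.fs-limits.min-block-size
     */
    static long resolveBlockSize(Long explicitBlockSize, long stripeSize,
                                 long maxBlockSize, long minBlockSize) {
        // Use the explicit value if present; otherwise keep today's rule: 2 * stripeSize, capped.
        long blockSize = (explicitBlockSize != null)
                ? explicitBlockSize
                : Math.min(maxBlockSize, 2 * stripeSize);
        // Raise to the NameNode minimum so that create() is not rejected.
        return Math.max(blockSize, minBlockSize);
    }

    public static void main(String[] args) {
        long max = Long.MAX_VALUE; // stand-in: the real cap is assumed to be much larger than 1 MB
        long min = 1048576L;       // default minimum block size from the error message
        System.out.println(resolveBlockSize(null, 100000L, max, min));                // 1048576
        System.out.println(resolveBlockSize(null, 64L * 1024 * 1024, max, min));      // 134217728
        System.out.println(resolveBlockSize(256L * 1024 * 1024, 100000L, max, min));  // 268435456
    }
}
```

Silently raising the value is one design choice; failing fast with a clear message on the client is another. Either way, the minimum is enforced by the NameNode from its own configuration, so the client-side Configuration may not report the same value.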

Thanks

John
