hive-user mailing list archives

From Thai Bui <blquyt...@gmail.com>
Subject Re:
Date Wed, 13 Jun 2018 16:32:09 GMT
In which case, check your DataNode logs on one of the HDFS nodes, and check
the NameNode logs as well. The issue is on the HDFS side rather than Hive, so
you may have more luck debugging the problem there.
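
If it helps, something like this is roughly what I'd look at first (the log
paths are an assumption for a typical EMR install, adjust to yours):

# Recent warnings/errors in a DataNode log on one of the core nodes
tail -n 200 /var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.log | grep -iE 'warn|error|exception'

# Same on the master node for the NameNode log
tail -n 200 /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log | grep -iE 'warn|error|exception'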

On Wed, Jun 13, 2018 at 11:16 AM Sowjanya Kakarala <sowjanya@agrible.com>
wrote:

> hmm, that is interesting. My df -h output is below. I have all the logs
> and data in /mnt
>
> ~]$ df -h
>
> Filesystem      Size  Used Avail Use% Mounted on
>
> devtmpfs         16G   56K   16G   1% /dev
>
> tmpfs            16G     0   16G   0% /dev/shm
>
> /dev/nvme0n1p1  9.8G  6.1G  3.6G  63% /
>
> /dev/nvme1n1p1  5.0G  142M  4.9G   3% /emr
>
> /dev/nvme1n1p2  115G  2.2G  113G   2% /mnt
>
>
> On Wed, Jun 13, 2018 at 10:28 AM, Thai Bui <blquythai@gmail.com> wrote:
>
>> That error usually occurs because the disks are nearly out of space. In your
>> EMR cluster, SSH into one of the nodes and do a `df -h` to check disk usage
>> on all of your EBS volumes. HDFS is usually configured to mark a DataNode
>> unhealthy when the disks it's writing to are >90% utilized. Once that
>> happens, the DataNode is simply taken out of the list of available nodes; in
>> your case, all the DataNodes are unavailable, so new blocks are rejected when
>> the NameNode asks for a place to write them (0 available out of 4 nodes).
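>>
>> A quick way to see whether you're hitting that threshold (just a sketch, run
>> from the master node):
>>
>> # Per-DataNode capacity, DFS Used% and Remaining, as the NameNode sees it
>> hdfs dfsadmin -report
>>
>> # Space HDFS reserves per volume for non-DFS use, in bytes (0 = nothing reserved)
>> hdfs getconf -confKey dfs.datanode.du.reserved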
>>
>> Even though your cluster says there's 120GB available, that available space
>> might not be where the DataNode is configured to write, hence the misleading
>> impression that you still have room. This also happens when YARN and/or M/R
>> logs are filling up the disks where the DataNode is running.
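>>
>> Something like this would confirm where the DataNode and the logs actually
>> live (the yarn-site.xml and example paths are assumptions for EMR):
>>
>> # Directories the DataNode writes blocks to
>> hdfs getconf -confKey dfs.datanode.data.dir
>>
>> # Where YARN keeps container logs on the local disks
>> grep -A1 yarn.nodemanager.log-dirs /etc/hadoop/conf/yarn-site.xml
>>
>> # How full those locations actually are (paths are just examples)
>> du -sh /mnt/hdfs /var/log/hadoop-yarn 2>/dev/null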
>>
>> On Wed, Jun 13, 2018 at 8:56 AM Sowjanya Kakarala <sowjanya@agrible.com>
>> wrote:
>>
>>> Hi Sajid,
>>>
>>> As this is a development environment, we have limited nodes (4 datanodes,
>>> 1 master node) on an unmanaged switch.
>>> So here each node is treated as a rack (managed by HDFS, which creates the
>>> block copies), with one replica.
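>>>
>>> (For reference, the effective replication can be double-checked with
>>> something like the following:)
>>>
>>> # Default replication factor the NameNode applies to new files
>>> hdfs getconf -confKey dfs.replication
>>>
>>> # Replication recorded for existing warehouse files (second column of the listing)
>>> hdfs dfs -ls /user/hive/warehouse/monolith.db/tblname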
>>>
>>>
>>> On Wed, Jun 13, 2018 at 1:31 AM, Sajid Mohammed <sajid.hadoop@gmail.com>
>>> wrote:
>>>
>>>> what is your rack topology ?
>>>>
>>>> On Tue, Jun 12, 2018 at 9:26 PM Sowjanya Kakarala <sowjanya@agrible.com>
>>>> wrote:
>>>>
>>>>> Hi Guys,
>>>>>
>>>>>
>>>>> I have an EMR cluster with 4 datanodes and one master node, and 120GB of
>>>>> data storage left. I have been running Sqoop jobs that load data into a
>>>>> Hive table. After some jobs ran successfully, I suddenly see these errors
>>>>> all over the NameNode and DataNode logs.
>>>>>
>>>>> I have tried changing many configurations as suggested on the Stack
>>>>> Overflow and Hortonworks sites, but couldn't find a way to fix it.
>>>>>
>>>>>
>>>>> Here is the error:
>>>>>
>>>>> 2018-06-12 15:32:35,933 WARN [main]
>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>>>> /user/hive/warehouse/monolith.db/tblname/_SCRATCH0.28417629602676764/time_stamp=2018-04-02/_temporary/1/_temporary/attempt_1528318855054_3528_m_000000_1/part-m-00000
>>>>> could only be replicated to 0 nodes instead of minReplication (=1). There
>>>>> are 4 datanode(s) running and no node(s) are excluded in this operation.
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1735)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2561)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:829)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
>>>>>
>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>>>>>
>>>>>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:847)
>>>>>
>>>>>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:790)
>>>>>
>>>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>>>
>>>>>         at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>>>>>
>>>>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2486)
>>>>>
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1489)
>>>>>
>>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1435)
>>>>>
>>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1345)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>>>>>
>>>>>         at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:444)
>>>>>
>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>
>>>>>         at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>
>>>>>         at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>
>>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
>>>>>
>>>>>         at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1838)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1638)
>>>>>
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:704)
>>>>>
>>>>>
>>>>> References I already followed:
>>>>>
>>>>>
>>>>> https://community.hortonworks.com/articles/16144/write-or-append-failures-in-very-small-clusters-un.html
>>>>>
>>>>>
>>>>> https://stackoverflow.com/questions/14288453/writing-to-hdfs-from-java-getting-could-only-be-replicated-to-0-nodes-instead
>>>>>
>>>>> https://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
>>>>>
>>>>>
>>>>> https://stackoverflow.com/questions/36015864/hadoop-be-replicated-to-0-nodes-instead-of-minreplication-1-there-are-1/36310025
>>>>>
>>>>>
>>>>> Any help is appreciated.
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Sowjanya
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>>
>>
>> --
>> Thai
>>
>
>
>

-- 
Thai
