hbase-user mailing list archives

From James Estes <james.es...@gmail.com>
Subject Re: RegionServers shutdown randomly
Date Sat, 08 Aug 2015 02:59:32 GMT
There is this:
http://mail-archives.apache.org/mod_mbox/hbase-user/201507.mbox/%3CCAE8tVdmyUfG%2BajK0gvMG_tLjoStZ0HjrQxJuuJzQ3Z%2B4vbzSuQ%40mail.gmail.com%3E
which points to
https://issues.apache.org/jira/browse/HDFS-8809

But (at least for us) this hasn't led to region servers crashing... though
I'm definitely interested in what issues it may be able to cause.
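
For reference, when "All datanodes ... are bad. Aborting..." shows up on a
small cluster, one client-side knob worth reviewing is how the DFS client
replaces datanodes during write-pipeline recovery. A minimal hdfs-site.xml
sketch, assuming the stock Hadoop 2.7 property names (verify against
hdfs-default.xml for your build; the values are illustrative, not a
recommendation):

  <!-- hdfs-site.xml as seen by the HBase region servers (WAL writers) -->
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>DEFAULT</value>
  </property>
  <property>
    <!-- keep writing on the remaining pipeline if no replacement datanode is available -->
    <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
    <value>true</value>
  </property>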

James


On Fri, Aug 7, 2015 at 11:05 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> Some WAL-related files were marked corrupt.
>
> Can you try repairing them?
>
> Please check the namenode log.
> Search HDFS JIRA for any pending fix - I haven't tracked HDFS development
> closely recently.
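>
> A minimal sketch of what sidelining them could look like, assuming stock
> HDFS / HBase CLI tooling (the WAL path is a placeholder for whatever fsck
> actually reports; moving a WAL aside means accepting the loss of any edits
> only it contains):
>
>   # park the affected WAL out of HBase's way instead of deleting it
>   hdfs dfs -mkdir -p /corrupt-wals
>   hdfs dfs -mv /apps/hbase/data/WALs/<wal-file-reported-by-fsck> /corrupt-wals/
>
>   # then check table/region consistency
>   hbase hbck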
>
> Thanks
>
> On Fri, Aug 7, 2015 at 7:54 AM, Adrià Vilà <avila@datknosys.com> wrote:
>
>> About the logs attached in this conversation: only the w-0 and w-1 nodes
>> failed, first w-0 and then w-1.
>> 10.240.187.182 = w-2
>> w-0 internal IP address is 10.240.164.0
>> w-1 IP is 10.240.2.235
>> m IP is 10.240.200.196
>>
>> FSCK (hadoop fsck / | egrep -v '^\.+$' | grep -v eplica) output:
>> -
>> Connecting to namenode via
>> http://hdp-m.c.dks-hadoop.internal:50070/fsck?ugi=root&path=%2F FSCK
>> started by root (auth:SIMPLE) from /10.240.200.196 for path / at Fri Aug
>> 07 14:51:22 UTC 2015
>> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438946915810-splitting/hdp-w-0.c.dks-hadoop.internal%2C1602
>> 0%2C1438946915810..meta.1438950914376.meta: MISSING 1 blocks of total size
>> 90 B......
>> /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438959061234/hdp-w-1.c.dks-hadoop.internal%2C16020%2C143895
>> 9061234.default.1438959069800: MISSING 1 blocks of total size 90 B...
>> /apps/hbase/data/WALs/hdp-w-2.c.dks-hadoop.internal,16020,1438959056208/hdp-w-2.c.dks-hadoop.internal%2C16020%2C143895
>> 9056208..meta.1438959068352.meta: MISSING 1 blocks of total size 90 B.
>> /apps/hbase/data/WALs/hdp-w-2.c.dks-hadoop.internal,16020,1438959056208/hdp-w-2.c.dks-hadoop.internal%2C16020%2C143895
>> 9056208.default.1438959061922: MISSING 1 blocks of total size 90
>> B...........................
>>
>> .........Status: CORRUPT
>> Total size: 54919712019 B (Total open files size: 360 B)
>> Total dirs: 1709 Total files: 2628
>> Total symlinks: 0 (Files currently being written: 6)
>> Total blocks (validated): 2692 (avg. block size 20401081 B) (Total open
>> file blocks (not validated): 4)
>> ********************************
>> UNDER MIN REPL'D BLOCKS: 4 (0.1485884 %)
>> CORRUPT FILES: 4
>> MISSING BLOCKS: 4
>> MISSING SIZE: 360 B
>> ********************************
>> Corrupt blocks: 0
>> Number of data-nodes: 4
>> Number of racks: 1
>> FSCK ended at Fri Aug 07 14:51:26 UTC 2015 in 4511 milliseconds
>>
>> The filesystem under path '/' is CORRUPT
>> -
>>
>> Thank you for your time.
>>
>> *From*: "Ted Yu" <yuzhihong@gmail.com>
>> *Sent*: Friday, 07 August 2015 16:07
>> *To*: "user@hbase.apache.org" <user@hbase.apache.org>,
>> avila@datknosys.com
>> *Subject*: Re: RegionServers shutdown randomly
>>
>> Does 10.240.187.182 <http://10.240.187.182:50010/> correspond to w-0 or
>> m?
>>
>> Looks like hdfs was intermittently unstable.
>> Have you run fsck?
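>>
>> A minimal sketch with the standard fsck flags (treat it as a starting
>> point):
>>
>>   # list the files that own missing/corrupt blocks
>>   hdfs fsck / -list-corruptfileblocks
>>
>>   # include files still open for write (active WALs usually are)
>>   hdfs fsck / -openforwrite -files -blocks -locations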
>>
>> Cheers
>>
>> On Fri, Aug 7, 2015 at 12:59 AM, Adrià Vilà <avila@datknosys.com> wrote:
>>>
>>>  Hello,
>>>
>>>  HBase RegionServers fail once in a while:
>>>  - it can be any regionserver, not always the same one
>>>  - it can happen when the whole cluster is idle (at least not executing
>>> any human-launched task)
>>>  - it can happen at any time, not always at the same time
>>>
>>>  The cluster versions:
>>>  - Phoenix 4.4 (or 4.5)
>>>  - HBase 1.1.1
>>>  - Hadoop/HDFS 2.7.1
>>>  - Zookeeper 3.4.6
>>>
>>>  Some configs:
>>>  -  ulimit -a
>>>  core file size          (blocks, -c) 0
>>> data seg size           (kbytes, -d) unlimited
>>> scheduling priority             (-e) 0
>>> file size               (blocks, -f) unlimited
>>> pending signals                 (-i) 103227
>>> max locked memory       (kbytes, -l) 64
>>> max memory size         (kbytes, -m) unlimited
>>> open files                      (-n) 1024
>>> pipe size            (512 bytes, -p) 8
>>> POSIX message queues     (bytes, -q) 819200
>>> real-time priority              (-r) 0
>>> stack size              (kbytes, -s) 10240
>>> cpu time               (seconds, -t) unlimited
>>> max user processes              (-u) 103227
>>> virtual memory          (kbytes, -v) unlimited
>>> file locks                      (-x) unlimited
>>>  - have increased default timeouts for: hbase rpc, zookeeper session, dfs
>>> socket, regionserver lease and client scanner.
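>>>
>>>  Note that "open files (-n) 1024" above is still the OS default, which the
>>> HBase reference guide warns is low for region server nodes. A minimal
>>> sketch of raising it, assuming the daemons run as an "hbase" user (user
>>> name and limit values are illustrative, not tuned recommendations):
>>>
>>>    # /etc/security/limits.conf
>>>    hbase  -  nofile  32768
>>>    hbase  -  nproc   32000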
>>>
>>>  Below you can find the logs for the master, the region server that failed
>>> first, another region server that failed, and the datanode logs for the
>>> master and a worker.
>>>
>>>
>>>  The timing was approximately:
>>>  14:05 start hbase
>>>  14:11 w-0 down
>>>  14:14 w-1 down
>>>  14:15 stop hbase
>>>
>>>
>>>   -------------
>>>  hbase master log (m)
>>>  -------------
>>>  2015-08-06 14:11:13,640 ERROR
>>> [PriorityRpcServer.handler=19,queue=1,port=16000] master.MasterRpcServices:
>>> Region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 reported a
>>> fatal error:
>>>  ABORTING region server
>>> hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception
>>> while closing region
>>> SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
>>> still finishing close
>>>  Cause:
>>>  java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>
>>>  --------------
>>>  hbase regionserver log (w-0)
>>>  --------------
>>>  2015-08-06 14:11:13,611 INFO
>>> [PriorityRpcServer.handler=0,queue=0,port=16020]
>>> regionserver.RSRpcServices: Close 888f017eb1c0557fbe7079b50626c891, moving
>>> to hdp-m.c.dks-hadoop.internal,16020,1438869954062
>>>  2015-08-06 14:11:13,615 INFO
>>> [StoreCloserThread-SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.-1]
>>> regionserver.HStore: Closed 0
>>>  2015-08-06 14:11:13,616 FATAL
>>> [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.append-pool1-t1]
>>> wal.FSHLog: Could not append. Requesting close of wal
>>>  java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>  2015-08-06 14:11:13,617 ERROR [sync.4] wal.FSHLog: Error syncing,
>>> request close of wal
>>>  java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>  2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0]
>>> regionserver.HRegionServer: ABORTING region server
>>> hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception
>>> while closing region
>>> SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
>>> still finishing close
>>>  java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>  2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0]
>>> regionserver.HRegionServer: RegionServer abort: loaded coprocessors are:
>>> [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl,
>>> org.apache.hadoop.hbase.regionserver.LocalIndexSplitter,
>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver,
>>> org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver,
>>> org.apache.phoenix.coprocessor.ScanRegionObserver,
>>> org.apache.phoenix.hbase.index.Indexer,
>>> org.apache.phoenix.coprocessor.SequenceRegionObserver,
>>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl]
>>>  2015-08-06 14:11:13,627 INFO  [RS_CLOSE_REGION-hdp-w-0:16020-0]
>>> regionserver.HRegionServer: Dump of metrics as JSON on abort: {
>>>    "beans" : [ {
>>>      "name" : "java.lang:type=Memory",
>>>      "modelerType" : "sun.management.MemoryImpl",
>>>      "Verbose" : true,
>>>      "HeapMemoryUsage" : {
>>>        "committed" : 2104754176,
>>>        "init" : 2147483648,
>>>        "max" : 2104754176,
>>>        "used" : 262288688
>>>      },
>>>      "ObjectPendingFinalizationCount" : 0,
>>>      "NonHeapMemoryUsage" : {
>>>        "committed" : 137035776,
>>>        "init" : 136773632,
>>>        "max" : 184549376,
>>>        "used" : 49168288
>>>      },
>>>      "ObjectName" : "java.lang:type=Memory"
>>>    } ],
>>>    "beans" : [ {
>>>      "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
>>>      "modelerType" : "RegionServer,sub=IPC",
>>>      "tag.Context" : "regionserver",
>>>      "tag.Hostname" : "hdp-w-0"
>>>    } ],
>>>    "beans" : [ {
>>>      "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
>>>      "modelerType" : "RegionServer,sub=Replication",
>>>      "tag.Context" : "regionserver",
>>>      "tag.Hostname" : "hdp-w-0"
>>>    } ],
>>>    "beans" : [ {
>>>      "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
>>>      "modelerType" : "RegionServer,sub=Server",
>>>      "tag.Context" : "regionserver",
>>>      "tag.Hostname" : "hdp-w-0"
>>>    } ]
>>>  }
>>>  2015-08-06 14:11:13,640 ERROR [sync.0] wal.FSHLog: Error syncing,
>>> request close of wal
>>>  java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>  2015-08-06 14:11:13,640 WARN
>>> [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>>> wal.FSHLog: Failed last sync but no outstanding unsync edits so falling
>>> through to close; java.io.IOException: All datanodes
>>> DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>> are bad. Aborting...
>>>  2015-08-06 14:11:13,641 ERROR
>>> [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>>> wal.ProtobufLogWriter: Got IOException while writing trailer
>>>  java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>  2015-08-06 14:11:13,641 WARN
>>> [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>>> wal.FSHLog: Riding over failed WAL close of
>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576,
>>> cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>> are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS
>>> SYNCED SO SHOULD BE OK
>>>  2015-08-06 14:11:13,642 INFO
>>> [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>>> wal.FSHLog: Rolled WAL
>>> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
>>> with entries=101, filesize=30.38 KB; new WAL
>>> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
>>>  2015-08-06 14:11:13,643 INFO  [RS_CLOSE_REGION-hdp-w-0:16020-0]
>>> regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing
>>> region
>>> SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
>>> still finishing close
>>>  2015-08-06 14:11:13,643 INFO
>>> [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>>> wal.FSHLog: Archiving
>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
>>> to
>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
>>>  2015-08-06 14:11:13,643 ERROR [RS_CLOSE_REGION-hdp-w-0:16020-0]
>>> executor.EventHandler: Caught throwable while processing event
>>> M_RS_CLOSE_REGION
>>>  java.lang.RuntimeException: java.io.IOException: All datanodes
>>> DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>> are bad. Aborting...
>>>          at
>>> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
>>>          at
>>> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>>>          at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>          at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>          at java.lang.Thread.run(Thread.java:745)
>>>  Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>
>>>  ------------
>>>  hbase regionserver log (w-1)
>>>  ------------
>>>  2015-08-06 14:11:14,267 INFO  [main-EventThread]
>>> replication.ReplicationTrackerZKImpl:
>>> /hbase-unsecure/rs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 znode
>>> expired, triggering replicatorRemoved event
>>>  2015-08-06 14:12:08,203 INFO  [ReplicationExecutor-0]
>>> replication.ReplicationQueuesZKImpl: Atomically moving
>>> hdp-w-0.c.dks-hadoop.internal,16020,1438869946905's wals to my queue
>>>  2015-08-06 14:12:56,252 INFO
>>> [PriorityRpcServer.handler=5,queue=1,port=16020]
>>> regionserver.RSRpcServices: Close 918ed7c6568e7500fb434f4268c5bbc5, moving
>>> to hdp-m.c.dks-hadoop.internal,16020,1438869954062
>>>  2015-08-06 14:12:56,260 INFO
>>> [StoreCloserThread-SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.-1]
>>> regionserver.HStore: Closed 0
>>>  2015-08-06 14:12:56,261 FATAL
>>> [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.append-pool1-t1]
>>> wal.FSHLog: Could not append. Requesting close of wal
>>>  java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>  2015-08-06 14:12:56,261 ERROR [sync.3] wal.FSHLog: Error syncing,
>>> request close of wal
>>>  java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>  2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0]
>>> regionserver.HRegionServer: ABORTING region server
>>> hdp-w-1.c.dks-hadoop.internal,16020,1438869946909: Unrecoverable exception
>>> while closing region
>>> SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.,
>>> still finishing close
>>>  java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>  2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0]
>>> regionserver.HRegionServer: RegionServer abort: loaded coprocessors are:
>>> [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl,
>>> org.apache.hadoop.hbase.regionserver.LocalIndexSplitter,
>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver,
>>> org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver,
>>> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint,
>>> org.apache.phoenix.coprocessor.ScanRegionObserver,
>>> org.apache.phoenix.hbase.index.Indexer,
>>> org.apache.phoenix.coprocessor.SequenceRegionObserver]
>>>  2015-08-06 14:12:56,281 INFO  [RS_CLOSE_REGION-hdp-w-1:16020-0]
>>> regionserver.HRegionServer: Dump of metrics as JSON on abort: {
>>>    "beans" : [ {
>>>      "name" : "java.lang:type=Memory",
>>>      "modelerType" : "sun.management.MemoryImpl",
>>>      "ObjectPendingFinalizationCount" : 0,
>>>      "NonHeapMemoryUsage" : {
>>>        "committed" : 137166848,
>>>        "init" : 136773632,
>>>        "max" : 184549376,
>>>        "used" : 48667528
>>>      },
>>>      "HeapMemoryUsage" : {
>>>        "committed" : 2104754176,
>>>        "init" : 2147483648,
>>>        "max" : 2104754176,
>>>        "used" : 270075472
>>>      },
>>>      "Verbose" : true,
>>>      "ObjectName" : "java.lang:type=Memory"
>>>    } ],
>>>    "beans" : [ {
>>>      "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
>>>      "modelerType" : "RegionServer,sub=IPC",
>>>      "tag.Context" : "regionserver",
>>>      "tag.Hostname" : "hdp-w-1"
>>>    } ],
>>>    "beans" : [ {
>>>      "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
>>>      "modelerType" : "RegionServer,sub=Replication",
>>>      "tag.Context" : "regionserver",
>>>      "tag.Hostname" : "hdp-w-1"
>>>    } ],
>>>    "beans" : [ {
>>>      "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
>>>      "modelerType" : "RegionServer,sub=Server",
>>>      "tag.Context" : "regionserver",
>>>      "tag.Hostname" : "hdp-w-1"
>>>    } ]
>>>  }
>>>  2015-08-06 14:12:56,284 ERROR [sync.4] wal.FSHLog: Error syncing,
>>> request close of wal
>>>  java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>  2015-08-06 14:12:56,285 WARN
>>> [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>>> wal.FSHLog: Failed last sync but no outstanding unsync edits so falling
>>> through to close; java.io.IOException: All datanodes
>>> DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>> are bad. Aborting...
>>>  2015-08-06 14:12:56,285 ERROR
>>> [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>>> wal.ProtobufLogWriter: Got IOException while writing trailer
>>>  java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>  2015-08-06 14:12:56,285 WARN
>>> [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>>> wal.FSHLog: Riding over failed WAL close of
>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359,
>>> cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>> are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS
>>> SYNCED SO SHOULD BE OK
>>>  2015-08-06 14:12:56,287 INFO
>>> [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>>> wal.FSHLog: Rolled WAL
>>> /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
>>> with entries=100, filesize=30.73 KB; new WAL
>>> /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438870376262
>>>  2015-08-06 14:12:56,288 INFO
>>> [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>>> wal.FSHLog: Archiving
>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
>>> to
>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
>>>  2015-08-06 14:12:56,315 INFO  [RS_CLOSE_REGION-hdp-w-1:16020-0]
>>> regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing
>>> region
>>> SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.,
>>> still finishing close
>>>  2015-08-06 14:12:56,315 INFO
>>> [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020]
>>> regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
>>>  2015-08-06 14:12:56,315 ERROR [RS_CLOSE_REGION-hdp-w-1:16020-0]
>>> executor.EventHandler: Caught throwable while processing event
>>> M_RS_CLOSE_REGION
>>>  java.lang.RuntimeException: java.io.IOException: All datanodes
>>> DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>> are bad. Aborting...
>>>          at
>>> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
>>>          at
>>> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>>>          at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>          at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>          at java.lang.Thread.run(Thread.java:745)
>>>  Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>          at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>
>>>  -------------
>>>  m datanode log
>>>  -------------
>>>  2015-07-27 14:11:16,082 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073742677_1857,
>>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>>>  2015-07-27 14:11:16,132 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858 src: /
>>> 10.240.200.196:56767 dest: /10.240.200.196:50010
>>>  2015-07-27 14:11:16,155 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:56767,
>>> dest: /10.240.200.196:50010, bytes: 117761, op: HDFS_WRITE, cliID:
>>> DFSClient_NONMAPREDUCE_177514816_1, offset: 0, srvID:
>>> 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858, duration:
>>> 6385289
>>>  2015-07-27 14:11:16,155 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858,
>>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>>>  2015-07-27 14:11:16,267 ERROR datanode.DataNode
>>> (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver
>>> error processing unknown operation  src: /127.0.0.1:60513 dst: /
>>> 127.0.0.1:50010
>>>  java.io.EOFException
>>>          at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>>          at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>>          at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>>          at java.lang.Thread.run(Thread.java:745)
>>>  2015-07-27 14:11:16,405 INFO  datanode.DataNode
>>> (DataNode.java:transferBlock(1943)) - DatanodeRegistration(
>>> 10.240.200.196:50010, datanodeUuid=329bbe62-bcea-4a6d-8c97-e800631deb81,
>>> infoPort=50075, infoSecurePort=0, ipcPort=8010,
>>> storageInfo=lv=-56;cid=CID-1247f294-77a9-4605-b6d3-4c1398bb5db0;nsid=2032226938;c=0)
>>> Starting thread to transfer
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073742649_1829 to
>>> 10.240.2.235:50010 10.240.164.0:50010
>>>
>>>  -------------
>>>  w-0 datanode log
>>>  -------------
>>>  2015-07-27 14:11:25,019 ERROR datanode.DataNode
>>> (DataXceiver.java:run(278)) -
>>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
>>> operation  src: /127.0.0.1:47993 dst: /127.0.0.1:50010
>>>  java.io.EOFException
>>>          at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>>          at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>>          at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>>          at java.lang.Thread.run(Thread.java:745)
>>>  2015-07-27 14:11:25,077 INFO  DataNode.clienttrace
>>> (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
>>> 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073742631, srvID:
>>> a5eea5a8-5112-46da-9f18-64274486c472, success: true
>>>
>>>
>>>  -----------------------------
>>>  Thank you in advance,
>>>
>>>  Adrià
>>>
>>>
>>
>>
