predictionio-commits mailing list archives

Subject [1/5] incubator-predictionio git commit: Add solution for HBase failure after disk full
Date Thu, 30 Mar 2017 20:53:51 GMT
Repository: incubator-predictionio
Updated Branches:
  refs/heads/develop 23a869328 -> dfb01e327

Add solution for HBase failure after disk full

Because of a ZooKeeper issue, recovering HBase from a failure caused by a full
disk takes some extra effort.


Branch: refs/heads/develop
Commit: 6975fc06bad76ad275d10a17af80387c80e60fbd
Parents: 3525049
Author: Amy Lin <>
Authored: Mon Mar 13 09:40:33 2017 -0700
Committer: Donald Szeto <>
Committed: Mon Mar 13 09:40:33 2017 -0700

 docs/manual/source/resources/ | 32 +++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
diff --git a/docs/manual/source/resources/ b/docs/manual/source/resources/
index f80167b..455d06c 100644
--- a/docs/manual/source/resources/
+++ b/docs/manual/source/resources/
@@ -216,3 +216,35 @@ there could be a chance that reverse DNS does not function properly. You can
 install a DNS server on your own computer. Some users have reported that using
 [Google Public DNS]( would also
 solve the problem.
+### Q: How to fix HBase issues after the disk has recovered from a full state?
+You may receive error messages like `write error: No space left on device`
+when the disk is full, and `pio status` may keep reporting errors even after
+you restart the pio services (due to
+[an issue]( in ZooKeeper).
+The workaround is to delete the newest `snapshot.xxxxx` and `log.xxxoo` files under the
+ZooKeeper data directory (e.g. `$(HbaseRoot)/zookeeper/zookeeper_0/version-2`). Then
+restart all services with `pio-start-all`; `pio status` should then report success.
+If you still have problems connecting to the event server, check the HBase
+dashboard for `regions under transition`, then follow these steps:
+1. Try `hbase hbck -repair` and `hbase hbck -repairHoles`. If that solves the
+problem, you are all set; otherwise continue.
+2. Find the failing regions with `hbase hbck`.
+	```
+	  ...
+	Summary:
+	Table pio_event:events_1 is inconsistent.
+	    Number of regions: 2
+	    Deployed on:,54829,1489213832255
+	  ...
+	  2 inconsistencies detected.
+	```
+3. Shut down the HBase process and delete the `recovered.edits` folders under the HBase data
+directory (e.g. `$(HbaseRoot)/hbase/data/pio_event/events_1` in this example)
+for the failing regions.
+4. Run `hbase hbck -repairHoles` and restart all pio services.
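
Editorial note, not part of the commit: the two lookup steps in the FAQ entry above (finding the newest ZooKeeper `snapshot.*` and `log.*` files, and reading the inconsistent tables out of the `hbase hbck` summary) could be sketched roughly as follows. The function names `newest_with_prefix` and `inconsistent_tables` are hypothetical helpers, "newest" is assumed to mean most recently modified, and the summary line format is assumed to match the sample output quoted in the diff. The sketch only prints candidates; it deletes nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; neither is part of PredictionIO or HBase.

# Print the most recently modified file in directory $1 whose name starts
# with prefix $2 (for example "snapshot." or "log."). Prints nothing if
# no file matches. Assumes "newest" means newest modification time.
newest_with_prefix() {
  local dir="$1" prefix="$2"
  # ls -t lists newest first; head keeps only the first entry.
  ls -t "$dir"/"$prefix"* 2>/dev/null | head -n 1
}

# Read `hbase hbck` output on stdin and print the name of every table
# reported as "Table <name> is inconsistent." (format assumed from the
# sample summary in the FAQ entry).
inconsistent_tables() {
  sed -n 's/^[[:space:]]*Table \(.*\) is inconsistent\.$/\1/p'
}
```

A possible way to use these, assuming `ZK_DIR` has been set to your own ZooKeeper data directory: run `newest_with_prefix "$ZK_DIR" snapshot.` and `newest_with_prefix "$ZK_DIR" log.` to see which files the workaround would remove, and `hbase hbck | inconsistent_tables` to list the tables to inspect. Reviewing the printed paths before deleting anything by hand keeps the destructive step under your control.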
