predictionio-commits mailing list archives

From don...@apache.org
Subject [1/5] incubator-predictionio git commit: Add solution for HBase failure after disk full
Date Thu, 30 Mar 2017 20:53:51 GMT
Repository: incubator-predictionio
Updated Branches:
  refs/heads/develop 23a869328 -> dfb01e327


Add solution for HBase failure after disk full

Due to an issue in ZooKeeper, recovering HBase from a failure caused by a full disk
takes some extra effort.


Project: http://git-wip-us.apache.org/repos/asf/incubator-predictionio/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-predictionio/commit/6975fc06
Tree: http://git-wip-us.apache.org/repos/asf/incubator-predictionio/tree/6975fc06
Diff: http://git-wip-us.apache.org/repos/asf/incubator-predictionio/diff/6975fc06

Branch: refs/heads/develop
Commit: 6975fc06bad76ad275d10a17af80387c80e60fbd
Parents: 3525049
Author: Amy Lin <b03902055@ntu.edu.tw>
Authored: Mon Mar 13 09:40:33 2017 -0700
Committer: Donald Szeto <donald@apache.org>
Committed: Mon Mar 13 09:40:33 2017 -0700

----------------------------------------------------------------------
 docs/manual/source/resources/faq.html.md | 32 +++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-predictionio/blob/6975fc06/docs/manual/source/resources/faq.html.md
----------------------------------------------------------------------
diff --git a/docs/manual/source/resources/faq.html.md b/docs/manual/source/resources/faq.html.md
index f80167b..455d06c 100644
--- a/docs/manual/source/resources/faq.html.md
+++ b/docs/manual/source/resources/faq.html.md
@@ -216,3 +216,35 @@ there could be a chance that reverse DNS does not function properly. You can
 install a DNS server on your own computer. Some users have reported that using
 [Google Public DNS](https://developers.google.com/speed/public-dns/) would also
 solve the problem.
+
+### Q: How do I fix HBase issues after the disk recovers from a full state?
+
+You may receive error messages like `write error: No space left on device`
+when the disk is full, and may keep receiving errors from `pio status` even after
+restarting PIO services (due to
+[an issue](https://issues.apache.org/jira/browse/ZOOKEEPER-1621) in ZooKeeper).
+
+The workaround is to delete the newest `snapshot.xxxxx` and `log.xxxoo` files under the
+ZooKeeper data directory (e.g. `$(HbaseRoot)/zookeeper/zookeeper_0/version-2`). Then
+restart all services with `pio-start-all`; `pio status` should then report success.
+
+If you still have problems connecting to the Event Server, check the HBase
+dashboard to see whether there are `regions under transition`, then follow these steps:
+
+1. Try `hbase hbck -repair` and `hbase hbck -repairHoles`. If that solves the
+problem, you are all set; otherwise, continue.
+2. Find the failing regions with `hbase hbck`:
+
+	```
+	  ...
+	Summary:
+	Table pio_event:events_1 is inconsistent.
+	    Number of regions: 2
+	    Deployed on:  prediction.io,54829,1489213832255
+	  ...
+	  2 inconsistencies detected.
+	```
+3. Shut down the HBase process and delete the `recovered.edits` folders of the failing
+regions under the HBase data directory (e.g. `$(HbaseRoot)/hbase/data/pio_event/events_1`
+in this example).
+4. Run `hbase hbck -repairHoles` and restart all PredictionIO services.
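The ZooKeeper workaround in the FAQ entry can be scripted. The following is a minimal Python sketch, not part of the commit, assuming that all services (including the ZooKeeper instance managed by HBase) are already stopped, for example with `pio-stop-all`, and that "newest" means the highest zxid encoded in the hexadecimal suffix of each `snapshot.*` and `log.*` file name. The helper names `newest_by_zxid` and `set_aside_newest` and the backup directory are hypothetical. The sketch moves the files aside instead of deleting them so they can be restored if needed.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def newest_by_zxid(data_dir: Path, prefix: str) -> Optional[Path]:
    """Return the '<prefix>.<hex zxid>' file with the highest zxid, or None."""
    best: Optional[Path] = None
    best_zxid = -1
    for path in data_dir.glob(prefix + ".*"):
        if not path.is_file():
            continue
        suffix = path.name[len(prefix) + 1:]
        try:
            zxid = int(suffix, 16)  # assumption: suffix is the zxid in hexadecimal
        except ValueError:
            continue  # ignore names whose suffix is not hexadecimal (e.g. backups)
        if zxid > best_zxid:
            best, best_zxid = path, zxid
    return best


def set_aside_newest(data_dir: Path, backup_dir: Path) -> List[Path]:
    """Move the newest snapshot.* and log.* files from data_dir into backup_dir.

    Returns the new locations of the moved files.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved: List[Path] = []
    for prefix in ("snapshot", "log"):
        newest = newest_by_zxid(data_dir, prefix)
        if newest is not None:
            target = backup_dir / newest.name
            shutil.move(str(newest), str(target))
            moved.append(target)
    return moved
```

For example, `set_aside_newest(Path("/path/to/HbaseRoot/zookeeper/zookeeper_0/version-2"), Path("/tmp/zk-backup"))` (hypothetical paths), followed by `pio-start-all` and `pio status`. If you prefer file modification time as the notion of "newest", swap the zxid comparison for `path.stat().st_mtime`.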
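Steps 1 and 2 of the FAQ entry can also be driven from a script. This Python sketch, an illustration that is not part of the commit, runs the two `hbck` repair modes the FAQ names, re-runs `hbase hbck`, and extracts the inconsistent table names and the inconsistency count. The regular expressions assume the summary wording of the sample output in the FAQ entry; other HBase versions may word it differently. The function names are hypothetical, and the command runner is a parameter so the parsing can be exercised without a live HBase.

```python
import re
import subprocess
from typing import Callable, List, Sequence, Tuple

# Patterns based on the sample `hbase hbck` summary in the FAQ entry (an assumption).
TABLE_RE = re.compile(r"^\s*Table (\S+) is inconsistent\.", re.MULTILINE)
COUNT_RE = re.compile(r"^\s*(\d+) inconsistencies detected\.", re.MULTILINE)

Runner = Callable[[Sequence[str]], str]


def run_cmd(cmd: Sequence[str]) -> str:
    """Run a command and return its combined stdout and stderr as text."""
    result = subprocess.run(list(cmd), stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.stdout


def parse_hbck_summary(output: str) -> Tuple[List[str], int]:
    """Return (inconsistent table names, reported count); count is -1 if not found."""
    match = COUNT_RE.search(output)
    count = int(match.group(1)) if match else -1
    return TABLE_RE.findall(output), count


def repair_then_check(run: Runner = run_cmd) -> Tuple[List[str], int]:
    """Step 1: try both repair modes. Step 2: re-run hbck and report what is left."""
    run(["hbase", "hbck", "-repair"])
    run(["hbase", "hbck", "-repairHoles"])
    return parse_hbck_summary(run(["hbase", "hbck"]))
```

On a host where `hbase` is on the `PATH`, `repair_then_check()` returns `(["pio_event:events_1"], 2)` when `hbck` prints output like the FAQ sample. A count of 0 means `hbck` found nothing left to fix, and -1 means the summary line was not recognized.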
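Step 3 of the FAQ entry maps the table name reported by `hbck` to its directory and removes the `recovered.edits` folders. A minimal Python sketch, not part of the commit, assuming HBase is already shut down, stores its data on the local filesystem (as in a standalone setup), and that `hbase_root` points at `$(HbaseRoot)/hbase`. Because the sample summary names only the table, this sketch targets every region directory of that table; narrow the list if `hbck` names specific regions. The function names are hypothetical, and `dry_run` defaults to listing only. If HBase data lives on HDFS, remove the same paths with `hdfs dfs -rm -r` instead.

```python
import shutil
from pathlib import Path
from typing import List


def table_dir(hbase_root: Path, table: str) -> Path:
    """Map an hbck table name like 'pio_event:events_1' to its data directory.

    Tables without a namespace prefix are placed in the 'default' namespace.
    """
    namespace, _, name = table.rpartition(":")
    return hbase_root / "data" / (namespace or "default") / name


def find_recovered_edits(table_path: Path) -> List[Path]:
    """List the recovered.edits directories of all regions under a table directory."""
    return sorted(p for p in table_path.glob("*/recovered.edits") if p.is_dir())


def remove_recovered_edits(hbase_root: Path, table: str, dry_run: bool = True) -> List[Path]:
    """Return the recovered.edits folders for a table; delete them unless dry_run."""
    targets = find_recovered_edits(table_dir(hbase_root, table))
    if not dry_run:
        for path in targets:
            shutil.rmtree(path)
    return targets
```

After reviewing the dry-run list, call `remove_recovered_edits(root, "pio_event:events_1", dry_run=False)`, then run `hbase hbck -repairHoles` and restart the services as in step 4.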

