predictionio-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: [PredictionIO Error] Running Hbase
Date Mon, 13 Mar 2017 17:25:52 GMT
We will also release a Template that trims, compacts and optionally de-duplicates the DB using
the SelfCleaningDataSource. Because it is a template, you can schedule it separately from `pio train`.
The SelfCleaningDataSource method is pretty slow, so we run it daily on some clients to maintain
a moving time window of data.
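
To illustrate what "a moving time window" with optional de-duplication means, here is a minimal Python sketch. It is not the SelfCleaningDataSource implementation and does not touch the event store; it only filters an in-memory list. The field names follow the PredictionIO event JSON (`event`, `entityId`, `targetEntityId`, `eventTime`), while the function name `trim_events`, the 30-day default, and the rule for what counts as a duplicate are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def trim_events(events, now, window=timedelta(days=30), dedupe=True):
    """Keep events newer than now - window; optionally drop exact repeats."""
    cutoff = now - window
    kept, seen = [], set()
    for ev in events:
        if ev["eventTime"] < cutoff:
            continue  # outside the moving window
        # Assumption: two events are duplicates if these four fields match.
        key = (ev["event"], ev["entityId"], ev.get("targetEntityId"), ev["eventTime"])
        if dedupe:
            if key in seen:
                continue
            seen.add(key)
        kept.append(ev)
    return kept


now = datetime(2017, 3, 13, tzinfo=timezone.utc)
recent = {"event": "view", "entityId": "u1", "targetEntityId": "i1",
          "eventTime": now - timedelta(days=1)}
stale = dict(recent, eventTime=now - timedelta(days=40))
print(len(trim_events([stale, recent, dict(recent)], now)))
```

Running the same filter daily with a fixed `window` is what keeps the retained data sliding forward in time.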

Here is the template; we’ll put it in the PIO Gallery after release: https://github.com/actionml/db-cleaner


On Mar 12, 2017, at 5:16 PM, Donald Szeto <donald@apache.org> wrote:

Hi Amy,

Since the event server keeps adding events to the backend, the storage will grow indefinitely
unless you implement some sort of data retention policy that periodically removes old events.

In 0.11, there are two options for this situation:
- You may use SelfCleaningDataSource. Backing up your existing data is highly recommended
before you try it.
- If your use case allows you to overwrite events (https://github.com/apache/incubator-predictionio/pull/356),
you may overwrite them instead of continually adding new ones.

Your experience would be very helpful to others as well. Would you like to contribute a description
of how you fixed your problem to the FAQ?

https://github.com/apache/incubator-predictionio/blob/livedoc/docs/manual/source/resources/faq.html.md

Regards,
Donald

On Fri, Mar 10, 2017 at 11:32 PM, Lin Amy <linamy85@gmail.com> wrote:
Hello everyone,

Mission completed!

The issue was solved after I fixed the following errors reported by `hbase hbck`:
ERROR: Region { meta => pio_event:events_1,,1488109005690.f2fe88521bdf946650842f74bb4c978d.,
hdfs => file:/home/crs/hbase/hbase/data/pio_event/events_1/f2fe88521bdf946650842f74bb4c978d,
deployed =>  } not deployed on any region server.
ERROR: (region pio_event:events_1,\x80#X,1489209095682.97a91816f25aa71ce2e2a0342776ddbe.)
First region should start with an empty key.  You need to  create a new region and regioninfo
in HDFS to plug the hole.
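
For anyone facing a longer `hbase hbck` report, a small Python sketch like the following can pull the affected regions out of the ERROR lines. It relies only on what is visible in the output above: each region name ends with a 32-character hexadecimal encoded name between two dots. The function name `failing_regions` is hypothetical, and the sketch assumes each ERROR line is unwrapped so the region name sits on the same line as `ERROR:`.

```python
import re

# Encoded region name as it appears above: ".<32 hex chars>."
ENCODED = re.compile(r"\.([0-9a-f]{32})\.")


def failing_regions(hbck_output):
    """Return encoded region names found on ERROR lines, in first-seen order."""
    found = []
    for line in hbck_output.splitlines():
        if not line.startswith("ERROR:"):
            continue
        for name in ENCODED.findall(line):
            if name not in found:
                found.append(name)
    return found
```

The returned names match the region directory names under the table directory, such as the `f2fe88521bdf946650842f74bb4c978d` directory in the `hdfs =>` path above.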

Running `hbase hbck -repair` and `hbase hbck -repairHoles` on their own did not solve the problem at all...

But after trying these steps:
1. Stop HBase.
2. Delete the recovered.edits folders of the failing regions.
3. Run `hbase hbck -repairHoles`.
(ref: https://serverfault.com/questions/510290/hbase-hbck-cant-fix-region-inconsistencies)
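
Before deleting anything in step 2, it may help to list which recovered.edits folders actually exist. The Python sketch below does only that and deletes nothing. It assumes HBase's root directory is on the local filesystem, as the `file:/home/crs/hbase/...` path in the hbck output suggests; on a real HDFS cluster you would need the HDFS tools instead. The function name and its arguments are hypothetical.

```python
from pathlib import Path


def recovered_edits_dirs(table_dir, encoded_regions):
    """List existing <table_dir>/<region>/recovered.edits directories (read-only)."""
    dirs = []
    for region in encoded_regions:
        candidate = Path(table_dir) / region / "recovered.edits"
        if candidate.is_dir():
            dirs.append(candidate)
    return dirs
```

With the paths from this thread, `table_dir` would be `/home/crs/hbase/hbase/data/pio_event/events_1` and `encoded_regions` the encoded names from the hbck errors. Back up the directories it reports before removing them.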

Problem solved!!!
I hope this saves others time if it occurs again (hopefully not... Orz)

Best regards,
Amy


On Sat, Mar 11, 2017 at 2:41 PM, Lin Amy <linamy85@gmail.com> wrote:
Hello again,

I have solved the problem by following https://issues.apache.org/jira/browse/ZOOKEEPER-1621,
and `pio status` now returns a normal result, which seems great.
However, the problem now is that I receive a 500 (Internal Server Error) with the message "The
server was not able to produce a timely response to your request.".
Also, when I run `pio train`, it fails with the following message:
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35,
exceptions:
Sat Mar 11 14:00:10 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException:
Connection refused
Sat Mar 11 14:00:10 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
This server is in the failed servers list: PredictIO3.ucf.com/10.1.3.153:37708
Sat Mar 11 14:00:11 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
This server is in the failed servers list: PredictIO3.ucf.com/10.1.3.153:37708
Sat Mar 11 14:00:12 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
This server is in the failed servers list: PredictIO3.ucf.com/10.1.3.153:37708
Sat Mar 11 14:00:14 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException:
Connection refused
Sat Mar 11 14:00:18 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException:
Connection refused
Sat Mar 11 14:00:28 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException:
Connection refused
Sat Mar 11 14:00:38 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException:
Connection refused
Sat Mar 11 14:00:48 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException:
Connection refused
Sat Mar 11 14:00:58 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException:
Connection refused
Sat Mar 11 14:01:18 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException:
Connection refused
Sat Mar 11 14:01:38 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException:
Connection refused
Sat Mar 11 14:01:58 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException:
Connection refused
Sat Mar 11 14:02:18 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException:
Connection refused
Sat Mar 11 14:02:39 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException:
Connection refused
Sat Mar 11 14:02:59 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException:
Connection refused
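
"Connection refused" means nothing was accepting TCP connections at the address the client was given (10.1.3.153:37708 in the trace above). A quick way to confirm that independently of PredictionIO is a plain TCP connect, sketched here in Python. The helper name `port_open` and the 3-second timeout are assumptions for the example; the check says nothing about region state, only whether a process is listening.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False
```

For the trace above, `port_open("10.1.3.153", 37708)` returning False would be consistent with the region server process not running on that port.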

Following some online advice, I tried deleting everything inside /hbase/zookeeper, but the issue
remained.

Has anyone run into this failure and solved it?
Thank you; I appreciate any help!

Best regards,
Amy

On Sat, Mar 11, 2017 at 10:28 AM, Lin Amy <linamy85@gmail.com> wrote:
Hello,

Yesterday I found that the disk was full, which led to an HBase failure:

stopping hbase/home/crs/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/bin/stop-hbase.sh:
line 50: echo: write error: No space left on device
Java HotSpot(TM) 64-Bit Server VM warning: Insufficient space for shared memory file:
   853
Try using the -Djava.io.tmpdir= option to select an alternate temp location.
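
Since the whole incident started with a full disk, a periodic free-space check may catch it earlier next time. The following Python sketch uses only the standard library; the function names and the 10% threshold are assumptions chosen for illustration, not anything PredictionIO or HBase provides.

```python
import shutil


def free_fraction(path="/"):
    """Fraction of the filesystem holding `path` that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def low_on_space(path="/", min_free=0.10):
    """True when less than `min_free` of the filesystem is free."""
    return free_fraction(path) < min_free
```

In this thread, the path to watch would be the filesystem holding `/home/crs`, where both the HBase data and the PredictionIO install live.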

So I freed up a lot of disk space and ran `pio-stop-all` and `pio-start-all`. Then `pio
status` gave me this error:
-----------------------------------------------------
[INFO] [Console$] Inspecting PredictionIO...
[INFO] [Console$] PredictionIO 0.10.0-incubating is installed at /home/crs/PredictionIO-0.10.0-incubating
[INFO] [Console$] Inspecting Apache Spark...
[INFO] [Console$] Apache Spark is installed at /home/crs/PredictionIO-0.10.0-incubating/vendors/spark-1.6.2-bin-hadoop2.6
[INFO] [Console$] Apache Spark 1.6.2 detected (meets minimum requirement of 1.3.0)
[INFO] [Console$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: LOCALFS)...
[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
[ERROR] [RecoverableZooKeeper] ZooKeeper exists failed after 1 attempts
[ERROR] [ZooKeeperWatcher] hconnection-0x3fc05ea2, quorum=localhost:2181, baseZNode=/hbase
Received unexpected KeeperException, re-throwing exception
[WARN] [ZooKeeperRegistry] Can't retrieve clusterId from Zookeeper
[ERROR] [StorageClient] Cannot connect to ZooKeeper (ZooKeeper ensemble: localhost). Please
make sure that the configuration is pointing at the correct ZooKeeper ensemble. By default,
HBase manages its own ZooKeeper, so if you have not configured HBase to use an external ZooKeeper,
that means your HBase is not started or configured properly.
[ERROR] [Storage$] Error initializing storage client for source HBASE
[ERROR] [Console$] Unable to connect to all storage backends successfully. The following shows
the error message from the storage backend.
[ERROR] [Console$] Data source HBASE was not properly initialized. (org.apache.predictionio.data.storage.StorageClientException)
[ERROR] [Console$] Dumping configuration of initialized storage backend sources. Please make
sure they are correct.
[ERROR] [Console$] Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: HOME ->
/home/crs/PredictionIO-0.10.0-incubating/vendors/elasticsearch-1.7.5, HOSTS -> Slave2,PredictIO3,
PORTS -> 9300,9320, CLUSTERNAME -> CRS, TYPE -> elasticsearch
[ERROR] [Console$] Source Name: LOCALFS; Type: localfs; Configuration: PATH -> /home/crs/.pio_store/models,
TYPE -> localfs
[ERROR] [Console$] Source Name: HBASE; Type: (error); Configuration: (error)

------------------------------------------------------
My guess is that it fails whenever it tries to restart ZooKeeper.
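
One way to test that guess directly is to ask ZooKeeper whether it is alive, using its `ruok` four-letter command, which a healthy server answers with `imok`. The Python sketch below assumes the quorum address from the log (localhost:2181) and a ZooKeeper 3.4.x server such as the one managed by HBase here; newer ZooKeeper releases only answer four-letter commands that are whitelisted in their configuration. The function name `zk_ruok` is hypothetical.

```python
import socket


def zk_ruok(host="localhost", port=2181, timeout=3.0):
    """Send 'ruok' to ZooKeeper and report whether it replied 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"ruok")
            return conn.recv(16) == b"imok"
    except OSError:
        # Nothing listening, connection dropped, or no reply before the timeout.
        return False
```

A False result right after `pio-start-all` would point at ZooKeeper itself rather than at the PredictionIO storage configuration.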

My pio-env.sh and some errors from `hbase-crs-master-PredictIO3.log` are also attached.

Thank you!!!!

Best regards,
Amy


