predictionio-user mailing list archives

From Donald Szeto <don...@apache.org>
Subject Re: [PredictionIO Error] Running Hbase
Date Mon, 13 Mar 2017 00:16:22 GMT
Hi Amy,

Since the event server keeps adding events to the backend, storage will
grow indefinitely unless you implement some sort of data retention policy
that periodically removes old events.

In 0.11, there are two options for this situation:
- You may use SelfCleaningDataSource. Backing up your existing data is
highly recommended before you try it.
- If your use case allows you to overwrite events (
https://github.com/apache/incubator-predictionio/pull/356), you may
overwrite existing events instead of continually adding new ones.
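
For illustration only, a retention policy could also be scripted against
the Event Server REST API. The following is a minimal sketch, not part of
PredictionIO: it assumes the Event Server's `GET /events.json` (with
`untilTime` and `limit`) and `DELETE /events/<eventId>.json` endpoints, and
the function names, the 30-day window, and the `http_get` / `http_delete`
callables are hypothetical placeholders you would supply yourself.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote, urlencode


def retention_cutoff(now, keep_days):
    """ISO 8601 UTC timestamp; events older than this are treated as expired."""
    cutoff = now.astimezone(timezone.utc) - timedelta(days=keep_days)
    return cutoff.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def list_expired_url(base_url, access_key, cutoff, limit):
    """URL that asks the Event Server for events with eventTime < cutoff."""
    query = urlencode({"accessKey": access_key, "untilTime": cutoff, "limit": limit})
    return f"{base_url}/events.json?{query}"


def delete_url(base_url, access_key, event_id):
    """URL that deletes a single event by its eventId."""
    query = urlencode({"accessKey": access_key})
    return f"{base_url}/events/{quote(event_id, safe='')}.json?{query}"


def purge_expired(base_url, access_key, cutoff, http_get, http_delete, batch=100):
    """Delete expired events batch by batch; returns how many were deleted.

    http_get(url) must return a list of event dicts (empty when none match);
    http_delete(url) must delete the event. Both are caller-supplied
    (hypothetical) wrappers around whatever HTTP client you use.
    """
    deleted = 0
    while True:
        events = http_get(list_expired_url(base_url, access_key, cutoff, batch))
        if not events:
            return deleted
        for event in events:
            http_delete(delete_url(base_url, access_key, event["eventId"]))
            deleted += 1
```

A scheduled job could call `purge_expired("http://localhost:7070", key,
retention_cutoff(datetime.now(timezone.utc), 30), http_get, http_delete)`,
where the host, port, and key are placeholders for your own deployment. The
Event Server may answer with an error status rather than an empty list when
no events match, so the `http_get` wrapper should map that case to `[]`.
Back up your data before running anything that deletes events.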

Your experience would be very helpful to others as well. Would you like to
contribute a description of how you fixed your problem to the FAQ?

https://github.com/apache/incubator-predictionio/blob/livedoc/docs/manual/source/resources/faq.html.md

Regards,
Donald

On Fri, Mar 10, 2017 at 11:32 PM, Lin Amy <linamy85@gmail.com> wrote:

> Hello everyone,
>
> Mission completed!
>
> The issue was solved after I fixed the following errors reported by `hbase hbck`:
> ERROR: Region { meta => pio_event:events_1,,1488109005690.
> f2fe88521bdf946650842f74bb4c978d., hdfs => file:/home/crs/hbase/hbase/
> data/pio_event/events_1/f2fe88521bdf946650842f74bb4c978d, deployed =>  }
> not deployed on any region server.
> ERROR: (region pio_event:events_1,\x80#X,1489209095682.
> 97a91816f25aa71ce2e2a0342776ddbe.) First region should start with an
> empty key.  You need to  create a new region and regioninfo in HDFS to plug
> the hole.
>
> `hbase hbck -repair` and `hbase hbck -repairHoles` alone didn't solve the
> problem at all...
>
> But after trying these steps:
> 1. Stop HBase.
> 2. Delete the recovered.edits folders for the failing regions.
> 3. Run `hbase hbck -repairHoles`.
> (ref: https://serverfault.com/questions/510290/hbase-hbck-cant-fix-region-inconsistencies)
>
> Problem solved!!!
> I hope this saves others time if it occurs again (hopefully not... Orz)
>
> Best regards,
> Amy
>
>
> On Sat, Mar 11, 2017 at 2:41 PM, Lin Amy <linamy85@gmail.com> wrote:
>
>> Hello again,
>>
>> I have solved the problem using the reference here:
>> https://issues.apache.org/jira/browse/ZOOKEEPER-1621, and `pio status`
>> now returns a normal result, which seems great.
>> However, the problem now is that I receive a 500 (internal server error)
>> with the message "The server was not able to produce a timely response
>> to your request.".
>> Also, when I do `pio train`, it fails with the following message:
>> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> Failed after attempts=35, exceptions: Sat Mar 11 14:00:10 CST 2017,
>> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
>> java.net.ConnectException: Connection refused Sat Mar 11 14:00:10 CST 2017,
>> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
>> org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server
>> is in the failed servers list: PredictIO3.ucf.com/10.1.3.153:37708 Sat
>> Mar 11 14:00:11 CST 2017, org.apache.hadoop.hbase.
>> client.RpcRetryingCaller@7dfeb08d, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
>> This server is in the failed servers list: PredictIO3.ucf.com/10.1.3.153:
>> 37708 Sat Mar 11 14:00:12 CST 2017, org.apache.hadoop.hbase.
>> client.RpcRetryingCaller@7dfeb08d, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
>> This server is in the failed servers list: PredictIO3.ucf.com/10.1.3.153:
>> 37708 Sat Mar 11 14:00:14 CST 2017, org.apache.hadoop.hbase.
>> client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection
>> refused Sat Mar 11 14:00:18 CST 2017, org.apache.hadoop.hbase.
>> client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection
>> refused Sat Mar 11 14:00:28 CST 2017, org.apache.hadoop.hbase.
>> client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection
>> refused Sat Mar 11 14:00:38 CST 2017, org.apache.hadoop.hbase.
>> client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection
>> refused Sat Mar 11 14:00:48 CST 2017, org.apache.hadoop.hbase.
>> client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection
>> refused Sat Mar 11 14:00:58 CST 2017, org.apache.hadoop.hbase.
>> client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection
>> refused Sat Mar 11 14:01:18 CST 2017, org.apache.hadoop.hbase.
>> client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection
>> refused Sat Mar 11 14:01:38 CST 2017, org.apache.hadoop.hbase.
>> client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection
>> refused Sat Mar 11 14:01:58 CST 2017, org.apache.hadoop.hbase.
>> client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection
>> refused Sat Mar 11 14:02:18 CST 2017, org.apache.hadoop.hbase.
>> client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection
>> refused Sat Mar 11 14:02:39 CST 2017, org.apache.hadoop.hbase.
>> client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection
>> refused Sat Mar 11 14:02:59 CST 2017, org.apache.hadoop.hbase.
>> client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection
>> refused
>>
>> I have tried deleting everything inside /hbase/zookeeper, following some
>> online advice, but the issue remained.
>>
>> Has anyone encountered this failure and solved it?
>> Thank you, I appreciate any help!
>>
>> Best regards,
>> Amy
>>
>> On Sat, Mar 11, 2017 at 10:28 AM, Lin Amy <linamy85@gmail.com> wrote:
>>
>> Hello,
>>
>> Yesterday I found that the disk was full, which led to an HBase failure:
>>
>> *stopping hbase*
>> */home/crs/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/bin/stop-hbase.sh:
>> line 50: echo: write error: No space left on device*
>> *Java HotSpot(TM) 64-Bit Server VM warning: Insufficient space for shared
>> memory file:*
>> *   853*
>> *Try using the -Djava.io.tmpdir= option to select an alternate temp
>> location.*
>>
>> So I freed up a lot of disk space and ran `pio-stop-all` and
>> `pio-start-all`. Then `pio status` gave me this error:
>> -----------------------------------------------------
>> *[INFO] [Console$] Inspecting PredictionIO...*
>> *[INFO] [Console$] PredictionIO 0.10.0-incubating is installed at
>> /home/crs/PredictionIO-0.10.0-incubating*
>> *[INFO] [Console$] Inspecting Apache Spark...*
>> *[INFO] [Console$] Apache Spark is installed at
>> /home/crs/PredictionIO-0.10.0-incubating/vendors/spark-1.6.2-bin-hadoop2.6*
>> *[INFO] [Console$] Apache Spark 1.6.2 detected (meets minimum requirement
>> of 1.3.0)*
>> *[INFO] [Console$] Inspecting storage backend connections...*
>> *[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...*
>> *[INFO] [Storage$] Verifying Model Data Backend (Source: LOCALFS)...*
>> *[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...*
>> *[ERROR] [RecoverableZooKeeper] ZooKeeper exists failed after 1 attempts*
>> *[ERROR] [ZooKeeperWatcher] hconnection-0x3fc05ea2,
>> quorum=localhost:2181, baseZNode=/hbase Received unexpected
>> KeeperException, re-throwing exception*
>> *[WARN] [ZooKeeperRegistry] Can't retrieve clusterId from Zookeeper*
>> *[ERROR] [StorageClient] Cannot connect to ZooKeeper (ZooKeeper ensemble:
>> localhost). Please make sure that the configuration is pointing at the
>> correct ZooKeeper ensemble. By default, HBase manages its own ZooKeeper, so
>> if you have not configured HBase to use an external ZooKeeper, that means
>> your HBase is not started or configured properly.*
>> *[ERROR] [Storage$] Error initializing storage client for source HBASE*
>> *[ERROR] [Console$] Unable to connect to all storage backends
>> successfully. The following shows the error message from the storage
>> backend.*
>> *[ERROR] [Console$] Data source HBASE was not properly initialized.
>> (org.apache.predictionio.data.storage.StorageClientException)*
>> *[ERROR] [Console$] Dumping configuration of initialized storage backend
>> sources. Please make sure they are correct.*
>> *[ERROR] [Console$] Source Name: ELASTICSEARCH; Type: elasticsearch;
>> Configuration: HOME ->
>> /home/crs/PredictionIO-0.10.0-incubating/vendors/elasticsearch-1.7.5, HOSTS
>> -> Slave2,PredictIO3, PORTS -> 9300,9320, CLUSTERNAME -> CRS, TYPE ->
>> elasticsearch*
>> *[ERROR] [Console$] Source Name: LOCALFS; Type: localfs; Configuration:
>> PATH -> /home/crs/.pio_store/models, TYPE -> localfs*
>> *[ERROR] [Console$] Source Name: HBASE; Type: (error); Configuration:
>> (error)*
>>
>> ------------------------------------------------------
>> My guess is that it fails whenever it tries to restart ZooKeeper.
>>
>> My pio-env.sh and some errors from `hbase-crs-master-PredictIO3.log` are
>> also attached.
>>
>> Thank you!!!!
>>
>> Best regards,
>> Amy
>>
>>
