predictionio-user mailing list archives

From Lin Amy <linam...@gmail.com>
Subject Re: [PredictionIO Error] Running Hbase
Date Mon, 13 Mar 2017 03:50:26 GMT
Hi Donald,

Thank you for the advice! And of course I will contribute to the FAQ later.

Best regards,
Amy

On Mon, Mar 13, 2017 at 8:23 AM, Donald Szeto <donald@apache.org> wrote:

> Hi Amy,
>
> Since the event server keeps adding events to the backend, the storage will
> grow indefinitely unless you implement some sort of data retention policy
> that periodically removes old events.
>
> In 0.11, there are two options for this situation:
> - You may use SelfCleaningDataSource. Backing up your existing data is
> highly recommended before you try it.
> - If your use case allows you to overwrite events (
> https://github.com/apache/incubator-predictionio/pull/356), you may
> overwrite them instead of continually adding new ones.
>
> Your experience would be very helpful to others as well. Would you like to
> contribute how you fixed your problem to the FAQ?
>
>
> https://github.com/apache/incubator-predictionio/blob/livedoc/docs/manual/source/resources/faq.html.md
>
> Regards,
> Donald
>
> On Fri, Mar 10, 2017 at 11:32 PM, Lin Amy <linamy85@gmail.com> wrote:
>
> Hello everyone,
>
> Mission completed!
>
> The issue was solved after I fixed the following errors reported by `hbase hbck`:
> ERROR: Region { meta =>
> pio_event:events_1,,1488109005690.f2fe88521bdf946650842f74bb4c978d., hdfs
> =>
> file:/home/crs/hbase/hbase/data/pio_event/events_1/f2fe88521bdf946650842f74bb4c978d,
> deployed =>  } not deployed on any region server.
> ERROR: (region
> pio_event:events_1,\x80#X,1489209095682.97a91816f25aa71ce2e2a0342776ddbe.)
> First region should start with an empty key.  You need to  create a new
> region and regioninfo in HDFS to plug the hole.
>
> `hbase hbck -repair` and `hbase hbck -repairHoles` did not solve the problem
> at all...
>
> But after trying these steps:
> 1. Stop HBase.
> 2. Delete the recovered.edits folders for the failing regions.
> 3. Run `hbase hbck -repairHoles`.
> (ref:
> https://serverfault.com/questions/510290/hbase-hbck-cant-fix-region-inconsistencies
> )
>
> Problem solved!!!
> Hope it can save others time when this occurs again (hopefully not... Orz)
>
> Best regards,
> Amy
>
>
> On Sat, Mar 11, 2017 at 2:41 PM, Lin Amy <linamy85@gmail.com> wrote:
>
> Hello again,
>
> I have solved the previous problem with the help of this reference:
> https://issues.apache.org/jira/browse/ZOOKEEPER-1621, and `pio status`
> now returns a normal result, which seems great.
> However, the problem now is that I receive a 500 (Internal Server Error)
> with the message "The server was not able to produce a timely response
> to your request."
> Also, when I run `pio train`, it fails with the following message:
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Failed after attempts=35, exceptions: Sat Mar 11 14:00:10 CST 2017,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
> java.net.ConnectException: Connection refused Sat Mar 11 14:00:10 CST 2017,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
> org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is
> in the failed servers list: PredictIO3.ucf.com/10.1.3.153:37708 Sat Mar
> 11 14:00:11 CST 2017,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
> org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is
> in the failed servers list: PredictIO3.ucf.com/10.1.3.153:37708 Sat Mar
> 11 14:00:12 CST 2017,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
> org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is
> in the failed servers list: PredictIO3.ucf.com/10.1.3.153:37708 Sat Mar
> 11 14:00:14 CST 2017,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
> java.net.ConnectException: Connection refused Sat Mar 11 14:00:18 CST 2017,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
> java.net.ConnectException: Connection refused Sat Mar 11 14:00:28 CST 2017,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
> java.net.ConnectException: Connection refused Sat Mar 11 14:00:38 CST 2017,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
> java.net.ConnectException: Connection refused Sat Mar 11 14:00:48 CST 2017,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
> java.net.ConnectException: Connection refused Sat Mar 11 14:00:58 CST 2017,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
> java.net.ConnectException: Connection refused Sat Mar 11 14:01:18 CST 2017,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
> java.net.ConnectException: Connection refused Sat Mar 11 14:01:38 CST 2017,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
> java.net.ConnectException: Connection refused Sat Mar 11 14:01:58 CST 2017,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
> java.net.ConnectException: Connection refused Sat Mar 11 14:02:18 CST 2017,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
> java.net.ConnectException: Connection refused Sat Mar 11 14:02:39 CST 2017,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
> java.net.ConnectException: Connection refused Sat Mar 11 14:02:59 CST 2017,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d,
> java.net.ConnectException: Connection refused
>
> I have tried deleting everything inside /hbase/zookeeper, following some
> online advice, but the issue remains.
>
> Has anyone encountered this failure and solved it?
> Thank you, I appreciate any help!
>
> Best regards,
> Amy
>
> On Sat, Mar 11, 2017 at 10:28 AM, Lin Amy <linamy85@gmail.com> wrote:
>
> Hello,
>
> Yesterday I found that the disk was full, which led to an HBase failure:
>
> *stopping hbase*
> */home/crs/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/bin/stop-hbase.sh:
> line 50: echo: write error: No space left on device*
> *Java HotSpot(TM) 64-Bit Server VM warning: Insufficient space for shared
> memory file:*
> *   853*
> *Try using the -Djava.io.tmpdir= option to select an alternate temp
> location.*
>
> So I freed up a lot of disk space and ran `pio-stop-all` and
> `pio-start-all`. Then `pio status` gave me this error:
> -----------------------------------------------------
> *[INFO] [Console$] Inspecting PredictionIO...*
> *[INFO] [Console$] PredictionIO 0.10.0-incubating is installed at
> /home/crs/PredictionIO-0.10.0-incubating*
> *[INFO] [Console$] Inspecting Apache Spark...*
> *[INFO] [Console$] Apache Spark is installed at
> /home/crs/PredictionIO-0.10.0-incubating/vendors/spark-1.6.2-bin-hadoop2.6*
> *[INFO] [Console$] Apache Spark 1.6.2 detected (meets minimum requirement
> of 1.3.0)*
> *[INFO] [Console$] Inspecting storage backend connections...*
> *[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...*
> *[INFO] [Storage$] Verifying Model Data Backend (Source: LOCALFS)...*
> *[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...*
> *[ERROR] [RecoverableZooKeeper] ZooKeeper exists failed after 1 attempts*
> *[ERROR] [ZooKeeperWatcher] hconnection-0x3fc05ea2, quorum=localhost:2181,
> baseZNode=/hbase Received unexpected KeeperException, re-throwing exception*
> *[WARN] [ZooKeeperRegistry] Can't retrieve clusterId from Zookeeper*
> *[ERROR] [StorageClient] Cannot connect to ZooKeeper (ZooKeeper ensemble:
> localhost). Please make sure that the configuration is pointing at the
> correct ZooKeeper ensemble. By default, HBase manages its own ZooKeeper, so
> if you have not configured HBase to use an external ZooKeeper, that means
> your HBase is not started or configured properly.*
> *[ERROR] [Storage$] Error initializing storage client for source HBASE*
> *[ERROR] [Console$] Unable to connect to all storage backends
> successfully. The following shows the error message from the storage
> backend.*
> *[ERROR] [Console$] Data source HBASE was not properly initialized.
> (org.apache.predictionio.data.storage.StorageClientException)*
> *[ERROR] [Console$] Dumping configuration of initialized storage backend
> sources. Please make sure they are correct.*
> *[ERROR] [Console$] Source Name: ELASTICSEARCH; Type: elasticsearch;
> Configuration: HOME ->
> /home/crs/PredictionIO-0.10.0-incubating/vendors/elasticsearch-1.7.5, HOSTS
> -> Slave2,PredictIO3, PORTS -> 9300,9320, CLUSTERNAME -> CRS, TYPE ->
> elasticsearch*
> *[ERROR] [Console$] Source Name: LOCALFS; Type: localfs; Configuration:
> PATH -> /home/crs/.pio_store/models, TYPE -> localfs*
> *[ERROR] [Console$] Source Name: HBASE; Type: (error); Configuration:
> (error)*
>
> ------------------------------------------------------
> My guess is that it fails whenever it tries to restart ZooKeeper.
>
> My pio-env.sh and some errors from `hbase-crs-master-PredictIO3.log` are
> also attached.
>
> Thank you!!!!
>
> Best regards,
> Amy
>
>
>
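
The three recovery steps reported earlier in the thread (stop HBase, delete the recovered.edits folders for the failing regions, run `hbase hbck -repairHoles`) could be sketched as a small dry-run planner. This is only an illustration, not a tested procedure: the function names `find_recovered_edits` and `recovery_plan` are hypothetical, the sketch assumes the region layout `<table dir>/<encoded region name>/recovered.edits` seen in the hbck output above, and it only prints commands rather than running them. The thread does not say when HBase was started again, so that decision is left to the operator.

```python
import shlex
from pathlib import Path
from typing import Iterable, List


def find_recovered_edits(table_dir: Path, failing_regions: Iterable[str]) -> List[Path]:
    """Return the recovered.edits directories that exist for the given regions.

    Assumes the layout <table_dir>/<encoded_region_name>/recovered.edits.
    """
    candidates = (Path(table_dir) / region / "recovered.edits" for region in failing_regions)
    return [path for path in candidates if path.is_dir()]


def recovery_plan(table_dir: Path, failing_regions: Iterable[str]) -> List[str]:
    """Build the ordered shell commands for the three steps; nothing is executed."""
    plan = ["stop-hbase.sh"]  # step 1: stop HBase
    for path in find_recovered_edits(table_dir, failing_regions):
        # step 2: one delete command per recovered.edits folder that exists
        plan.append("rm -r " + shlex.quote(str(path)))
    plan.append("hbase hbck -repairHoles")  # step 3: let hbck plug the holes
    return plan


if __name__ == "__main__":
    # Table path and region name taken from the hbck errors quoted in the thread.
    table = Path("/home/crs/hbase/hbase/data/pio_event/events_1")
    regions = ["f2fe88521bdf946650842f74bb4c978d"]
    for command in recovery_plan(table, regions):
        print(command)
```

Printing the plan first makes it possible to review exactly which folders would be deleted, and to back them up, before running anything by hand.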
