kylin-user mailing list archives

From jxs <jxsk...@126.com>
Subject Re:Re: Re: Strange HBase rpc operation timeout error
Date Sun, 17 Dec 2017 10:32:24 GMT
Well, I found no other timeout or HBase RPC-related errors besides these "JobFetcher/DefaultScheduler"
timeout errors.
And I am using HDFS for HBase storage, not S3, so I guess it's not related to the setting
"hbase.rpc.timeout": "3600000" specified in the doc.
Also, when the building job fails on a step, if I click "Resume" in the Kylin web UI, it shows
the step as done and moves on to the next step.


If you need the full log, please let me know and I will post it.



On 2017-12-17 17:21, "Billy Liu" <billyliu@apache.org> wrote:


Actually, your questions involve two HBase timeouts. One is about the cube build, the
other is about metadata access.
For the first issue, please check this article: http://kylin.apache.org/docs21/install/kylin_aws_emr.html
It explains how to increase the HBase RPC timeout.
For the second issue, as in the previous discussion, we should keep it.


2017-12-17 10:37 GMT+08:00 jxs <jxskiss@126.com>:

Hi Billy,
Thank you for pointing me to the previous discussion. But for now we are running a very small HBase
cluster to keep costs low, with only one slave node.
So unsteady response times (within a range that is not too bad, e.g. under 1 minute) are acceptable to us.
The previous timeout error just interrupted the cube building process, and we don't want that.
What do you suggest for this use case?







On 2017-12-16 11:48, "Billy Liu" <billyliu@apache.org> wrote:


Check this: http://apache-kylin.74782.x6.nabble.com/hbase-configed-with-fixed-value-td9241.html



2017-12-15 18:03 GMT+08:00 jxs <jxskiss@126.com>:

Hi,

Finally, I found this in org.apache.kylin.storage.hbase.HBaseResourceStore:

```
    private StorageURL buildMetadataUrl(KylinConfig kylinConfig) throws IOException {
        StorageURL url = kylinConfig.getMetadataUrl();
        if (!url.getScheme().equals("hbase"))
            throw new IOException("Cannot create HBaseResourceStore. Url not match. Url: " + url);

        // control timeout for prompt error report
        Map<String, String> newParams = new LinkedHashMap<>();
        newParams.put("hbase.client.scanner.timeout.period", "10000");
        newParams.put("hbase.rpc.timeout", "5000");
        newParams.put("hbase.client.retries.number", "1");
        newParams.putAll(url.getAllParameters());

        return url.copy(newParams);
    }
```

Is this related to the timeout error? Why are these params hard-coded instead of being read from
configuration? Is there any workaround for this timeout error?
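In the quoted `buildMetadataUrl`, the three timeout defaults are put into the map first and `url.getAllParameters()` is copied in last with `putAll`, so any key already present in the metadata URL would replace the hard-coded value. The stdlib-only Java sketch below reproduces just that merge order. The URL string `kylin_metadata@hbase,hbase.rpc.timeout=60000,...` and the `parseParams` helper are assumptions for illustration (a hypothetical, simplified parser, not Kylin's `StorageURL`); whether Kylin 2.2.0 honours such parameters in `kylin.metadata.url` should be checked before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetadataUrlParams {

    // Same defaults and the same put-then-putAll order as the quoted snippet.
    static Map<String, String> merge(Map<String, String> urlParams) {
        Map<String, String> newParams = new LinkedHashMap<>();
        newParams.put("hbase.client.scanner.timeout.period", "10000");
        newParams.put("hbase.rpc.timeout", "5000");
        newParams.put("hbase.client.retries.number", "1");
        // putAll runs last, so a key supplied in the URL replaces the default above.
        newParams.putAll(urlParams);
        return newParams;
    }

    // HYPOTHETICAL simplified parser for "identifier@scheme,key=value,..." strings.
    // It is not Kylin's StorageURL; it only illustrates the assumed URL syntax.
    static Map<String, String> parseParams(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] parts = url.split(",");
        for (int i = 1; i < parts.length; i++) { // parts[0] is "identifier@scheme"
            int eq = parts[i].indexOf('=');
            if (eq > 0) {
                params.put(parts[i].substring(0, eq).trim(), parts[i].substring(eq + 1).trim());
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Assumed kylin.metadata.url value; the identifier and numbers are examples only.
        String url = "kylin_metadata@hbase,hbase.rpc.timeout=60000,hbase.client.retries.number=3";
        System.out.println(merge(parseParams(url)));
    }
}
```

Because `LinkedHashMap` keeps the original insertion order when a key is overwritten, this prints `{hbase.client.scanner.timeout.period=10000, hbase.rpc.timeout=60000, hbase.client.retries.number=3}`. The trade-off is the one the snippet's own comment names: the short defaults exist "for prompt error report", which is why the reply in this thread recommends keeping them for metadata access.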





On 2017-12-15 16:03, "jxs" <jxskiss@126.com> wrote:


Hi, kylin users,

I encountered a strange timeout error today when building a cube.

By "strange", I mean that "hbase.rpc.timeout" is set to 60000 in HBase, but I
get "org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=8099904, waitTime=5001, operationTimeout=5000
expired" errors.

Kylin version 2.2.0 runs on EMR. It ran without errors for about half a month, then suddenly
stopped working; the current cube is not the biggest one.
I am wondering where I should look; any help is appreciated.

The traceback from the log:

```
2017-12-15 06:46:57,892 ERROR [Scheduler 2090031901 Job c9067736-eac7-48ad-88f3-dbd6f4e870ae-167]
execution.ExecutableManager:149 : fail to get job output:c9067736-eac7-48ad-88f3-dbd6f4e870ae-14
org.apache.kylin.job.exception.PersistentException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
Failed after attempts=1, exceptions:
Fri Dec 15 14:46:57 GMT+08:00 2017, RpcRetryingCaller{globalStartTime=1513320412890, pause=100,
retries=1}, java.io.IOException: Call to ip-172-31-5-71.cn-north-1.compute.internal/172.31.5.71:16020
failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=8099904,
waitTime=5001, operationTimeout=5000 expired.

        at org.apache.kylin.job.dao.ExecutableDao.getJobOutput(ExecutableDao.java:202)
        at org.apache.kylin.job.execution.ExecutableManager.getOutput(ExecutableManager.java:145)
        at org.apache.kylin.job.execution.AbstractExecutable.getOutput(AbstractExecutable.java:312)
        at org.apache.kylin.job.execution.AbstractExecutable.isDiscarded(AbstractExecutable.java:392)
        at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:149)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
        at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
        at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:144)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1,
exceptions:
Fri Dec 15 14:46:57 GMT+08:00 2017, RpcRetryingCaller{globalStartTime=1513320412890, pause=100,
retries=1}, java.io.IOException: Call to ip-172-31-5-71.cn-north-1.compute.internal/172.31.5.71:16020
failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=8099904,
waitTime=5001, operationTimeout=5000 expired.

```
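The numbers in this traceback can be cross-checked. `globalStartTime=1513320412890` is an epoch timestamp in milliseconds, and the ERROR line is stamped `06:46:57,892`, which matches the exception's `14:46:57 GMT+08:00` if the log clock is UTC (an assumption inferred from that 8-hour gap). The Java sketch below computes the gap between the two using only `java.time`, with the values copied from the log above.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class RpcTimingCheck {

    // globalStartTime from the RpcRetryingCaller line, in epoch milliseconds.
    static final long GLOBAL_START_MILLIS = 1513320412890L;

    // Timestamp prefix of the ERROR line; ASSUMED to be UTC.
    static final LocalDateTime LOG_TIME_UTC =
            LocalDateTime.of(2017, 12, 15, 6, 46, 57, 892_000_000);

    // Milliseconds between the start of the retrying call and the logged error.
    static long elapsedMillis() {
        Instant start = Instant.ofEpochMilli(GLOBAL_START_MILLIS);
        Instant logged = LOG_TIME_UTC.toInstant(ZoneOffset.UTC);
        return Duration.between(start, logged).toMillis();
    }

    public static void main(String[] args) {
        // Show the start time in the same +08:00 offset the exception text uses.
        System.out.println(Instant.ofEpochMilli(GLOBAL_START_MILLIS).atOffset(ZoneOffset.ofHours(8)));
        System.out.println("elapsed ms: " + elapsedMillis());
    }
}
```

It prints `2017-12-15T14:46:52.890+08:00` and `elapsed ms: 5002`: the single attempt gave up about 5 s after it started, in line with `operationTimeout=5000` and `retries=1` rather than the 60000 ms configured in HBase.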







