kylin-user mailing list archives

From Prashant Prakash <prash.i...@gmail.com>
Subject Re: Performance degradation in cube building.
Date Thu, 28 Jan 2016 03:19:46 GMT
Hi Luke,

One of the key requirements of our project was to keep query latency low and
provide consistent query performance. To achieve this we disabled major
compactions on HBase and set the minor compaction thresholds so high that
minor compactions were virtually disabled as well. The assumption behind this
is that in Kylin every cube segment is added as a new HBase table; since we
never update a cube segment once it is built, the number of store files
underlying each table remains constant, giving consistent query performance.
Please correct me if the assumption is flawed.
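
For reference, here is a sketch of the relevant hbase-site.xml properties
(the property names are the standard HBase 1.x ones; the values shown are
illustrative, not our exact settings):

<!-- Disable periodic major compactions (0 = never trigger by time). -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>

<!-- Require so many store files before a minor compaction is even
     considered that it effectively never runs (illustrative value). -->
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>100</value>
</property>

<!-- Raise the blocking limit as well, so writes are not stalled by the
     growing store-file count (illustrative value). -->
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>200</value>
</property>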

It's an altogether different case for the Kylin metadata table. New records
get added with every cube build job, and the number of store files keeps
increasing. The default job scheduler scans all the jobs in the metadata
table, and query performance degraded with each new store file.

Deleting the old jobs reduced the number of records to be scanned, but since
that is again a modification, I think it would have hurt us in the long run.
We ran a compaction, which resolved the problem.
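
Concretely, the fix was a one-off major compaction of the metadata table from
the hbase shell ('kylin_metadata' is the default table name; yours may differ
depending on kylin.metadata.url):

$ hbase shell
hbase> major_compact 'kylin_metadata'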

Regards
Prashant

On Wed, Jan 27, 2016 at 5:31 PM, Luke Han <luke.hq@gmail.com> wrote:

> Hi Prashant,
>     Could you give more detail about the root cause and how you fixed it?
>     I think it will benefit others when they run into the same issue later.
>
>     Thanks.
> Luke
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Wed, Jan 27, 2016 at 7:57 PM, Prashant Prakash <prash.iitk@gmail.com>
> wrote:
>
>> The metadata cleanup helped partially.
>>
>> @ShaoFeng, as suspected the root cause was not related to Kylin. We tried
>> tuning HBase, which eventually resolved the issue.
>>
>> Regards
>> Prashant
>>
>> On Mon, Jan 25, 2016 at 11:02 AM, ShaoFeng Shi <shaofengshi@apache.org>
>> wrote:
>>
>>> The "Other" jobs here means those inactive jobs, including completed and
>>> discarded. As time goes on, the number increases. If you found the "Jobs"
>>> tab on web UI loading is very slow, you'd better do a cleanup; The job
>>> engine is doing the same thing, retrieving the jobs from HBase. I'm not
>>> sure whether that is the root cause in your case, because 20 ~ 30 minutes
>>> is too long a time for loading 500 records, but it worth a try.
>>>
>>> From v1.2, running "./bin/metadata clean --delete true" will also drop
>>> those old inactive jobs (older than 30 days, if I remember correctly).
>>> Before doing that, taking a metadata backup is a must; please check
>>> https://kylin.apache.org/docs/howto/howto_backup_metadata.html
>>>
>>> For v1.1, I'm afraid there is no simple way to clean up the old jobs.
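>>>
>>> In shell form, the safe order for v1.2+ would be something like this
>>> (script names per the backup howto linked above; double-check them
>>> against your install):
>>>
>>> ./bin/metadata.sh backup            # dump the current metadata first
>>> ./bin/metadata clean --delete true  # then drop the old inactive jobs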
>>>
>>> 2016-01-25 1:10 GMT+08:00 Prashant Prakash <prash.iitk@gmail.com>:
>>>
>>>> Hi,
>>>>
>>>> We are running Kylin v1.1.1 on our cluster. Over time we have observed
>>>> degradation in the cube building process. By debugging we have confirmed
>>>> that all the Executables are running fine; the issue lies in
>>>> DefaultScheduler.FetcherRunner. By default FetcherRunner is scheduled to
>>>> run every 60 seconds, but in our case the lag is ~20 - 30 minutes.
>>>>
>>>> [kylin@ec2-184-169-138-213 logs]$ tail -1000f kylin_job.log | grep "Job Fetcher"
>>>>
>>>> [pool-6-thread-1]:[2016-01-24 *06:12:53*,465][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)] - Job Fetcher: 0 running, 1 actual running, 4 ready, 533 others
>>>> [pool-6-thread-1]:[2016-01-24 *06:34:15*,518][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)] - Job Fetcher: 0 running, 1 actual running, 4 ready, 533 others
>>>> [pool-6-thread-1]:[2016-01-24 *06:55:46*,133][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)] - Job Fetcher: 0 running, 1 actual running, 4 ready, 533 others
>>>> [pool-6-thread-1]:[2016-01-24 07:18:52,040][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)] - Job Fetcher: 0 running, 1 actual running, 4 ready, 533 others
>>>>
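>>>> The ~21-minute spacing between the timestamps above matches the JDK's
>>>> scheduleAtFixedRate contract: runs never overlap, so once a single fetch
>>>> pass outlasts the 60-second period, the next pass only starts after the
>>>> previous one finishes. A minimal standalone sketch of that behavior (not
>>>> Kylin's code; the class name and sleep are made up for illustration):
>>>>
>>>> import java.util.concurrent.Executors;
>>>> import java.util.concurrent.ScheduledExecutorService;
>>>> import java.util.concurrent.TimeUnit;
>>>>
>>>> public class FixedRateLagDemo {
>>>>     public static void main(String[] args) {
>>>>         ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
>>>>         // Scheduled every 1 second, but each run takes ~5 seconds, so
>>>>         // runs actually start ~5 seconds apart: the schedule degrades to
>>>>         // the task's own duration, like our 60-sec fetcher at ~21 min.
>>>>         pool.scheduleAtFixedRate(() -> {
>>>>             System.out.println("pass started at " + System.currentTimeMillis());
>>>>             try {
>>>>                 Thread.sleep(5_000); // stand-in for a slow scan of the job table
>>>>             } catch (InterruptedException e) {
>>>>                 Thread.currentThread().interrupt();
>>>>             }
>>>>         }, 0, 1, TimeUnit.SECONDS);
>>>>     }
>>>> }
>>>>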
>>>> *1.* One key number in the log is *533 others*, which I suspect mostly
>>>> counts jobs in state "SUCCEED" (not verified). The reason behind the
>>>> guess is that we never delete jobs after they complete. There is a
>>>> method ExecutableManager.deleteJob, but as of now it is called only from
>>>> the test module.
>>>>
>>>> *2.* Over time we have observed the number in the *others* column grow.
>>>> From the thread dump it looks like FetcherRunner spends most of its time
>>>> fetching job output.
>>>>
>>>> Because of the above two reasons, I suspect the buildup of jobs in state
>>>> "SUCCEED"/"DISCARDED" might be the reason behind the performance
>>>> degradation.
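>>>>
>>>> Paraphrasing what the thread dump below suggests each fetch pass does
>>>> (a fragment sketch of the v1.1 logic as I read it, not the literal
>>>> source):
>>>>
>>>>     // Every pass iterates ALL job ids (running, ready, SUCCEED and
>>>>     // DISCARDED alike) and reads each job's output from the HBase
>>>>     // resource store, so the cost of one 60-sec pass grows with the
>>>>     // full job history (533 extra reads in our case):
>>>>     for (String id : executableManager.getAllJobIds()) {
>>>>         Output output = executableManager.getOutput(id); // HBaseResourceStore.getByScan(...)
>>>>         if (output.getState() != ExecutableState.READY) {
>>>>             continue; // inactive jobs still paid for the HBase read above
>>>>         }
>>>>         // ... submit the ready job to the executor pool ...
>>>>     }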
>>>>
>>>> Please provide feedback on whether my hypothesis has some merit.
>>>>
>>>>
>>>> *Thread dump:*
>>>>
>>>> "pool-6-thread-1" #819 prio=5 os_prio=0 tid=0x00007fe264591000 nid=0x25c9 in Object.wait() [0x00007fe237a9d000]
>>>>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>>>         at java.lang.Object.wait(Native Method)
>>>>         at java.lang.Object.wait(Object.java:460)
>>>>         at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>>>>         at org.apache.hadoop.hbase.client.ResultBoundedCompletionService.poll(ResultBoundedCompletionService.java:155)
>>>>         - locked <0x00000003d8026808> (a [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;)
>>>>         at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:168)
>>>>         at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:57)
>>>>         at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>>>>         at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:293)
>>>>         at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:393)
>>>>         at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:337)
>>>>         at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:94)
>>>>         at org.apache.kylin.common.persistence.HBaseResourceStore.getByScan(HBaseResourceStore.java:283)
>>>>         at org.apache.kylin.common.persistence.HBaseResourceStore.getResourceImpl(HBaseResourceStore.java:201)
>>>>         at org.apache.kylin.common.persistence.ResourceStore.getResource(ResourceStore.java:130)
>>>>         at org.apache.kylin.job.dao.ExecutableDao.readJobOutputResource(ExecutableDao.java:90)
>>>>         at org.apache.kylin.job.dao.ExecutableDao.getJobOutput(ExecutableDao.java:179)
>>>>         at org.apache.kylin.job.manager.ExecutableManager.getOutput(ExecutableManager.java:119)
>>>>         at org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:93)
>>>>         - locked <0x0000000482aa9c98> (a org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner)
>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>
>>>>
>>>> Regards
>>>>
>>>> Prashant
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi
>>>
>>>
>>
>
