kylin-user mailing list archives

From Li Yang <liy...@apache.org>
Subject Re: How to clean up after Kylin 2.0
Date Fri, 26 May 2017 10:36:15 GMT
A more detailed list of the leftover files under the job folder would help.

However, it IS NORMAL for the folder below to exist:
/kylin/kylin_metadata/JOB_ID/CUBE_NAME/cuboid

It holds a copy of the cube data, which is needed if you later want to merge
the segments. If you are sure the segments will never be merged, it is safe
to delete.
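If you do decide to remove it, it is worth checking the folder's size with
the HDFS CLI first. A minimal sketch, where JOB_ID and CUBE_NAME are
placeholders you must substitute with your own values:

```shell
# Placeholders -- substitute the real job and cube identifiers.
JOB_ID="your-job-id"
CUBE_NAME="your_cube"
CUBOID_DIR="/kylin/kylin_metadata/${JOB_ID}/${CUBE_NAME}/cuboid"

# See how much space the leftover cuboid copy occupies.
hdfs dfs -du -s -h "${CUBOID_DIR}"

# Only if you are certain these segments will never be merged:
hdfs dfs -rm -r -skipTrash "${CUBOID_DIR}"
```

Note that `-skipTrash` frees the space immediately; drop it if you would
rather keep a recoverable copy in the HDFS trash.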

On Wed, May 17, 2017 at 7:48 PM, Itay Shwartz <itay.shwartz@structureit.net>
wrote:

> Thank you very much for your answer, Billy.
>
> We're currently seeing this on every cube (even when creating a new one),
> so I imagine this "buggy" state was created at some point in the last 9
> months since we started using Kylin. To make the report effective, what
> kind of data would you like me to provide to help you reproduce the issue
> on your end?
>
> Cheers,
> Itay
>
> -----
> Itay Shwartz
>
> StructureIt
> 6th Floor
> Aldgate Tower
> 2 Leman Street
> London
> E1 8FA
>
> direct line: +44 (0)20 3286 9902
> mobile: +44 (0)74 1123 6614
> www.structureit.net
>
>
> On 17 May 2017 at 03:50, Billy Liu <billyliu@apache.org> wrote:
>
>> Thanks Itay for raising this question.
>>
>> When you rebuild the cube, the old segment becomes invalid for query but
>> available for cleanup. The StorageCleanupJob should clean those files; if
>> it does not, that may be a bug. Could you log a JIRA for this issue and
>> describe how to reproduce it? That will help the community fix it a.s.a.p.
>>
>> Kylin keeps duplicate cube data on both HDFS and HBase. The HBase copy is
>> used for queries; the HDFS copy is used for later segment merges. If no
>> merge is needed, it's safe to delete the HDFS copy manually.
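For reference, the cleanup tools mentioned above are typically invoked as
follows on a Kylin 2.0 install (a sketch assuming a standard KYLIN_HOME
layout; the StorageCleanupJob class moved packages between 1.x and 2.x, so
check the docs for your exact version):

```shell
# Dry run first: list the resources Kylin considers garbage.
${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete false

# Then actually delete unused HDFS files and HBase tables.
${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true

# Clean up dangling entries in the metadata store as well.
${KYLIN_HOME}/bin/metastore.sh clean
```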
>>
>> 2017-05-17 0:55 GMT+08:00 Itay Shwartz <itay.shwartz@structureit.net>:
>>
>>> Hi,
>>>
>>> I work on a project where we build a cube multiple times a day using
>>> Kylin. We were using Kylin 1.6 and upgraded to Kylin 2.0 this week.
>>>
>>> Since the upgrade I've noticed that HDFS usage increases every time we
>>> rebuild the cube, and the space is not cleared up. This is despite
>>> running both the StorageCleanupJob and the metastore clean command as
>>> described here and here.
>>>
>>> When looking into HDFS to see where the increase is, I can see that the
>>> accumulated data is at: /kylin/kylin_metadata/
>>>
>>> It looks like every job gets a new folder inside that directory, and its
>>> size is at least the size of the cube. Some of these folders were not
>>> cleared even for very old jobs, but since the upgrade to V2.0 none of
>>> the job folders have been cleared. I deleted some of the older folders
>>> and it didn't affect the cube. I also created a test cube, deleted the
>>> folder that was created for it, and could still query the cube. Is it
>>> safe to delete these folders manually? Is it correct to assume that
>>> after the job is done, all the data that needs to be maintained is in
>>> HBase (where I can find the cube and the metadata information)?
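To see which job folders are actually accumulating space, something along
these lines can help (a sketch; adjust the path if your kylin.metadata.url
uses a different working-directory prefix):

```shell
# Per-job folder sizes (in bytes) under the Kylin working directory,
# largest first; each top-level entry is one job's folder.
hdfs dfs -du /kylin/kylin_metadata | sort -nr | head -20
```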
>>>
>>>
>>> Many thanks,
>>>
>>> Itay
>>>
>>>
>>
>
