asterixdb-notifications mailing list archives

From "Wail Alkowaileet (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ASTERIXDB-1698) Secondary index doesn't follow the compaction policy
Date Wed, 19 Oct 2016 08:18:58 GMT

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588022#comment-15588022 ]

Wail Alkowaileet commented on ASTERIXDB-1698:
---------------------------------------------

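I had a discussion with Sattam about this. I don't think it's a bug; rather, the logic for
that specific case isn't implemented.
In general, an LSM index tries to keep as few disk components as possible, ideally just one.
So when you create a secondary index on an existing dataset, it uses the bulk loader to build
the index, which writes all of the existing records into a single disk component instead of
going through the flush-and-merge path that the compaction policy governs.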
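To make the difference between the two build paths concrete, here is a toy model. It is a hypothetical sketch, not AsterixDB code: the function names and the 32 MiB flush budget are assumptions chosen only for illustration. It shows that incremental ingestion produces one disk component per filled in-memory component (before any merging happens), while a bulk load writes everything it scans into a single component.

```python
"""Toy model (hypothetical, not AsterixDB code) of how many disk
components each index-build path produces for the same data."""

MIB = 1024 * 1024


def components_from_ingestion(total_bytes, flush_budget):
    """Incremental inserts: each time the in-memory component fills up
    (flush_budget bytes), it is flushed as a new disk component.
    Merges that may happen later are not modeled here."""
    full, rest = divmod(total_bytes, flush_budget)
    return [flush_budget] * full + ([rest] if rest else [])


def components_from_bulk_load(total_bytes):
    """CREATE INDEX on existing data: the bulk loader scans every record
    and writes them all into one disk component, whatever its size."""
    return [total_bytes]


if __name__ == "__main__":
    data = 1200 * MIB  # roughly the 1.2G per partition from the report
    print(len(components_from_ingestion(data, 32 * MIB)))  # many small components
    print(len(components_from_bulk_load(data)))            # exactly one component
```

With these assumed numbers the ingestion model yields 38 components of at most 32 MiB, while the bulk-load model yields a single component holding all the data.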

> Secondary index doesn't follow the compaction policy
> ----------------------------------------------------
>
>                 Key: ASTERIXDB-1698
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1698
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: Storage
>         Environment: master : 4819ea44723b87a68406d248782861cf6e5d3305
>            Reporter: Jianfeng Jia
>            Assignee: Ian Maxon
>
> Here is the ddl for the dataset:
> {code}
> create dataset ds_tweet(typeTweet) if not exists primary key id using compaction policy
> prefix (("max-mergable-component-size"="134217728"),("max-tolerance-component-count"="10"))
> with filter on create_at ;
> create index text_idx if not exists on ds_tweet("text") type keyword;
> {code}
> In this case, I want smaller components of around 128 MB (134217728 bytes). During the data
> ingestion phase, this works well, and each text_idx component is also small (~80 MB each). I
> assume it also followed the component size constraint?
> After ingestion, I needed to build another index:
> {code}
> create index time_idx if not exists on ds_tweet(create_at) type btree;
> {code}
> When it finished, I found that time_idx didn't follow the constraint and ended up
> with one giant 1.2 GB component on each partition.
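A related point is why the compaction policy cannot shrink time_idx afterwards: a merge policy only decides which existing components to combine, and it never splits one. The prefix policy is described, roughly, as merging a sequence of components that contains no component larger than max-mergable-component-size, once their total size exceeds that limit or their count exceeds max-tolerance-component-count. The sketch below is a simplified, hypothetical model of that rule; the function `pick_merge` and the choice to merge the whole eligible run are assumptions, not the real PrefixMergePolicy code.

```python
"""Simplified, hypothetical model of a prefix-style merge decision.
Not the AsterixDB PrefixMergePolicy source; the real rules may differ."""

MIB = 1024 * 1024
MAX_MERGABLE = 134_217_728  # "max-mergable-component-size" from the DDL (128 MiB)
MAX_TOLERANCE = 10          # "max-tolerance-component-count" from the DDL


def pick_merge(sizes, max_mergable=MAX_MERGABLE, max_tolerance=MAX_TOLERANCE):
    """sizes: disk-component sizes in bytes, oldest first.
    Returns the indices of the components to merge, or [] for no merge."""
    # A component larger than max_mergable is never merged, so only the
    # components newer than the newest oversized one are eligible.
    start = 0
    for i, size in enumerate(sizes):
        if size > max_mergable:
            start = i + 1
    eligible = list(range(start, len(sizes)))
    total = sum(sizes[i] for i in eligible)
    # Merge once the eligible run is too big in total or too long.
    if total > max_mergable or len(eligible) > max_tolerance:
        return eligible
    return []


if __name__ == "__main__":
    print(pick_merge([1200 * MIB]))            # [] : the bulk-loaded component is left as is
    print(pick_merge([100 * MIB, 60 * MIB]))   # [0, 1] : 160 MiB in total exceeds 128 MiB
    print(pick_merge([1200 * MIB, 60 * MIB]))  # [] : only the 60 MiB component is eligible
```

Merging the whole eligible run is a simplification; what the model illustrates is that a single 1.2 GB component is above the threshold, so the policy never selects it and nothing reduces its size.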



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
