phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PHOENIX-3525) Cap automatic index rebuilding to inactive timestamp.
Date Fri, 04 Aug 2017 17:03:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113128#comment-16113128 ]

James Taylor edited comment on PHOENIX-3525 at 8/4/17 5:02 PM:
---------------------------------------------------------------

Current plan is to eliminate simultaneous writes from the rebuilder and clients to prevent
any race conditions by:
# introducing an INDEX_ACTIVATE_TIMESTAMP column that determines when incremental indexing
will begin again. This timestamp will be set by the rebuilder to a time in the future (by
a configurable delta) after all index regions are online.
# the INDEX_ACTIVATE_TIMESTAMP will be cleared when the INDEX_DISABLED_TIMESTAMP is set in
the MetaDataEndPointImpl.setIndexState call. The rebuilder would then reset it according to
the logic in (1), moving it out to a later time.
# the INDEX_ACTIVATE_TIMESTAMP will act as an upper bound on the rebuilder scan that replays
mutations. Only after this timestamp plus some delta has passed (and the replay is complete)
will the index be marked ACTIVE and both INDEX_ACTIVATE_TIMESTAMP and INDEX_DISABLED_TIMESTAMP
be cleared.
# index maintenance will be prevented while the server-based timestamp < INDEX_ACTIVATE_TIMESTAMP
by having clients not send the IndexMaintainer. The INDEX_ACTIVATE_TIMESTAMP will be
included in PTable so that it reaches the clients.
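To make the interplay of the four steps easier to follow, here is a minimal in-memory model of them. It is only a sketch under stated assumptions, not Phoenix code: the class IndexActivationSketch, its fields and methods, the parameter names activationDelta and safetyDelta, and the use of 0 to mean "column not set" are all hypothetical. Keeping the earliest disable time and treating an unset activation time as "no restriction" on clients are assumptions as well.

```java
/**
 * Hypothetical in-memory model of the four-step rebuild plan.
 * Not Phoenix code: every name here is illustrative only.
 * A timestamp of 0 (UNSET) stands for "column not set" (an assumption).
 */
final class IndexActivationSketch {
    static final long UNSET = 0L;

    long disabledTimestamp = UNSET; // models INDEX_DISABLED_TIMESTAMP
    long activateTimestamp = UNSET; // models INDEX_ACTIVATE_TIMESTAMP
    boolean active = true;          // models the ACTIVE index state

    // Step 2: disabling the index records the disable time and clears any
    // pending activation time. Keeping the earliest disable time is an
    // assumption, so that no mutation is left out of the replay.
    void onIndexDisabled(long serverTs) {
        disabledTimestamp = (disabledTimestamp == UNSET)
                ? serverTs
                : Math.min(disabledTimestamp, serverTs);
        activateTimestamp = UNSET;
        active = false;
    }

    // Step 1: once all index regions are online, the rebuilder sets the
    // activation time a configurable delta into the future.
    void scheduleActivation(long serverTs, long activationDelta) {
        activateTimestamp = serverTs + activationDelta;
    }

    // Step 3: the replay scan covers the half-open range
    // [disabledTimestamp, activateTimestamp).
    long[] replayTimeRange() {
        if (disabledTimestamp == UNSET || activateTimestamp == UNSET) {
            throw new IllegalStateException("no rebuild window scheduled");
        }
        return new long[] {disabledTimestamp, activateTimestamp};
    }

    // Step 4: clients do not send the IndexMaintainer while the server time is
    // before the activation time. Treating UNSET as "no restriction" is an assumption.
    boolean shouldSendIndexMaintainer(long serverTs) {
        return activateTimestamp == UNSET || serverTs >= activateTimestamp;
    }

    // Step 3 (cont.): the index becomes ACTIVE only once activateTimestamp plus a
    // safety delta has passed and the replay is complete; both timestamps are then cleared.
    boolean tryMarkActive(long serverTs, long safetyDelta, boolean replayComplete) {
        if (activateTimestamp == UNSET || !replayComplete
                || serverTs < activateTimestamp + safetyDelta) {
            return false;
        }
        active = true;
        disabledTimestamp = UNSET;
        activateTimestamp = UNSET;
        return true;
    }
}
```

In this model the client stops maintaining the index for exactly the time range the replay scan covers, so the rebuilder and the clients never write index rows for the same timestamps, which is the race the plan aims to eliminate.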



was (Author: jamestaylor):
Current plan is to eliminate simultaneous writes from the rebuilder and clients to prevent
any race conditions by:
* introducing a PENDING_ACTIVE index state. When in the PENDING_ACTIVE state, an index will not
be used by queries until the server-based timestamp >= INDEX_ACTIVATE_TIMESTAMP.
* introducing an INDEX_ACTIVATE_TIMESTAMP column that determines when an index will be reactivated.
This timestamp will be set by the rebuilder to a time in the future (by a configurable amount
of time) after all index regions are online. Depending on config, the index will either be
left in an ACTIVE state or moved to a PENDING_ACTIVE state.
* preventing index maintenance by not sending the IndexMaintainer until the server-based
timestamp >= INDEX_ACTIVATE_TIMESTAMP.
* including INDEX_ACTIVATE_TIMESTAMP in PTable so that clients can use it to control whether
index maintenance is performed.
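The earlier plan adds a query-side check on top of the maintenance check. A hypothetical sketch of that check follows: SketchIndexState is an illustrative enum (not Phoenix's own index-state type), and the leaveActive flag stands in for the unnamed config option that decides between ACTIVE and PENDING_ACTIVE.

```java
// Illustrative index states; not Phoenix's actual enum.
enum SketchIndexState { ACTIVE, PENDING_ACTIVE, DISABLE }

final class PendingActiveSketch {
    // A query may use the index if it is ACTIVE, or if it is PENDING_ACTIVE
    // and the server-based timestamp has reached INDEX_ACTIVATE_TIMESTAMP.
    static boolean usableByQueries(SketchIndexState state, long serverTs, long activateTs) {
        switch (state) {
            case ACTIVE:
                return true;
            case PENDING_ACTIVE:
                return serverTs >= activateTs;
            default:
                return false;
        }
    }

    // After all index regions are online, the rebuilder either leaves the
    // index ACTIVE or moves it to PENDING_ACTIVE; leaveActive is a hypothetical config flag.
    static SketchIndexState stateAfterScheduling(boolean leaveActive) {
        return leaveActive ? SketchIndexState.ACTIVE : SketchIndexState.PENDING_ACTIVE;
    }
}
```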


> Cap automatic index rebuilding to inactive timestamp.
> -----------------------------------------------------
>
>                 Key: PHOENIX-3525
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3525
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Ankit Singhal
>            Assignee: James Taylor
>         Attachments: PHOENIX-3525_wip2.patch, PHOENIX-3525_wip.patch
>
>
> From a review comment by [~chrajeshbabu32@gmail.com] on
> https://github.com/apache/phoenix/pull/210
> For automatic rebuilding, DISABLED_TIMESTAMP is the lower bound, but there is no upper bound,
> so we rebuild all new writes made after DISABLED_TIMESTAMP even though the indexes were
> already updated properly. We can introduce an upper bound at the time the rebuild thread
> starts, which limits the data to rebuild. If there are frequent writes, we can increase the
> rebuild period exponentially.
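The review suggestion above can be sketched as two small helpers: a bounded replay window and an exponentially growing rebuild period. This is a hypothetical illustration only; the class and method names, the cap maxMs, how "frequent writes" is detected, and resetting to baseMs when writes are not frequent are all assumptions, not behavior described in the issue.

```java
final class BoundedRebuildSketch {
    // Half-open window [disabledTs, rebuildStartTs): only mutations written after
    // the index was disabled and before this rebuild pass began are replayed.
    static long[] rebuildWindow(long disabledTs, long rebuildStartTs) {
        if (rebuildStartTs <= disabledTs) {
            throw new IllegalArgumentException("rebuild start must be after disable time");
        }
        return new long[] {disabledTs, rebuildStartTs};
    }

    // With frequent writes, double the rebuild period up to maxMs; otherwise
    // fall back to baseMs (the reset is an assumption of this sketch).
    static long nextRebuildPeriodMs(long currentMs, boolean frequentWrites, long baseMs, long maxMs) {
        if (!frequentWrites) {
            return baseMs;
        }
        // Avoid overflow: anything above maxMs / 2 would double past the cap anyway.
        long doubled = currentMs > maxMs / 2 ? maxMs : currentMs * 2;
        return Math.min(doubled, maxMs);
    }
}
```

Capping the period keeps a busy table from pushing the rebuild out indefinitely, while the fixed upper bound on the window keeps each pass from re-reading writes that arrived after it started.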



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
