phoenix-dev mailing list archives

From "Samarth Jain (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PHOENIX-3525) Cap automatic index rebuilding to inactive timestamp.
Date Thu, 20 Jul 2017 20:12:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095276#comment-16095276 ]

Samarth Jain edited comment on PHOENIX-3525 at 7/20/17 8:11 PM:
----------------------------------------------------------------

Work in progress patch. [~jamestaylor], please review. 

Summary of changes:
- I changed the lock acquisition strategy to wait for the lock in updateIndexState(). I think
this is important, and here is my reasoning: when an index is being disabled, multiple region
servers may try to update the index state at the same time, and only one of them will be able
to acquire the lock. If we don't wait for the lock, the region servers that lose the race will
get an exception and end up aborting. This could cause cluster-wide region server aborts if,
for some reason, the code executed under the lock takes a long time.

- In MetadataEndPointImpl I am setting the upper bound of the scan *only* when the index state
is being switched from disable to inactive. I set the upper bound timestamp either when it is
not yet set or when the index disable timestamp is newer than the existing upper bound. As far
as protocol changes are concerned, I think we are fine with respect to backward compatibility,
since it is all protobuf-based.

- I am a little confused about how to handle the scanEndTime when we are checkpointing. For
now, I am always using the batch timestamp returned by

{code}
getTimestampForBatch(timeStamp,
                                        batchExecutedPerTableMap.get(dataPTable.getName()));
{code}

as the scan end time. If the batchTime is HConstants.LATEST_TIMESTAMP, then I am using the
timestamp returned by the updateIndexState call.
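
The two timestamp rules above can be sketched as follows. This is an illustrative sketch only, not code from the attached patch: the class name, method names, and the UNSET sentinel are hypothetical, and the value written as the new upper bound (here a caller-supplied proposedUpperBound, for example the server time of the state change) is an assumption, since the comment does not say which value is stored. The only outside fact relied on is that HConstants.LATEST_TIMESTAMP in HBase equals Long.MAX_VALUE.

```java
// Illustrative sketch; names are hypothetical and not taken from the Phoenix patch.
final class RebuildScanBounds {
    // Stand-in for HConstants.LATEST_TIMESTAMP, which is Long.MAX_VALUE in HBase.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    // Hypothetical sentinel meaning "no upper bound has been recorded yet".
    static final long UNSET = 0L;

    // Rule from the second bullet: when the index goes from disable to inactive,
    // replace the upper bound only if it is unset or older than the disable timestamp.
    // Which value gets written (proposedUpperBound) is an assumption of this sketch.
    static long newUpperBound(long existingUpperBound,
                              long indexDisableTimestamp,
                              long proposedUpperBound) {
        boolean notSet = existingUpperBound == UNSET;
        boolean stale = indexDisableTimestamp > existingUpperBound;
        return (notSet || stale) ? proposedUpperBound : existingUpperBound;
    }

    // Rule from the third bullet: use the batch timestamp as the scan end time,
    // falling back to the timestamp returned by the updateIndexState call when
    // the batch timestamp is LATEST_TIMESTAMP.
    static long scanEndTime(long batchTimestamp, long updateIndexStateTimestamp) {
        return batchTimestamp == LATEST_TIMESTAMP
                ? updateIndexStateTimestamp
                : batchTimestamp;
    }
}
```

With these definitions, an existing upper bound that is already newer than the disable timestamp is left untouched, and a batch that reports LATEST_TIMESTAMP never produces an unbounded scan.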




was (Author: samarthjain):
Work in progress patch. [~jamestaylor], please review. 

Summary of changes:
- I changed the lock acquisition strategy to wait for the lock in updateIndexState(). I think
this is important, and here is my reasoning: when an index is being disabled, multiple region
servers may try to update the index state at the same time, and only one of them will be able
to acquire the lock. If we don't wait for the lock, the region servers that lose the race will
get an exception and end up aborting. This could cause cluster-wide region server aborts if,
for some reason, the code executed under the lock takes a long time.

- In MetadataEndPointImpl I am setting the upper bound of the scan *only* when the index state
is being switched from disable to inactive. I set the upper bound timestamp either when it is
not yet set or when the index disable timestamp is newer than the existing upper bound. As far
as protocol changes are concerned, I think we are fine with respect to backward compatibility,
since it is all protobuf-based.

- I am a little confused about how to handle the scanEndTime when we are checkpointing. For
now, I am always using the batch timestamp returned by

{code}
getTimestampForBatch(timeStamp,
                                        batchExecutedPerTableMap.get(dataPTable.getName()));
{code}

as the scan end time, unless that timestamp is HConstants.LATEST_TIMESTAMP, in which case I am
using the timestamp returned by the updateIndexState call.



> Cap automatic index rebuilding to inactive timestamp.
> -----------------------------------------------------
>
>                 Key: PHOENIX-3525
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3525
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Ankit Singhal
>         Attachments: PHOENIX-3525_wip.patch
>
>
> From [~chrajeshbabu32@gmail.com]'s review comment on
> https://github.com/apache/phoenix/pull/210:
> For automatic rebuilding, DISABLED_TIMESTAMP is the lower bound, but there is no upper bound,
> so we are going to rebuild all the new writes written after DISABLED_TIMESTAMP even though the
> indexes were updated properly. So we can introduce an upper bound, the time at which we start
> the rebuild thread, so that we can limit the data to rebuild. If there are frequent writes,
> we can increment the rebuild period exponentially.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
