hbase-issues mailing list archives

From "Andrew Kyle Purtell (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-25913) Introduce EnvironmentEdge.currentTimeAdvancing
Date Tue, 25 May 2021 01:31:00 GMT

     [ https://issues.apache.org/jira/browse/HBASE-25913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Kyle Purtell updated HBASE-25913:
----------------------------------------
    Description: 
Introduce a new {{EnvironmentEdge#currentTimeAdvancing}} method which ensures that the time it
returns falls in a different clock tick from the last time the {{EnvironmentEdge}} was used to
get the current time.
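
A minimal sketch of the idea, not the actual HBase implementation: the class name {{AdvancingClockSketch}}, the {{lastTime}} field, and the choice of a synchronized spin loop over {{System.currentTimeMillis()}} are all assumptions made for illustration.

```java
// Hypothetical sketch of an "advancing" clock; names and locking strategy are assumptions.
final class AdvancingClockSketch {
  // Latest value handed out by either method; guarded by "this".
  private long lastTime = Long.MIN_VALUE;

  /** Plain clock read. May return the same millisecond on consecutive calls. */
  synchronized long currentTime() {
    long now = System.currentTimeMillis();
    lastTime = Math.max(lastTime, now);
    return now;
  }

  /** Spins until the clock has moved past the last value handed out, then returns it. */
  synchronized long currentTimeAdvancing() {
    long now = System.currentTimeMillis();
    while (now <= lastTime) {
      Thread.yield(); // wait for the next clock tick
      now = System.currentTimeMillis();
    }
    lastTime = now;
    return now;
  }
}
```

With a millisecond clock the spin lasts at most about one tick per call, which is why it is reserved for the places where a distinct timestamp matters.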

Use {{EnvironmentEdge#currentTimeAdvancing}} wherever we substitute a {{Long.MAX_VALUE}}
timestamp placeholder with a real timestamp just before committing the mutation. When processing
a batch of mutations (doMiniBatchMutation etc.) we will call {{currentTimeAdvancing}} only
once. This means the client cannot bundle cells with wildcard timestamps into a batch where
those cells must be committed with different timestamps. Clients must simply not submit mutations
that must be committed with guaranteed distinct timestamps in the same batch. This is easy to
understand, easy to document, and it aligns with our design philosophy that the client knows best.
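
To illustrate the batch behavior described above, here is a rough sketch that models a batch as an array of cell timestamps. The class {{BatchTimestampSketch}}, the constant name, and the array model are hypothetical; the real code path operates on mutations and cells, not on a {{long[]}}.

```java
// Hypothetical sketch: one timestamp is obtained per batch and shared by all wildcard cells.
final class BatchTimestampSketch {
  // Placeholder meaning "let the server pick the timestamp" (Long.MAX_VALUE per the description).
  static final long PLACEHOLDER = Long.MAX_VALUE;

  /**
   * Replaces every placeholder in the batch with the single value "now".
   * Cells that already carry an explicit timestamp are left untouched.
   * Returns the number of cells that were rewritten.
   */
  static int substitutePlaceholders(long[] cellTimestamps, long now) {
    int replaced = 0;
    for (int i = 0; i < cellTimestamps.length; i++) {
      if (cellTimestamps[i] == PLACEHOLDER) {
        cellTimestamps[i] = now;
        replaced++;
      }
    }
    return replaced;
  }
}
```

Because {{now}} is obtained once per batch, two wildcard cells in the same batch end up with the same timestamp, which is exactly the constraint clients need to be aware of.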

It is not required to handle batches as proposed. We could guarantee a distinct timestamp
for every mutation in a batch. Count the number of mutations, call this M. Get the current
time. Then, wait for at least M milliseconds. Then, set the first mutation's timestamp to the
time obtained before the wait and increment by 1 for each remaining mutation. I don't think this
is necessary. See the reasoning in the paragraph above. Mentioned here for the sake of discussion.
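
For discussion only, the alternative could look roughly like the sketch below. It assumes "this value" means the time read before the wait; the class and method names are hypothetical, and {{Thread.sleep}} stands in for whatever wait mechanism would actually be used.

```java
// Hypothetical sketch of the per-mutation alternative; not proposed for implementation.
final class DistinctBatchTimestampsSketch {
  /** Returns m consecutive timestamps, all of which lie in the past once the call returns. */
  static long[] assignDistinct(int m) {
    long start = System.currentTimeMillis();
    long deadline = start + m;
    long now = start;
    // Wait at least m milliseconds so the range [start, start + m) is fully consumed.
    while (now < deadline) {
      try {
        Thread.sleep(deadline - now);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while reserving timestamps", e);
      }
      now = System.currentTimeMillis();
    }
    long[] timestamps = new long[m];
    for (int i = 0; i < m; i++) {
      timestamps[i] = start + i; // first mutation gets start, then increment by 1
    }
    return timestamps;
  }
}
```

The cost is visible in the loop: a batch of M mutations stalls for at least M milliseconds, which is the overhead the single-call approach avoids.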

It will be fine to continue to use {{EnvironmentEdge#currentTime}} everywhere else. In this
way we will only potentially spin wait where it matters, and won't suffer serious overheads
during batch processing.

  was:
Introduce new {{EnvironmentEdge#currentTimeAdvancing}} which ensures that when the current
time is returned, it is the current time in a different clock tick from the last time the {{EnvironmentEdge}}
was used to get the current time.

Use {{EnvironmentEdge#currentTimeAdvancing}} wherever we go to substitute a {{Long.MAX_VALUE}}
timestamp placeholder with a real placeholder just before committing the mutation. When processing
a batch of mutations (doMiniBatchMutation etc) we will call {{currentTimeAdvancing}} only
once. This means the client cannot bundle cells with wildcard timestamps into a batch where
those cells must be committed with different timestamps. Clients must simply not submit mutations
that must be committed with guaranteed distinct timestamps in the same batch. Easy to understand,
easy to document, and it aligns with our design philosophy of the client knows best. 

It is not required to handle batches as proposed; we could guarantee a distinct timestamp
for every mutation in the batch. Count the number of mutations, call this M. Get the current
time. Set the first mutation timestamp with this value and increment by 1 for all remaining.
Then, wait for at least M milliseconds. I don't think this is necessary. See reasoning in
above paragraph. Mentioned here for sake of discussion.

It will be fine to continue to use {{EnvironmentEdge#currentTime}} everywhere else. In this
way we will only potentially spin wait where it matters, and won't suffer serious overheads
during batch processing.


> Introduce EnvironmentEdge.currentTimeAdvancing
> ----------------------------------------------
>
>                 Key: HBASE-25913
>                 URL: https://issues.apache.org/jira/browse/HBASE-25913
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.5.0
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
