flink-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [flink] curcur edited a comment on pull request #16606: [FLINK-21357][runtime/statebackend]Periodic materialization for generalized incremental checkpoints
Date Tue, 14 Sep 2021 06:07:18 GMT

curcur edited a comment on pull request #16606:
URL: https://github.com/apache/flink/pull/16606#issuecomment-917898103


   Roman and I had several long discussions about the interface between Materialization and [`ChangelogKeyedStateBackend`](https://github.com/apache/flink/commit/3421b81c2502f61112bd131a7336c16e3dd30925#diff-e071e8a89527c24be4ee5ee342ad7d47c870170ef915d1407d18e998f7847f16L108).
I'm documenting them here for future reference.
   
   The main difference between the versions is who is responsible for **keeping and updating** the state related to `ChangelogKeyedStateBackend`,
denoted as [`ChangelogSnapshotState`](https://github.com/apache/flink/commit/3421b81c2502f61112bd131a7336c16e3dd30925#diff-79beab2a7108881b64ac4b482a6446e06623efa7e19ac4b0018c7cf20c35e88aR39),
which consists of three parts:
   
    - materialized snapshot from the underlying delegated state backend
    - non-materialized part in the current changelog
    - non-materialized changelog, from previous logs (before failover or rescaling) 
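
The three parts above can be pictured as one immutable holder. This is only a minimal sketch, not the code from the PR: `SnapshotStateSketch` and its method names are hypothetical, state handles are replaced by plain strings, and representing the non-materialized part of the current changelog as a single offset (`materializedTo`) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for ChangelogSnapshotState; handles are plain strings here.
final class SnapshotStateSketch {
    // Snapshot handles from the delegated state backend's last materialization.
    private final List<String> materialized;
    // Non-materialized changelog handles from previous logs (before failover or rescaling).
    private final List<String> restoredChangelog;
    // Assumed representation: changes in the current changelog from this offset on
    // are not yet materialized.
    private final long materializedTo;

    SnapshotStateSketch(List<String> materialized, List<String> restoredChangelog, long materializedTo) {
        this.materialized = Collections.unmodifiableList(new ArrayList<>(materialized));
        this.restoredChangelog = Collections.unmodifiableList(new ArrayList<>(restoredChangelog));
        this.materializedTo = materializedTo;
    }

    List<String> getMaterialized() { return materialized; }
    List<String> getRestoredChangelog() { return restoredChangelog; }
    long getMaterializedTo() { return materializedTo; }

    // Assumption: a finished materialization covers everything in the previous logs,
    // so the new state keeps only the new snapshot and the new offset.
    SnapshotStateSketch afterMaterialization(List<String> newSnapshot, long upTo) {
        return new SnapshotStateSketch(newSnapshot, Collections.<String>emptyList(), upTo);
    }
}
```

Whichever component holds the reference to this object, and swaps it for a new one after each materialization, is the "owner" that the three versions below disagree on.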
   
   We've discussed and tried out three versions:
   
   1. `Materialization` coupled with `ChangelogKeyedStateBackend`, 
   implemented in commit **fbd1e2d38ae6353506ceac8eb074bd24bdb29b62**,
   	where `PeriodicMaterializer` is an inner class of `ChangelogKeyedStateBackend`.
   	- Pros: state is shared and easy to reason about.
   	- Cons: too tightly coupled; neither the keyed state backend nor the materializer is flexible or extensible.
   
   	This approach was discarded early in the discussion, so I won't go into further detail.
   	
   2. `ChangelogSnapshotState` is kept in the materializer. Conceptually, the materializer is
the component that connects the delegated state backend to the changelog, and it does so through `ChangelogSnapshotState`,
as described above.
   Implemented in commit **3421b81c2502f61112bd131a7336c16e3dd30925**.
   
       - Pros: 
         1. Good isolation and extensibility. The changelog keyed state backend can be viewed clearly as
four parts: the log writer, the delegated state backend, the materializer, and the wrapping `ChangelogKeyedStateBackend`
that does the double writing.
         2. More natural to understand and implement.
   	    - State is updated by the materializer and is accessible from `ChangelogKeyedStateBackend`.
   	    - The materializer is part of `ChangelogKeyedStateBackend`.
   
       - Cons: 
   	  1. According to Roman, `ChangelogKeyedStateBackend` has implicit state (such as state double
writes) besides the three parts mentioned above.
   	  2. Optimizations (such as batched writes) would require updating the materializer as well.
   
   3. `ChangelogSnapshotState` and its updates are kept in `ChangelogKeyedStateBackend`. Materialization
works as a stateless `Materialization Manager` that provides utility functions.
   Implemented in commit **75dec43024d91b896d488a4c9e979d486228398a**.
       - Pros:
   	  1. All state is wrapped in `ChangelogKeyedStateBackend`.
   	  2. Conceptually, this also works naturally.
   	  
       - Cons:
   	 Circular construction. `Materialization Manager` needs access to `ChangelogKeyedStateBackend`
to update `ChangelogSnapshotState`, while
   	 `ChangelogKeyedStateBackend` is itself created from `StateBackend#createKeyedStateBackend`.
   	  
   	  To avoid the circular construction, `Materialization Manager` has to be exposed at the time
`ChangelogKeyedStateBackend` is created.
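
One way to picture how version 3 sidesteps the circular construction is to create the manager first and let the backend register itself as the update target when it is built. The sketch below is an illustration under assumptions, not the code in commit 75dec43: `ManagerSketch`, `BackendSketch`, `MaterializationTarget`, and all method names are hypothetical, and the snapshot state is reduced to a string plus an offset.

```java
// Hypothetical callback through which the manager hands results back to the state owner.
interface MaterializationTarget {
    void onMaterialized(String snapshot, long upTo);
}

// Hypothetical stand-in for the Materialization Manager: it keeps no snapshot state,
// only a reference to whoever owns that state.
final class ManagerSketch {
    private MaterializationTarget target; // set once the backend exists

    void register(MaterializationTarget target) {
        this.target = target;
    }

    // Hands a finished materialization to the owner; fails fast if no owner is registered yet.
    void materialize(String snapshot, long upTo) {
        if (target == null) {
            throw new IllegalStateException("no backend registered yet");
        }
        target.onMaterialized(snapshot, upTo);
    }
}

// Hypothetical stand-in for ChangelogKeyedStateBackend, which owns the snapshot state.
final class BackendSketch implements MaterializationTarget {
    private String materialized = "none";
    private long materializedTo = 0L;

    private BackendSketch() {}

    // Plays the role of the factory method: the manager must already exist here,
    // which is why it has to be exposed at backend-creation time.
    static BackendSketch create(ManagerSketch manager) {
        BackendSketch backend = new BackendSketch();
        manager.register(backend);
        return backend;
    }

    @Override
    public void onMaterialized(String snapshot, long upTo) {
        this.materialized = snapshot;
        this.materializedTo = upTo;
    }

    String getMaterialized() { return materialized; }
    long getMaterializedTo() { return materializedTo; }
}
```

With this shape, neither constructor needs the other object to be fully built: the manager exists without a backend, and the backend only wires itself in after it has been constructed. The cost is the one noted above, namely that the manager leaks into the signature of the creation path.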
   
   @rkhachatryan what do you think, Roman?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


