apex-dev mailing list archives

From Chandni Singh <chan...@datatorrent.com>
Subject Re: Fault-tolerant cache backed by a store
Date Sat, 28 Nov 2015 17:42:42 GMT
Another approach is to treat an entry in a bucket data file as a (key, value, time) record.

The time can be extracted from the tuple (or the windowId can be used).
With this approach purging can be simple: for each bucket data file we
check the last entry (since entries in the file are sorted by time) and
delete the file if that entry is expired.
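The purge check above can be sketched as follows. This is a minimal illustration, not Apex code: BucketEntry, BucketFilePurger, and their fields are hypothetical names, and the file is represented as an in-memory list of its entries.

```java
import java.util.List;

// Hypothetical entry layout: (key, value, time), as discussed above.
class BucketEntry {
    final byte[] key;
    final byte[] value;
    final long time;  // event time extracted from the tuple, or the windowId
    BucketEntry(byte[] key, byte[] value, long time) {
        this.key = key; this.value = value; this.time = time;
    }
}

class BucketFilePurger {
    // Entries in a bucket data file are sorted by time, so the last entry
    // carries the newest timestamp in the file. If even that entry is
    // older than the expiry point, the whole file can be deleted.
    static boolean isExpired(List<BucketEntry> fileEntries, long expiryPoint) {
        if (fileEntries.isEmpty()) {
            return true;
        }
        BucketEntry last = fileEntries.get(fileEntries.size() - 1);
        return last.time < expiryPoint;
    }
}
```

Because only the last entry is inspected, the purge decision is O(1) per file once the file's newest timestamp is known.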

Writing to a bucket data file can also be simple: we never update the
value of a key in place, but always append a new entry for the key when
its value changes.
Con: multiple entries per key.
If the tuples are not out of order, we may never have to rewrite a
bucket data file once it is complete.
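A sketch of the append-only write path, under the same assumptions (illustrative names, file modeled as an in-memory list). A changed value is appended as a new entry, and a lookup scans backwards so the latest entry for a key wins:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical append-only bucket: updates never rewrite an entry.
class AppendOnlyBucket {
    static class Entry {
        final String key; final String value; final long time;
        Entry(String key, String value, long time) {
            this.key = key; this.value = value; this.time = time;
        }
    }

    final List<Entry> entries = new ArrayList<>();

    // Always append, even if the key already has an entry.
    void put(String key, String value, long time) {
        entries.add(new Entry(key, value, time));
    }

    // Scan from the end: the newest entry for the key is the live value.
    // This tolerates duplicate entries per key at the cost of extra space.
    String get(String key) {
        for (int i = entries.size() - 1; i >= 0; i--) {
            if (entries.get(i).key.equals(key)) {
                return entries.get(i).value;
            }
        }
        return null;
    }
}
```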

Reading is a problem here. The whole bucket needs to be deserialized to
find a key, since the data is no longer sorted by key on disk. If the
query for a key specifies a time range, that read can be optimized.
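The read-side trade-off can be sketched like this (again with illustrative names and an in-memory list standing in for the deserialized file). A point lookup must scan every entry because the file is ordered by time, not by key; a time-range lookup can skip entries before the range and stop at the first entry past it:

```java
import java.util.List;

// Hypothetical reads against an append-ordered (time-sorted) bucket file.
class BucketReader {
    static class Entry {
        final String key; final String value; final long time;
        Entry(String k, String v, long t) { key = k; value = v; time = t; }
    }

    // Point lookup: O(n) scan of the whole file; latest entry wins.
    static String find(List<Entry> entries, String key) {
        String latest = null;
        for (Entry e : entries) {
            if (e.key.equals(key)) {
                latest = e.value;
            }
        }
        return latest;
    }

    // Time-range lookup: entries are time-ordered, so the scan can stop
    // as soon as it passes the end of the range.
    static String find(List<Entry> entries, String key, long start, long end) {
        String latest = null;
        for (Entry e : entries) {
            if (e.time > end) {
                break;
            }
            if (e.time >= start && e.key.equals(key)) {
                latest = e.value;
            }
        }
        return latest;
    }
}
```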

With Tim's approach, purging can be triggered asynchronously at regular
intervals and may even delete a data file that hasn't been updated for
some time and whose latest entry has expired.
Even though the writes may not be that complicated with this approach,
updating values whose length changes (for example, in a join operation a
value is a growing list) may result in many small stray files.

