aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kevin Sweeney" <kevi...@apache.org>
Subject Re: Review Request 26478: Add a flag to deduplicate storage snapshots
Date Wed, 15 Oct 2014 01:32:40 GMT


> On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java, line
86
> > <https://reviews.apache.org/r/26478/diff/2/?file=716376#file716376line86>
> >
> >     Method interceptors should work fine for package-priviate methods [1].  Just
for my understanding - are you going with protected because package-private class + protected
method is slightly more restrictive?
> >     
> >     [1] https://github.com/google/guice/wiki/AOP#limitations

I'm going with protected because it's more indicative of why it's visible at all (I want guice's
dynamic proxy subclass to override it and add timing information). It's also more restrictive
than package-private in this context.


> On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
line 61
> > <https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line61>
> >
> >     Tasks.SCHEDULED_TO_INFO

This operates on the mutable structures, SCHEDULED_TO_INFO operates on the immutable ones.


> On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
line 71
> > <https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line71>
> >
> >     Do you have numbers on how much time this routine saves when compared to a full
deep copy and unsetting the field you're trying to clear?  Unless it's a significant contributor
to overall snapshot performance, i'd prefer the more concise code of the latter approach.
> >     
> >     My hunch is that this one might save O(100 ms), but the ones below are noise
and not worth the code.

I don't have data for this specific optimization - my gut is that we should avoid deepCopy
on Snapshots due to them respresenting essentially the entire scheduler heap. Happy to remove
if you think it's not warranted.


> On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
> > src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java,
line 84
> > <https://reviews.apache.org/r/26478/diff/2/?file=716386#file716386line84>
> >
> >     The assertion error will be much more useful if you do `assertEquals(emptySet,
b)` rather than `assertEquals(n, b.size())`.

It's actually `null` here, but `.size()` in thrift reports `0` for `null`. Changed to explicit
`assertNull`.


> On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
> > src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java,
line 89
> > <https://reviews.apache.org/r/26478/diff/2/?file=716386#file716386line89>
> >
> >     Isn't this check redundant to the one below?

not quite, since the latter is making a Set out of a List, and therefore potentially doing
deduplication for me.


> On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
> > src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java,
line 98
> > <https://reviews.apache.org/r/26478/diff/2/?file=716386#file716386line98>
> >
> >     Can you try for `assertEquals` here as well, with an expected list of `DeduplicatedScheduledTask`
objects?  It will catch classes of bugs missed with this check (e.g. extra entries in the
`partialTasks` list), and make it easier to diagnose a failed test.

Since the input is a Snapshot (Set<ScheduledTask>) it's not possible to know the expected
order of the output list ahead-of-time. Added an explicit check against extra entries.


> On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java, line
80
> > <https://reviews.apache.org/r/26478/diff/2/?file=716379#file716379line80>
> >
> >     First sentence contains mixed thoughts for an odd sentence, rewrite?

Rewrote.


> On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java, line
81
> > <https://reviews.apache.org/r/26478/diff/2/?file=716379#file716379line81>
> >
> >     Please make this more verbose to explain what exactly is backwards in compatible.
 Don't worry about being too wordy.
> >     
> >     I'd also like to see a TODO to remove this flag, since it should become the
default.  Perhaps this should be included in the 0.7.0 deprecations list.

Moved to separate doc.


> On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
line 110
> > <https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line110>
> >
> >     Inline this on :114.

done


> On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
line 116
> > <https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line116>
> >
> >     It would be nice to see a brief comment here to give the reader an overview
of what's going on in the routine.

Added.


> On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
line 157
> > <https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line157>
> >
> >     remove newline

removed


> On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
> > src/main/thrift/org/apache/aurora/gen/storage.thrift, line 202
> > <https://reviews.apache.org/r/26478/diff/2/?file=716383#file716383line202>
> >
> >     The mention of taskConfigId is vague.  There should be a doc somewhere indicating
that `taskConfigId` is referencing the _index_ of an entry in `taskConfigs`.

Made more explicit


> On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
> > src/main/thrift/org/apache/aurora/gen/storage.thrift, line 215
> > <https://reviews.apache.org/r/26478/diff/2/?file=716383#file716383line215>
> >
> >     remove newline

Fixed.


> On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
> > src/main/thrift/org/apache/aurora/gen/storage.thrift, line 237
> > <https://reviews.apache.org/r/26478/diff/2/?file=716383#file716383line237>
> >
> >     Please doc.

Done


- Kevin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/#review55986
-----------------------------------------------------------


On Oct. 14, 2014, 6:32 p.m., Kevin Sweeney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26478/
> -----------------------------------------------------------
> 
> (Updated Oct. 14, 2014, 6:32 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.
> 
> 
> Bugs: AURORA-722
>     https://issues.apache.org/jira/browse/AURORA-722
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication
ratio on Twitter's production snapshots.
> 
> This format is backwards-incompatible, so this patch introduces a flag to control its
use (defaulting off).
> 
> This only changes the format used to write to the replicated log (where time is of the
essence since all writes are done holding the global storage lock) - the format of backups
written to disk is unchanged, as backups don't hold the lock.
> 
> 
> Diffs
> -----
> 
>   config/legacy_untested_classes.txt 3af99867eb25a7e44bb3520e82b1def125bd6e15 
>   docs/scheduler-storage.md PRE-CREATION 
>   src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 65e986eaa2c4193431ca048425a1ed3ab60f5882

>   src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 7239a6a5eb5479e395e16423c83fdf80a77e5a83

>   src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 4b50e2069407dc263b4fc93f1827d3a8836253bf

>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java f806297d1d0700155c976743f936b2b8a3a390fb

>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 769348e6b8a5c701734afff391b1c77de35222c6

>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java PRE-CREATION

>   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 22db80eaf34fe736fa5a3a9289836c9ac9e59906

>   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a

>   src/main/thrift/org/apache/aurora/gen/storage.thrift 5350ec945fbe028ee4641683815a068ce00b5efc

>   src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 39729b374fe4e383f9b5ada7d016923766df9af7

>   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 7a8c3b882633376a1bf6a78616d55aaa7401d13f

>   src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/26478/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Kevin Sweeney
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message