aurora-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Maxim Khutornenko <ma...@apache.org>
Subject Re: [PROPOSAL] New RPC `fetchJobUpdates`
Date Fri, 02 Sep 2016 22:36:52 GMT
The pulseJobUpdate RPC was added as an entirely new feature. This
proposal suggests a read-only "convenience" method to pull data that
is already available via existing means.

As for "taking down the scheduler", this argument is moot. It's
possible to DDOS scheduler via existing RPCs today and the suggested
alternative has the same inherent risk of crafting an unscoped query
that will pull all updates from the store. Also, unless there are a
LOT of those queries (read dos-attack that OOMs scheduler) it's
unlikely that the scheduler will sustain any damage as read-only
queries don't acquire any global or table locks.

On Fri, Sep 2, 2016 at 3:24 PM, Zameer Manji <zmanji@apache.org> wrote:
> I'm not convinced by your argument about adding a read only RPC that's not
> covered by the traditional integration test is going to cause bit rot. We
> already have an RPC that is not covered by the traditional integration test
> `pulseJobUpdate` and it hasn't bit rotted AFAIK.
>
> Further some members of the community have been writing alternative clients
> around our API and I think adding this RPC will better support their use
> cases. `JobUpdate` is a first class struct in our API and I think it makes
> sense to expose a query interface for it.
>
> If integration tests are your primary concern, I can also add an
> integration test for this.
>
> Regarding adding `JobUpdateQuery` to `getJobUpdateDetails`, I'm no longer
> convinced that it's the best way to go because of the risk it contains.
> It's all to easy to craft a query that can pull a lot of data that can take
> down the scheduler. In my case, I sometimes see it being useful for getting
> all of the `JobUpdates` of a role (completed and active), but that risks
> pulling a lot of unnecessary data.
>
> I will also add that since `JobUpdate` is a first class struct in our
> storage and APIs, this RPC contains very minimal code. I think it's highly
> unlikely that it will bit rot.
>
> On Fri, Sep 2, 2016 at 2:52 PM, Maxim Khutornenko <maxim@apache.org> wrote:
>
>> As I mentioned in Slack, I am ok with the new RPC as long as there is
>> a use for it elsewhere in the client or UI. Adding a read-only RPC
>> that isn't going to be called by our traditional integration test
>> clients sets a fertile ground for bit rot.
>>
>> I am actually warming up to your original proposal of adding
>> JobUpdateQuery into the existing getJobUpdateDetails RPC. While it may
>> be more expensive to pull multiple updates, we don't necessarily risk
>> much after we migrated to MVStore on the H2 side. There are no table
>> locks acquired and the only downside would be pulling events along
>> with what you need. Provided the query is narrowly scoped, that should
>> deliver acceptable performance.
>>
>> On Thu, Sep 1, 2016 at 2:24 PM, Zameer Manji <zmanji@apache.org> wrote:
>> > Hey,
>> >
>> > I've noticed a hole in our current API which makes it difficult to write
>> > external clients and other tooling around fetching the state of updates.
>> >
>> > Currently, to fetch updates we are given two RPCs:
>> > ````
>> > /** Gets job update summaries. */
>> > Response getJobUpdateSummaries(1: JobUpdateQuery jobUpdateQuery)
>> >
>> > /** Gets job update details. */
>> > Response getJobUpdateDetails(1: JobUpdateKey key)
>> >
>> > ````
>> >
>> > The `getJobUpdateSummaries` RPC is not scoped to a single update and
>> > returns a
>> > set of `JobUpdateSummary` structs. The struct is defined:
>> > ````
>> > /** Summary of the job update including job key, user and current state.
>> */
>> > struct JobUpdateSummary {
>> >   /** Unique identifier for the update. */
>> >   5: JobUpdateKey key
>> >
>> >   /** User initiated an update. */
>> >   3: string user
>> >
>> >   /** Current job update state. */
>> >   4: JobUpdateState state
>> > }
>> > ````
>> >
>> > The `getJobUpdateDetails` RPC is scoped to a single update and returns
>> the
>> > following struct:
>> >
>> > ````
>> > struct JobUpdateDetails {
>> >   /** Update definition. */
>> >   1: JobUpdate update
>> >
>> >   /** History for this update. */
>> >   2: list<JobUpdateEvent> updateEvents
>> >
>> >   /** History for the individual instances updated. */
>> >   3: list<JobInstanceUpdateEvent> instanceEvents
>> > }
>> >
>> > ````
>> >
>> > Maxim mentioned to me yesterday that this RPC is scoped to a single
>> update
>> > because assembling the `instanceEvents` can be extremely expensive. A
>> query
>> > that
>> > could span more than a single update risks taking down the scheduler in a
>> > large
>> > cluster.
>> >
>> >
>> > The problem I discovered is that there is no batch API to get the
>> > inexpensive
>> > information inside the `JobUpdate` struct. For reference this struct
>> > contains:
>> >
>> > ````
>> > /** Full definition of the job update. */
>> > struct JobUpdate {
>> >   /** Update summary. */
>> >   1: JobUpdateSummary summary
>> >
>> >   /** Update configuration. */
>> >   2: JobUpdateInstructions instructions
>> > }
>> > ````
>> >
>> > Consumers are forced to make several `getJobUpdateDetails` calls to get
>> > multiple
>> > `JobUpdate` structs. Since the `JobUpdate` struct is not expensive to
>> > assemble,
>> > I'm proposing a new RPC that will allow consumers to get several
>> `JobUpdate`
>> > structs in a single call.
>> >
>> > ````
>> > /** Gets job updates. */
>> > Response getJobUpdates(1: JobUpdateQuery jobUpdateQuery)
>> > ````
>> >
>> > If there are no objections, I will file tickets and put up a patch to
>> > implement
>> > this.
>> >
>> > --
>> > Zameer Manji
>>

Mime
View raw message