www-apachecon-discuss mailing list archives

From Daniel Gruno <humbed...@apache.org>
Subject Re: Some feedback on the new review mechanism
Date Thu, 29 Sep 2016 07:55:54 GMT
I'll see if I can answer some of the critique here.
And I'll also note that critique - whether it be positive or negative -
is most welcome.

On 09/29/2016 09:13 AM, Jan Willem Janssen wrote:
> Hi,
> 
>> On 29 Sep 2016, at 08:48, Christofer Dutz <christofer.dutz@c-ware.de> wrote:
>> I just wanted to also take the opportunity to give some feedback on the
>> modified review process:
>>
>>
>> 1. Seeing that I would have to make 30000 decisions sort of turned me off
>> right away (it's sort of ... "yeah let me help" and then getting a huge
>> pile of work dumped on my desk)
>>
>> 2. With that huge amount of possible work I could see only little progress
>> for quite some time put into it ... 30000 decisions would require reading
>> 60000 applications. If I assume 30 seconds per application, that's about
>> 500 hours, which is about 20 days without doing anything else. I sort of
>> quit at about 400 decisions.
>>
>> 3. I noticed for myself that at first you read the applications carefully,
>> but that accuracy goes down very fast as soon as you start getting a lot of
>> the talks you reviewed earlier ... unfortunately even if you only think you
>> have read them before. I caught myself not reading some similar-looking
>> applications and voting for one thinking it was the other. I don't know if
>> this is desirable.
>>
>> I liked the simple interface, however. So how about dropping the Deathmatch
>> approach and just displaying one application, and letting the user select
>> how much he likes it (ok ... this is just the way the old version worked,
>> but as I said, I liked the UI ... just clicking once)? Perhaps the user
>> could also add tags to the application and suggest tracks.

The comparison review was chosen over the one-by-one review after a
lengthy discussion.

No one is or was expected to do all 30k combinations; that would take ages.
The system was designed to randomize the combinations, so that, given
enough time and reviewers, all combinations would on the whole be covered.
We had 32,000 scores submitted (between 50 and 70 per talk), which is more
detailed than what we would have gotten if we had set out to review each
talk on its own (then we would have gotten some 10-15 absolute scores
instead of the 40-70 relative ones), and it gives us some statistical
advantages.
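
To give a rough idea of what I mean by randomizing the combinations, here
is a small Python sketch. It is only an illustration, not the code the
review tool actually runs; the talk IDs, batch size and seed are made up:

import random
from itertools import combinations

talks = [f"talk-{i:03d}" for i in range(1, 11)]   # hypothetical talk IDs
all_pairs = list(combinations(talks, 2))          # every possible match-up

def review_batch(reviewer_seed, batch_size=5):
    # Each reviewer session gets a different random subset of pairs; across
    # enough reviewers and sessions, every combination eventually gets seen.
    rng = random.Random(reviewer_seed)
    return rng.sample(all_pairs, batch_size)

print(review_batch("reviewer-42"))

No single reviewer has to cover the full set; the coverage comes from the
randomization across everyone who takes part.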

With the old style review, the majority of talks were sadly all given
pretty much identical scores, which made it very difficult to sort them.
We essentially had no way of knowing "okay, but if you had to choose
between these two identically scoring talks, which would you attend?".
By doing a 'death match', as you say, we can better work out the
likelihood that talks are preferred over other talks, and sort not only
by their total average score, but also by the likelihood that talk A will
be rated better than talk B - even if that combination hasn't been
reviewed by you. This gives us additional data to sort and filter by. It
also helps refine scores. With absolute scoring, you have to give 1, 2, 3
or 4 points for instance, but how you decide whether a talk gets 2 or 3
points is a lot more random and improvised than people think, and it
rarely reflects a real quality difference between the talk that got 3
points and the talk that got 2 points.
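
To make the "likelihood that talk A will be rated better than talk B" part
a bit more concrete: one common way to get there from pairwise results is a
Bradley-Terry-style fit. The sketch below only illustrates that idea; it is
not the model or the code our review system actually uses, and the talk IDs
and match results are made up:

from collections import defaultdict

# (winner, loser) pairs as collected from reviewers
matches = [
    ("talk-A", "talk-B"),
    ("talk-A", "talk-C"),
    ("talk-B", "talk-C"),
    ("talk-C", "talk-A"),
]

talks = sorted({t for pair in matches for t in pair})
wins = defaultdict(int)    # total wins per talk
bouts = defaultdict(int)   # number of comparisons per unordered pair

for winner, loser in matches:
    wins[winner] += 1
    bouts[frozenset((winner, loser))] += 1

# Iterative (minorize-maximize) fit of Bradley-Terry strengths, where
# P(a preferred over b) = strength[a] / (strength[a] + strength[b]).
strength = {t: 1.0 for t in talks}
for _ in range(100):
    updated = {}
    for t in talks:
        denom = sum(
            bouts[frozenset((t, other))] / (strength[t] + strength[other])
            for other in talks if other != t
        )
        updated[t] = wins[t] / denom if denom else strength[t]
    total = sum(updated.values())
    strength = {t: s / total for t, s in updated.items()}

def preference(a, b):
    # Estimated probability that talk a would beat talk b, even if that
    # exact pair was never shown to any reviewer.
    return strength[a] / (strength[a] + strength[b])

for talk in sorted(talks, key=strength.get, reverse=True):
    print(talk, round(strength[talk], 3))
print("P(talk-B over talk-C):", round(preference("talk-B", "talk-C"), 3))

The useful property is the last bit: the fitted strengths give a preference
estimate for any two talks, including pairs that no single reviewer ever
saw side by side.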

In the end, what we are looking for isn't talks that score some arbitrary
number of points, but talks that will be attended. By doing a
comparison-style review, it is our firm belief that we get closer to that
than by doing a one-by-one review.

I'll grant you and the others that we definitely need more reviewers in
general, and especially if we are to get a truer sense of which talks are
most likely to be attended. But that's more a problem of us not getting
the word out efficiently.

As for the 'daily batch' you had to go through, I'll admit that 720, or
whatever the number was, is a tad much. We can definitely lower that
number to make it more digestible.

The system is by no means perfect, and we're working on improving it as
experiences/responses reach us.

We'll be having some informal discussions at ApacheCon in Seville to
work on the review process, and I hope people will attend :)

With regards,
Daniel.

> 
> I share this as well: given the large number of proposals, the decision
> making for all permutations is simply too much. Also, due to the relatively
> small number of reviewers and the “real” randomisation, I think there’s a
> large bias in the final decisions: I’ve come across the same “match” several
> times, which implies that the one talk that lost the battle has a negative
> bias.
> 
> I’ve tried to do as many “battles” as possible, but only got up to about
> 1000 before I was fed up and could no longer spend time on it due to other
> obligations. I’m not sure if I’ve seen all proposals (probably not), which
> is a pity, in my opinion...
> 
> --
> Met vriendelijke groeten | Kind regards
> 
> Jan Willem Janssen | Software Architect
> +31 631 765 814
> 
> 
> My world is something with Amdatu and Apache
> 
> Luminis Technologies
> Churchillplein 1
> 7314 BZ  Apeldoorn
> +31 88 586 46 00
> 
> https://www.luminis.eu
> 
> KvK (CoC) 09 16 28 93
> BTW (VAT) NL8170.94.441.B.01
> 

