Yes, I think that footnote could be a lot more prominent, or pulled up
right under the table.
I also think it would be fine to present the {0,1} formulation. It's
actually more recognizable, I think, for logloss in that form. It's
probably less recognizable for hinge loss, but consistency is more
important. There's just an extra (2y − 1) term, at worst.
The loss here is per instance, and implicitly summed over all
instances. I think that is probably not confusing for the reader; if
they're reading this at all to double-check just what formulation is
being used, I think they'd know that. But it's worth a note.
The loss is summed in the case of logloss, not multiplied (if that's
what you're saying).
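For what it's worth, here's a quick sketch of both points (plain Python, made-up function names, not Spark's actual code): the {0,1} cross-entropy form and the {−1,+1} form give the same per-instance value via the (2y − 1) substitution, and the dataset loss is just their sum:

```python
import math

def logloss_pm1(y, margin):
    # logloss with labels y in {-1, +1}; margin = w . x
    return math.log(1.0 + math.exp(-y * margin))

def logloss_01(y01, margin):
    # same quantity written for labels in {0, 1}: the familiar
    # cross-entropy -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(margin)
    p = 1.0 / (1.0 + math.exp(-margin))
    return -(y01 * math.log(p) + (1 - y01) * math.log(1.0 - p))

margins = [-1.5, 0.0, 2.3]
labels_pm1 = [1, -1, 1]
labels_01 = [(y + 1) // 2 for y in labels_pm1]  # 2*y01 - 1 recovers {-1, +1}

# per-instance values agree under the (2y - 1) relabeling...
for y, y01, m in zip(labels_pm1, labels_01, margins):
    assert abs(logloss_pm1(y, m) - logloss_01(y01, m)) < 1e-9

# ...and the total loss is summed over instances, not multiplied
total = sum(logloss_pm1(y, m) for y, m in zip(labels_pm1, margins))
```

(Just an illustration of the algebra; the docs would of course state this in math, not code.)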
Those are decent improvements, feel free to open a pull request / JIRA.
On Mon, Sep 26, 2016 at 6:22 AM, Tobi Bosede <ani.tobib@gmail.com> wrote:
> The loss function here for logistic regression is confusing. It seems to
> imply that spark uses only -1 and 1 class labels. However it uses 0 and 1,
> as the very inconspicuous note quoted below (under Classification) says. We
> need to make this point more visible to avoid confusion.
>
> Better yet, we should replace the loss function listed with the one for
> 0, 1 labels, no matter how mathematically inconvenient, since that is what
> is actually implemented in Spark.
>
> More problematic, the loss function (even in this "convenient" form) is
> actually incorrect. This is because it is missing either a summation (sigma)
> in the log or product (pi) outside the log, as the loss for logistic is the
> log likelihood. So there are multiple problems with the documentation.
> Please advise on steps to fix for all version documentation or if there are
> already some in place.
>
> "Note that, in the mathematical formulation in this guide, a binary label
> y is denoted as either +1 (positive) or −1 (negative), which is convenient
> for the formulation. However, the negative label is represented by 0 in
> spark.mllib instead of −1, to be consistent with multiclass labeling."

To unsubscribe email: user-unsubscribe@spark.apache.org
