spark-dev mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Documentation confusing or incorrect for decision trees?
Date Thu, 07 Aug 2014 06:19:10 GMT
It's definitely just a typo. The ordered categories are A, C, B, so the
second split can't be A, B | C; it has to be A, C | B. Just open a PR.
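
For reference, here is a minimal sketch of the ordering described in the
quoted passage. It is plain Python, not MLlib's actual implementation; the
function name ordered_split_candidates and the dict input are made up for
illustration.

```python
def ordered_split_candidates(label1_proportion):
    """Return the candidate binary splits for one categorical feature.

    label1_proportion maps each category to its proportion of label 1
    (hypothetical input format). Categories are sorted by that proportion,
    and each of the k - 1 cut points in the sorted order yields one
    (left, right) candidate.
    """
    ordered = sorted(label1_proportion, key=label1_proportion.get)
    return [(ordered[:i], ordered[i:]) for i in range(1, len(ordered))]


# Proportions from the passage quoted below.
proportions = {"A": 0.2, "B": 0.6, "C": 0.4}
for left, right in ordered_split_candidates(proportions):
    print(", ".join(left), "|", ", ".join(right))
# A | C, B
# A, C | B
```

Sorting by proportion gives A, C, B, and cutting that order at each of its
two interior positions gives exactly the two candidates you listed.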

On Thu, Aug 7, 2014 at 2:11 AM, Matt Forbes <matt@tellapart.com> wrote:
> I found the section on ordering categorical features really interesting,
> but the A, B, C example seemed inconsistent. Am I interpreting this passage
> wrong, or are there typos? Aren't the split candidates A | C, B and
> A, C | B?
>
> For example, for a binary classification problem with one categorical
> feature with three categories A, B and C with corresponding proportion of
> label 1 as 0.2, 0.6 and 0.4, the categorical features are ordered as A
> followed by C followed B or A, B, C. The two split candidates are A | C, B
> and A , B | C where | denotes the split.


