ignite-issues mailing list archives

From "Anton Dmitriev (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (IGNITE-11655) ML: OneHotEncoder returns more columns than expected
Date Fri, 29 Mar 2019 09:45:00 GMT

     [ https://issues.apache.org/jira/browse/IGNITE-11655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Dmitriev updated IGNITE-11655:
------------------------------------
    Description: 
OneHotEncoder returns more columns than expected: two distinct values, which could be encoded using
two columns, are encoded using three columns. The following example demonstrates the problem:


{code:java}
Map<Integer, Object[]> training = new HashMap<>();
training.put(0, new Object[]{42.0});
training.put(1, new Object[]{43.0});
training.put(2, new Object[]{42.0});

EncoderTrainer<Integer, Object[]> trainer = new EncoderTrainer<Integer, Object[]>()
    .withEncoderType(EncoderType.ONE_HOT_ENCODER)
    .withEncodedFeature(0);

IgniteBiFunction<Integer, Object[], Vector> processor = trainer.fit(training, 1, (k, v) -> v);
Vector res = processor.apply(1, new Object[]{42.0});
System.out.println(Arrays.toString(res.asArray()));

>>> [0.0, 1.0, 0.0]
{code}


  was:
OneHotEncoder returns more columns than expected: two distinct values, which could be encoded using
two columns, are encoded using three columns. The following example demonstrates the problem:

Map<Integer, Object[]> training = new HashMap<>();
training.put(0, new Object[]{42.0});
training.put(1, new Object[]{43.0});
training.put(2, new Object[]{42.0});

EncoderTrainer<Integer, Object[]> trainer = new EncoderTrainer<Integer, Object[]>()
    .withEncoderType(EncoderType.ONE_HOT_ENCODER)
    .withEncodedFeature(0);

IgniteBiFunction<Integer, Object[], Vector> processor = trainer.fit(training, 1, (k, v) -> v);

Vector res = processor.apply(1, new Object[]{42.0});

System.out.println(Arrays.toString(res.asArray()));

>>> [0.0, 1.0, 0.0]


> ML: OneHotEncoder returns more columns than expected
> ----------------------------------------------------
>
>                 Key: IGNITE-11655
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11655
>             Project: Ignite
>          Issue Type: Bug
>          Components: ml
>    Affects Versions: 2.7
>            Reporter: Anton Dmitriev
>            Priority: Major
>
> OneHotEncoder returns more columns than expected: two distinct values, which could be encoded using
> two columns, are encoded using three columns. The following example demonstrates the problem:
> {code:java}
> Map<Integer, Object[]> training = new HashMap<>();
> training.put(0, new Object[]{42.0});
> training.put(1, new Object[]{43.0});
> training.put(2, new Object[]{42.0});
> EncoderTrainer<Integer, Object[]> trainer = new EncoderTrainer<Integer, Object[]>()
>     .withEncoderType(EncoderType.ONE_HOT_ENCODER)
>     .withEncodedFeature(0);
> IgniteBiFunction<Integer, Object[], Vector> processor = trainer.fit(training, 1, (k, v) -> v);
> Vector res = processor.apply(1, new Object[]{42.0});
> System.out.println(Arrays.toString(res.asArray()));
> >>> [0.0, 1.0, 0.0]
> {code}
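
For comparison, here is a minimal sketch of the width one would expect from one-hot encoding a feature with two distinct values. It is plain Java and does not use Ignite. The class `OneHotSketch` and its methods `fitCategories` and `encode` are hypothetical names introduced only for illustration, and assigning columns in first-seen order is an assumption of this sketch, not a statement about how Ignite orders categories.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class OneHotSketch {
    /**
     * Assigns each distinct category a column index, in first-seen order
     * (an assumption of this sketch, not Ignite's documented behavior).
     */
    public static Map<Object, Integer> fitCategories(Object[] values) {
        Map<Object, Integer> index = new LinkedHashMap<>();
        for (Object v : values)
            index.putIfAbsent(v, index.size());
        return index;
    }

    /** Encodes a value as a vector with one column per distinct category. */
    public static double[] encode(Map<Object, Integer> index, Object value) {
        Integer col = index.get(value);
        if (col == null)
            throw new IllegalArgumentException("Unknown category: " + value);
        double[] res = new double[index.size()];
        res[col] = 1.0;
        return res;
    }

    public static void main(String[] args) {
        // Same feature values as in the report: two distinct categories.
        Object[] column = {42.0, 43.0, 42.0};
        Map<Object, Integer> index = fitCategories(column);

        System.out.println(Arrays.toString(encode(index, 42.0))); // [1.0, 0.0]
        System.out.println(Arrays.toString(encode(index, 43.0))); // [0.0, 1.0]
    }
}
```

With the training data from the report, this sketch yields vectors of length 2, whereas the reported output `[0.0, 1.0, 0.0]` has length 3.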



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
