flink-dev mailing list archives

From Stavros Kontopoulos <st.kontopou...@gmail.com>
Subject Re: Flink ML - NaN Handling
Date Sun, 12 Feb 2017 19:55:14 GMT
Btw, I think we should add an Imputer if we follow scikit-learn, as described
in the "Imputation of missing values" section of its guide on preparing the
dataset:
http://scikit-learn.org/stable/modules/preprocessing.html
What do you think? Should I open a JIRA issue for it?
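
For illustration, here is a minimal sketch of what mean imputation could look
like. It is plain Python, not Flink ML or scikit-learn code; the helper names
fit_column_means and impute are hypothetical, and replacing missing entries
with the column mean is only one possible strategy.

```python
import math


def fit_column_means(rows):
    """Compute each column's mean over its non-NaN entries (hypothetical helper)."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # A column with no observed values has no defined mean; keep NaN there.
        means.append(sum(observed) / len(observed) if observed else math.nan)
    return means


def impute(rows, means):
    """Return a copy of rows with every NaN replaced by its column's mean."""
    return [
        [means[j] if math.isnan(value) else value for j, value in enumerate(row)]
        for row in rows
    ]


X = [[1.0, -1.0, 2.0],
     [2.0, 0.0, float("nan")]]
means = fit_column_means(X)
X_imputed = impute(X, means)
```

Splitting the work into a fit step and a transform step mirrors the
fit/transform shape of the other preprocessing stages, so the means learned on
training data could be reused on new data.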

The NaN question also applies to data generated by one pipeline stage and fed
to the next. From what I see, we should throw an exception in all cases.
For example, in sklearn:

>>> from sklearn import preprocessing
>>> X = [[ 1., -1.,  2.],
...      [ 2.,  0.,  float('NaN')]]

>>> preprocessing.normalize(X, norm='l2')
Traceback (most recent call last):
....
ValueError: Input contains NaN, infinity or a value too large for
dtype('float64').

I don't see such a check in Flink ML's code. My understanding is that NaNs
are simply propagated, correct?
For example, when I run the MinMaxScalerIT tests with NaN in the data, I get
a result like:
DenseVector(0.34528405956977387, 0.5, NaN)
...
which is reasonable given the implementation, but should it be allowed?
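
The two behaviours being compared can be sketched in plain Python. This is an
illustration only, not Flink ML's MinMaxScaler or scikit-learn's validation
code; scale_value and assert_all_finite are hypothetical names, and the
scaling formula assumes col_max > col_min.

```python
import math


def scale_value(value, col_min, col_max, lo=0.0, hi=1.0):
    """Min-max scale one value into [lo, hi]; assumes col_max > col_min.

    Arithmetic on NaN yields NaN, so a NaN input silently propagates.
    """
    return lo + (value - col_min) / (col_max - col_min) * (hi - lo)


def assert_all_finite(rows):
    """Fail fast: raise ValueError if any entry is NaN or infinite."""
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isfinite(value):
                raise ValueError(
                    f"Input contains NaN or infinity at row {i}, column {j}"
                )


X = [[1.0, -1.0, 2.0],
     [2.0, 0.0, float("nan")]]

# Propagation: the NaN entry passes through the scaler unchanged in kind.
propagated = scale_value(X[1][2], 0.0, 2.0)

# Fail-fast: validating the input first rejects the whole dataset instead.
try:
    assert_all_finite(X)
    rejected = False
except ValueError:
    rejected = True
```

Running the check at the start of each stage would also catch NaNs produced
by an earlier stage of the pipeline, not only those present in the raw input.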


On Sun, Feb 12, 2017 at 9:03 PM, Stavros Kontopoulos <
st.kontopoulos@gmail.com> wrote:

> OK, cool, thanks Till.
>
> On Sun, Feb 12, 2017 at 4:59 PM, Till Rohrmann <trohrmann@apache.org>
> wrote:
>
>> Hi Stavros,
>>
>> so far we've mainly stuck to scikit-learn in terms of semantics. Thus, I
>> would recommend following scikit-learn's approach to handling NaNs.
>>
>> Cheers,
>> Till
>>
>> On Fri, Feb 10, 2017 at 11:48 PM, Stavros Kontopoulos <
>> st.kontopoulos@gmail.com> wrote:
>>
>> > Hello guys,
>> >
>> > Is there a story for this (might have been discussed earlier)? I see
>> > differences between scikit-learn and numpy. Do we standardize on
>> > scikit-learn?
>> >
>> > PS. I am working on the preprocessing stuff.
>> >
>> > Best,
>> > Stavros
>> >
>>
>
>
