ignite-dev mailing list archives

From "Aleksey Zinoviev (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-7328) Improve Labeled Dataset loading from txt file
Date Thu, 28 Dec 2017 12:46:00 GMT
Aleksey Zinoviev created IGNITE-7328:
----------------------------------------

             Summary: Improve Labeled Dataset loading from txt file
                 Key: IGNITE-7328
                 URL: https://issues.apache.org/jira/browse/IGNITE-7328
             Project: Ignite
          Issue Type: New Feature
          Components: ml
            Reporter: Aleksey Zinoviev
            Assignee: Aleksey Zinoviev


1. Wouldn't it be better to parse rows in place rather than saving them as strings first? The
current implementation has to keep the dataset in memory twice, which might be
a problem for big datasets.

2. What about the case when a dataset contains non-numerical data as well? Do we handle this
case, or will some other "DatasetLoader" be used for such purposes?

3. Just an idea: if we don't want to fail on bad data (99% of cases), it would be great to
report the quality of the loaded dataset, such as the number of missed rows/values.

4. Should a row that doesn't contain the required number of columns be treated as "bad data"
rather than breaking parsing with IndexOutOfBoundsException?
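
A minimal sketch of how points 1, 3 and 4 could fit together, not the actual Ignite loader.
The class name LabeledRowsSketch, its Result holder, the load(...) signature and the
"label in the first column" layout are all assumptions made for illustration. Each line is
parsed into doubles as it is read, so no list of raw strings is kept, and rows with a wrong
column count or a non-numeric value are counted and skipped instead of throwing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch; names and layout are assumptions, not Ignite's API. */
class LabeledRowsSketch {
    /** Parsed rows plus a simple data-quality counter. */
    static final class Result {
        final List<double[]> features = new ArrayList<>();
        final List<Double> labels = new ArrayList<>();
        /** Rows dropped for a wrong column count or a non-numeric value. */
        int skippedRows;
    }

    /**
     * Reads the input line by line and parses each row immediately.
     * Assumes the label is the first column and the rest are features.
     */
    static Result load(BufferedReader reader, String separator, int expectedCols) {
        Result res = new Result();
        String sepRegex = Pattern.quote(separator); // treat the separator literally
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty())
                    continue; // blank lines are ignored, not counted as bad rows

                // Limit -1 keeps trailing empty tokens so they count as columns.
                String[] tokens = line.split(sepRegex, -1);
                if (tokens.length != expectedCols) {
                    res.skippedRows++; // avoids IndexOutOfBoundsException later
                    continue;
                }

                try {
                    double label = Double.parseDouble(tokens[0].trim());
                    double[] row = new double[expectedCols - 1];
                    for (int i = 1; i < expectedCols; i++)
                        row[i - 1] = Double.parseDouble(tokens[i].trim());
                    res.labels.add(label);
                    res.features.add(row);
                }
                catch (NumberFormatException e) {
                    res.skippedRows++; // non-numeric or empty value
                }
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return res;
    }
}
```

With the input "1,2.0,3.0", "0,4.0", "1,abc,5.0", "0,6.0,7.0" and expectedCols = 3, the second
and third rows are skipped and counted. Whether to skip whole rows or to record missing values
individually (point 3) is left open here, as is the non-numerical case from point 2.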



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
