ignite-issues mailing list archives

From "Anton Dmitriev (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (IGNITE-8666) Add ability of filtering data during datasets creation
Date Fri, 01 Jun 2018 09:31:00 GMT

     [ https://issues.apache.org/jira/browse/IGNITE-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Dmitriev updated IGNITE-8666:
-----------------------------------
    Description: 
So far we use a straightforward strategy to feed data into a partition-based dataset. We retrieve
all entries from an upstream cache partition, transform them, and write the results into the
corresponding dataset partition (data and context). As a result, we cannot choose which data is
fed into the dataset and which is not. To implement IGNITE-8667 (splitting a dataset into training
and test sets) and IGNITE-8668 (k-fold cross-validation of models), we need this ability.
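
To make the current flow concrete, here is a minimal Python model of it. It assumes an upstream cache can be represented as a dict from partition id to a list of (key, value) entries; `build_dataset`, `DatasetPartition`, and the `rows` context field are hypothetical names for illustration only and are not Ignite's API (Ignite ML itself is written in Java).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Entry = Tuple[Any, Any]  # (key, value) pair read from an upstream cache partition


@dataclass
class DatasetPartition:
    data: List[Any]                                        # transformed entries
    context: Dict[str, Any] = field(default_factory=dict)  # per-partition state (hypothetical)


def build_dataset(upstream: Dict[int, List[Entry]],
                  transform: Callable[[Any, Any], Any]) -> Dict[int, DatasetPartition]:
    # Hypothetical model of the current strategy: every entry of every
    # upstream partition is transformed and written into the dataset
    # partition with the same id. There is no way to skip an entry.
    dataset = {}
    for part_id, entries in upstream.items():
        data = [transform(k, v) for k, v in entries]
        dataset[part_id] = DatasetPartition(data=data, context={"rows": len(data)})
    return dataset


# Usage: two upstream partitions, each value scaled by 10.
cache = {0: [(1, 1.0), (2, 2.0)], 1: [(3, 3.0)]}
dataset = build_dataset(cache, lambda key, value: value * 10)
```

The only hook in this model is `transform`, which runs on every entry; nothing lets the caller drop an entry, which is the limitation the issue describes.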

The goal of this task is to add the ability to filter the data that is fed from the cache into the
dataset. This will allow us to create different datasets (training, testing, k-fold, etc.) from a
single cache.

  was:
So far we use a straightforward strategy to feed data into a partition-based dataset. We retrieve
all entries from an upstream cache partition, transform them, and write the results into the
corresponding dataset partition (data and context). As a result, we cannot choose which data is
fed into the dataset and which is not. To implement IGNITE-8667 (splitting a dataset into training
and test sets) and IGNITE-8668 (k-fold cross-validation of models), we need this ability.

The goal of this task is to add the ability to filter the data that is fed from the cache into the
dataset. This will allow us to create different datasets (training, testing, k-fold, etc.) from a
single cache


> Add ability of filtering data during datasets creation
> ------------------------------------------------------
>
>                 Key: IGNITE-8666
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8666
>             Project: Ignite
>          Issue Type: New Feature
>          Components: ml
>            Reporter: Yury Babak
>            Assignee: Anton Dmitriev
>            Priority: Major
>             Fix For: 2.6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
