flink-issues mailing list archives

From "Theodore Vasiloudis (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-2186) Rework SVM import to support very wide files
Date Mon, 08 Jun 2015 14:53:01 GMT

     [ https://issues.apache.org/jira/browse/FLINK-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Theodore Vasiloudis updated FLINK-2186:
---------------------------------------
    Description: 
In the current readCsvFile implementation, importing CSV files with many columns ranges from
cumbersome to impossible.

For example, to import an 11-column file we need to write:

{code}
import org.apache.flink.api.scala._
val env = ExecutionEnvironment.getExecutionEnvironment
val cancer = env.readCsvFile[(String, String, String, String, String, String, String, String,
  String, String, String)]("/path/to/breast-cancer-wisconsin.data")
{code}

For many use cases in Machine Learning we might have CSV files with thousands or millions
of columns that we want to import as vectors.
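In that case, using the current readCsvFile method becomes impossible: every column has to be
declared as a separate field of the target type, and tuple types have a bounded arity (the
largest Scala tuple is Tuple22).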

We therefore need to rework the current function, or create a new one that will allow us to
import CSV files with an arbitrary number of columns.
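One possible direction, sketched here only under stated assumptions: instead of encoding each column in the element type, treat every line as a variable-length record and parse it into an array of doubles. The object name WideCsv and the function parseLine below are hypothetical, and the sketch assumes a headerless file whose fields are all numeric (no missing-value markers). Inside Flink, such a function could be applied with something like env.readTextFile(path).map(line => WideCsv.parseLine(line)), and the resulting arrays could then be wrapped in a vector type such as FlinkML's DenseVector; that wiring is not shown and is an assumption, not the agreed design for this issue.

```scala
// Hypothetical sketch: parse CSV lines of arbitrary width into Array[Double].
// Assumes a headerless file where every field is a valid number.
object WideCsv {

  /** Splits `line` on `delimiter` and parses every field as a Double.
    * The column count is not part of the type, so the same function
    * handles 11 columns or 100,000 columns.
    */
  def parseLine(line: String, delimiter: Char = ','): Array[Double] =
    line.split(delimiter).map(_.trim.toDouble)

  def main(args: Array[String]): Unit = {
    // In-memory stand-in for the lines of a wide CSV file.
    val lines = Seq("1.0,2.0,3.0", "4.5, 5.5 ,6.5")
    val rows = lines.map(parseLine(_))
    // Print each parsed row, space-separated.
    rows.foreach(r => println(r.mkString(" ")))
  }
}
```

The trade-off is that the columns lose their individual types: every field must share one numeric type, which matches the goal of importing rows as vectors but not the heterogeneous-tuple use case.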

  was:
In the current readVcsFile implementation, importing CSV files with many columns can become
from cumbersome to impossible.

For example to import an 11 column file wee need to write:

{code}
val cancer = env.readCsvFile[(String, String, String, String, String, String, String, String,
String, String, String)]("/path/to/breast-cancer-wisconsin.data")
{code}

For many use cases in Machine Learning we might have CSV files with thousands or millions
of columns that we want to import as vectors.
In that case using the current readCsvFile method becomes impossible.

We therefore need to rework the current function, or create a new one that will allow us to
import CSV files with an arbitrary number of columns.


> Rework SVM import to support very wide files
> --------------------------------------------
>
>                 Key: FLINK-2186
>                 URL: https://issues.apache.org/jira/browse/FLINK-2186
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library, Scala API
>            Reporter: Theodore Vasiloudis
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
