mahout-dev mailing list archives

From "Pat Ferrel (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAHOUT-1568) Build an I/O model that can replace sequence files for import/export
Date Sun, 01 Jun 2014 17:27:01 GMT
Pat Ferrel created MAHOUT-1568:
----------------------------------

             Summary: Build an I/O model that can replace sequence files for import/export
                 Key: MAHOUT-1568
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1568
             Project: Mahout
          Issue Type: New Feature
          Components: CLI
         Environment: Scala, Spark
            Reporter: Pat Ferrel
            Assignee: Pat Ferrel


Implement mechanisms to read and write data from/to flexible stores. These will support tuple
streams and DRMs, with extensions that allow keeping user-defined values for IDs. The mechanism
can, in some sense, replace Sequence Files for import/export and will make the operation much
easier for the user, in many cases by directly consuming their input files.

Start with text-delimited files for input/output in the Spark version of ItemSimilarity.
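
To make the idea of keeping user-defined IDs concrete, here is a minimal sketch in plain Scala.
It is not the proposed implementation and uses neither Spark nor any Mahout API; the names
DelimitedIO, IdDictionary, IndexedData, read and write are all hypothetical. It assumes each
input line holds a row ID and a column ID separated by a delimiter, maps those external IDs to
dense integer indices on import, and restores the original IDs on export.

```scala
import scala.collection.mutable

// Hypothetical sketch only: none of these names exist in Mahout.
object DelimitedIO {

  // Two-way mapping between external string IDs and dense integer indices.
  final class IdDictionary {
    private val toIndex = mutable.HashMap.empty[String, Int]
    private val toId = mutable.ArrayBuffer.empty[String]

    // Returns the existing index for an ID, or assigns the next free one.
    def indexOf(id: String): Int =
      toIndex.getOrElseUpdate(id, { toId += id; toId.size - 1 })

    def idOf(index: Int): String = toId(index)
    def size: Int = toId.size
  }

  // Sparse rows keyed by row index, plus the dictionaries needed to
  // translate indices back to the user's own IDs.
  final case class IndexedData(
      rows: Map[Int, Set[Int]],
      rowIds: IdDictionary,
      columnIds: IdDictionary)

  // Assumed input layout: "<rowID><delimiter><columnID>" per line.
  def read(lines: Iterator[String], delimiter: String = ","): IndexedData = {
    val rowIds = new IdDictionary
    val columnIds = new IdDictionary
    val rows = mutable.Map.empty[Int, Set[Int]]
    // Quote the delimiter so characters such as "|" are not read as regex.
    val splitOn = java.util.regex.Pattern.quote(delimiter)
    for (line <- lines if line.trim.nonEmpty) {
      val fields = line.split(splitOn).map(_.trim)
      require(fields.length >= 2, s"expected at least two fields: $line")
      val r = rowIds.indexOf(fields(0))
      val c = columnIds.indexOf(fields(1))
      rows(r) = rows.getOrElse(r, Set.empty[Int]) + c
    }
    IndexedData(rows.toMap, rowIds, columnIds)
  }

  // Writes one line per (row, column) pair using the original external IDs.
  def write(data: IndexedData, delimiter: String = ","): Seq[String] =
    for {
      r <- data.rows.keys.toSeq.sorted
      c <- data.rows(r).toSeq.sorted
    } yield s"${data.rowIds.idOf(r)}$delimiter${data.columnIds.idOf(c)}"
}
```

For example, DelimitedIO.read(Iterator("u1,iphone", "u2,ipad", "u1,ipad")) assigns u1 and u2 the
row indices 0 and 1, and DelimitedIO.write on the result emits lines with the original IDs rather
than the internal indices. A distributed version would need to build the dictionaries across
partitions, which this sketch does not attempt.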

A proposal is running with ItemSimilarity on Spark and is documented on the GitHub wiki
here: https://github.com/pferrel/harness/wiki

Comments are appreciated



--
This message was sent by Atlassian JIRA
(v6.2#6252)
