mahout-dev mailing list archives

From Grant Ingersoll <>
Subject Re: What about a universal input data handling mechanism for Mahout?
Date Tue, 26 Jul 2011 09:50:32 GMT
We do have:
SequenceFilesFromCsvFilter, although it is somewhat basic
CSVVectorIterator, which takes a CSV file and produces a dense vector
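
To make the "CSV in, dense vector out" idea concrete, here is a minimal stand-alone sketch in plain Java. It is not Mahout's CSVVectorIterator: the class name CsvToDense and its methods are hypothetical, it assumes purely numeric fields with no quoting or header row, and it returns double[] instead of a Mahout Vector so that it runs without Hadoop or Mahout on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Mahout.
public class CsvToDense {

    // Parse one comma-separated line of numbers into a dense array.
    // Assumes every field is numeric; a non-numeric field throws NumberFormatException.
    static double[] parseLine(String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = Double.parseDouble(fields[i].trim());
        }
        return values;
    }

    // Parse many lines, skipping blank ones, one dense array per row.
    static List<double[]> parseAll(Iterable<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            rows.add(parseLine(line));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<double[]> rows = parseAll(Arrays.asList("1.0, 2.5,3", "", "4,5,6"));
        for (double[] row : rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

A real converter would still need the piece Ted raises below: some way to describe the schema (which columns are numeric, categorical, or ids), plus a writer that emits the rows as a SequenceFile.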

On Jul 26, 2011, at 3:58 AM, Ted Dunning wrote:

> The critical design step here is to decide how to express the schema of the
> CSV file.  There is a beginning of this in the CsvRecordFactory, but I was
> never happy with the (lack of) speed.
>
> On Tue, Jul 26, 2011 at 12:10 AM, Sebastian Schelter <> wrote:
>>> 2. SequenceFile is not a file format that command-line users can
>>> prepare. Is there a tool for converting CSV files into SequenceFiles?
>> I don't think we have that yet, but it would be very useful imho.

Grant Ingersoll
