mahout-dev mailing list archives

From Alexander Franchuk <Alexander.Franc...@icims.com>
Subject classifier.sgd.CsvRecordFactory incorrect CSV parsing
Date Thu, 18 Jul 2013 17:23:53 GMT
Hi All,
I've been working with Mahout for an internship this summer, and in the process I noticed
that the CsvRecordFactory class parses CSV files incorrectly. I wrote a fix for this, which
is in the attached patch file. It's not a huge change, but I thought it would be helpful for
people. It should also keep the demo programs in the Mahout distribution from failing on
such files. For instance, if you have a double-quoted field with a comma in it, the demo
programs will incorrectly split the field in two. In some cases this causes a parsing
failure, and even when the program doesn't fail, the results are of course wrong.
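
To illustrate the problem, here is a minimal sketch. It is not the code from the patch and
does not use CsvRecordFactory or the CSV library; the class CsvSplitDemo and its methods
naiveSplit and quotedSplit are hypothetical names made up for this example. It contrasts
splitting on every comma with a small quote-aware splitter, assuming the usual CSV
convention that a doubled quote inside a quoted field stands for a literal quote.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Mahout or the attached patch.
public class CsvSplitDemo {

    // Naive approach: split on every comma, ignoring double quotes.
    static String[] naiveSplit(String line) {
        return line.split(",", -1);
    }

    // Minimal quote-aware splitter: commas inside double quotes stay in the field,
    // and a doubled quote ("") inside a quoted field becomes one literal quote.
    static List<String> quotedSplit(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"Smith, John\",42";
        System.out.println(naiveSplit(line).length);  // prints 4: the quoted field is cut in two
        System.out.println(quotedSplit(line).size()); // prints 3: the quoted field stays whole
    }
}
```

With the naive split, every field after the quoted one shifts by a column, which is how a
record can either fail to parse or silently produce wrong values.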

The patch makes the class use the solr-commons-csv.jar library, which I noticed is included
in the Mahout distribution.

Hope this helps! And thanks for all your work; my experience with Mahout has been great so
far.
Alex Franchuk
