I've been running through the examples as described in the Mahout In
Action book and I have some questions regarding the
SequenceFilesFromDirectory.java class.
This class expects a directory of files, each containing one document.
Is there another Mahout class, or an option I can pass to
SequenceFilesFromDirectory, that parses multiple documents per file?
For example, my files contain one document per line. I would like to parse
each line of each file and create a sequence file from the result. Is this
possible with SequenceFilesFromDirectory, or would I have to write it
myself?
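
If I do have to write it myself, I imagine the line-splitting part would
look roughly like the sketch below. It is only a sketch under my own
assumptions: the class name LinePerDocument, the helper toPairs, and the
"fileName#lineNumber" key scheme are all made up for illustration and are
not anything Mahout provides. It uses only the Java standard library; the
step that actually writes each pair into a sequence file (presumably via
Hadoop's SequenceFile.Writer) is left as a comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of Mahout.
class LinePerDocument {

    // Turns each non-blank line into one (key, value) pair.
    // Key scheme (an assumption): "<fileName>#<1-based line number>".
    static Map<String, String> toPairs(String fileName, List<String> lines) {
        Map<String, String> pairs = new LinkedHashMap<>();
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            if (line.trim().isEmpty()) {
                continue; // skip blank lines, but keep counting them
            }
            pairs.put(fileName + "#" + lineNumber, line);
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: LinePerDocument <input-file>");
            return;
        }
        Path input = Paths.get(args[0]);
        List<String> lines = Files.readAllLines(input);
        Map<String, String> pairs = toPairs(input.getFileName().toString(), lines);
        for (Map.Entry<String, String> pair : pairs.entrySet()) {
            // In a real job, this is where each pair would be appended to
            // the sequence file instead of being printed.
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Each line then becomes its own document, keyed by the file it came from
and its position in that file.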
Thanks