mahout-user mailing list archives

From balaprasanna <>
Subject Regarding tf calculation for 2 million files...
Date Wed, 25 Apr 2012 11:19:02 GMT
I am currently using Mahout for machine learning algorithms. I have a single
file consisting of 2 million lines of text, and I want to compute a
document-term matrix for it. I converted the single file into a directory
of 2 million individual files, and I am now running seq2sparse to compute
the tf matrices, on a two-node Cloudera Hadoop cluster. Since the input is
large, computing the tf matrices for 2 million files takes a long time. Is
there any alternative way to speed this process up?
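For context, the tf matrix in question is just per-document term counts over a shared dictionary. A minimal pure-Python sketch of that computation (toy documents standing in for the 2 million files; no Mahout involved) might look like:

```python
from collections import Counter

# Toy stand-in for the corpus: each string is one document.
docs = [
    "hadoop runs map reduce jobs",
    "mahout builds a document term matrix",
    "mahout runs on hadoop",
]

# Build the vocabulary (analogous to the dictionary seq2sparse emits).
vocab = sorted({term for doc in docs for term in doc.split()})
index = {term: i for i, term in enumerate(vocab)}

# tf matrix: one row per document, one column per term, raw counts.
tf = []
for doc in docs:
    counts = Counter(doc.split())
    tf.append([counts.get(term, 0) for term in vocab])

for row in tf:
    print(row)
```

seq2sparse produces the same information as sparse vectors keyed by its dictionary file, distributing the counting across map tasks; the sketch above is only meant to show what is being computed, not how Mahout computes it.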
