mahout-user mailing list archives

From Nirmal Kumar <>
Subject Generating vectors from a single txt file using Java:KMeans clustering
Date Tue, 04 Jun 2013 09:22:52 GMT

I have Twitter data in a single .txt file, in this format:
@VancityBeerGuy - RT @BCBerrie: well @VancityBeerGuy you know what they say about guys with #smallenfreuden right? Hahaha Created At:Mon Jun 03 07:18:46 IST 2013
@IanSylves - RT @PTorgo91: @otterN9NE you're the best thing to happen to the #sabres since Drury #lordstanley#nextyear #smallenfreuden Created At:Mon Jun 03 07:18:37 IST 2013
@LiLItalyPasta - RT @LamyaAsiff: #smallenfreuden is #stupidfreuden. Created At:Mon Jun 03 07:17:36 IST 2013
@MMBris - RT @jaimestein: Whenever you find yourself on the side of the majority, it is time to pause and #smallenfreuden. -Mark Twain Created At:Mon Jun 03 07:16:43 IST 2013
@SeanBickerton - RT @kbieksa3: Big save by Bernier to keep it somewhat close. Leave it to a french guy to get the boys going... @aburr14 #Smallenfreuden Created At:Mon Jun 03 07:16:41 IST 2013
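A possible first step is to split each record into its parts before choosing features. This is a minimal sketch that assumes each record sits on one physical line in the form "@user - <text> Created At:<date>" (the wrapping in the message may come from the mail client rather than the file). TweetParser and the Tweet fields are hypothetical names. The timestamp is kept as a raw string because the IST zone abbreviation is ambiguous (India, Irish or Israel Standard Time) and would need an explicit zone to parse reliably.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits "@user - <text> Created At:<date>" records.
class TweetParser {
    // Greedy (.*) splits at the last " Created At:", so the marker may also
    // appear inside the tweet text itself.
    private static final Pattern RECORD =
            Pattern.compile("^@(\\S+) - (.*) Created At:(.+)$");

    static final class Tweet {
        final String user;      // account that posted the line, without the '@'
        final String text;      // tweet body, including any "RT @..." prefix
        final String createdAt; // raw timestamp, e.g. "Mon Jun 03 07:18:46 IST 2013"

        Tweet(String user, String text, String createdAt) {
            this.user = user;
            this.text = text;
            this.createdAt = createdAt;
        }
    }

    // Returns null for lines that do not follow the assumed layout.
    static Tweet parse(String line) {
        Matcher m = RECORD.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new Tweet(m.group(1), m.group(2).trim(), m.group(3).trim());
    }

    // Parses every line of the file, silently skipping malformed ones.
    static List<Tweet> parseAll(List<String> lines) {
        List<Tweet> tweets = new ArrayList<>();
        for (String line : lines) {
            Tweet t = parse(line);
            if (t != null) {
                tweets.add(t);
            }
        }
        return tweets;
    }
}
```

The lines themselves could be read with java.nio.file.Files.readAllLines(path) and handed to parseAll; only the text field is needed for topic-based clustering.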

I need to generate vectors from this text file in Java so that I can run KMeans clustering on them.

I need help selecting the features.
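One possible feature selection for short tweets, offered as an assumption rather than a recommendation from the book: lowercase the text, treat each word and each hashtag as a feature, and drop the RT marker, @mentions and a few very common words. The TweetFeatures class, its token pattern (which only recognises ASCII letters) and its stop-word list are all hypothetical choices to tune against the real data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical feature extractor: words and hashtags from one tweet's text.
class TweetFeatures {
    // A token is an optional '#' or '@' followed by ASCII letters, digits or
    // underscores, with an optional apostrophe suffix such as "you're".
    private static final Pattern TOKEN =
            Pattern.compile("[#@]?[a-z0-9_]+(?:'[a-z]+)?");

    // Tiny illustrative stop-word list; "rt" is the retweet marker.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "rt", "a", "an", "and", "the", "is", "it", "of", "to", "on", "by",
            "you", "with", "what", "about", "that", "this"));

    static List<String> extract(String text) {
        List<String> features = new ArrayList<>();
        Matcher m = TOKEN.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            String token = m.group();
            if (token.startsWith("@")) {
                continue; // drop @mentions: usernames rarely describe the topic
            }
            if (token.length() < 2 || STOP_WORDS.contains(token)) {
                continue; // drop single characters and very common words
            }
            features.add(token);
        }
        return features;
    }
}
```

Lowercasing makes #Smallenfreuden and #smallenfreuden the same feature, and run-together tags such as #lordstanley#nextyear come out as two features. Whether usernames should be kept depends on whether the clusters are meant to group topics or accounts.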

From Mahout in Action:

The process of selecting the features of an object and mapping them to numbers is
known as feature selection. The process of encoding features as a vector is vectorization.
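Once each tweet is a list of features, vectorization can be sketched as TF-IDF weighting over a dictionary built from the whole file, which is roughly what Mahout's seq2sparse utility does for text stored in SequenceFiles. The class below is a plain-Java illustration under that assumption, not Mahout's implementation, and TfIdfVectorizer is a hypothetical name. Its sparse maps (column index to weight) would still have to be copied into Mahout vectors, for example RandomAccessSparseVector, and written to a SequenceFile before running k-means; that step is not shown.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper: builds a term dictionary over all tweets, then turns each
// tweet's feature list into a sparse, L2-normalised TF-IDF vector (column -> weight).
class TfIdfVectorizer {
    final Map<String, Integer> dictionary = new LinkedHashMap<>(); // term -> column index
    private final Map<String, Integer> docFreq = new HashMap<>();   // term -> tweets containing it
    private int numDocs;

    // First pass over the whole file: assign columns and count document frequencies.
    void fit(List<List<String>> docs) {
        numDocs = docs.size();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
            for (String term : doc) {
                dictionary.putIfAbsent(term, dictionary.size());
            }
        }
    }

    // Second pass, per tweet: weight = tf * ln(N / df), then scale to unit length.
    SortedMap<Integer, Double> transform(List<String> doc) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1, Integer::sum);
        }
        SortedMap<Integer, Double> vector = new TreeMap<>();
        double sumOfSquares = 0.0;
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            Integer column = dictionary.get(e.getKey());
            if (column == null) {
                continue; // term never seen by fit(): no column for it
            }
            double idf = Math.log((double) numDocs / docFreq.get(e.getKey()));
            double weight = e.getValue() * idf;
            if (weight == 0.0) {
                continue; // term occurs in every tweet, so it cannot separate clusters
            }
            vector.put(column, weight);
            sumOfSquares += weight * weight;
        }
        if (sumOfSquares > 0.0) {
            double length = Math.sqrt(sumOfSquares);
            for (Map.Entry<Integer, Double> e : vector.entrySet()) {
                e.setValue(e.getValue() / length);
            }
        }
        return vector;
    }
}
```

A term present in every tweet gets an IDF of ln(1) = 0 and drops out; in the sample data, once hashtags are lowercased, #smallenfreuden is such a term, which is arguably desirable because it says nothing about which cluster a tweet belongs to. Scaling to unit length means squared Euclidean distance equals 2 minus twice the cosine similarity, so k-means with Euclidean distance then behaves like clustering by cosine similarity, which usually suits text.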
