mahout-user mailing list archives

From Darius Miliauskas <dariui.miliaus...@gmail.com>
Subject Re: Mahout readable output
Date Fri, 06 Sep 2013 08:25:23 GMT
Dear Vishal,

Can you share some code showing how you performed the steps you mentioned:

 #) Created a custom VectorIterable by implementing Iterable<Vector>.
 #) Created a custom VectorIterator by extending AbstractIterator<Vector>.
 #) A model class responsible for passing attribute values
(username, data, etc.) to the custom VectorIterator.
 #) The custom VectorIterator.computeNext() reads a line and creates a dense
vector whose size equals the number of attributes in a row.
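For reference, here is a minimal, stdlib-only sketch of the iterator pattern in the steps above. It assumes (hypothetically) comma-separated lines whose fields are already numeric; `LineVectorIterable` and `LineVectorIterator` are illustrative names, and the `computeNext()`/`endOfData()` pair only mimics the contract of Guava's `AbstractIterator` instead of using it, so the block compiles without Mahout or Guava on the classpath. In real Mahout code each `double[]` would be wrapped in an `org.apache.mahout.math.DenseVector`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical Iterable that yields one dense double[] per non-blank input line.
public class LineVectorIterable implements Iterable<double[]> {
    private final String text;

    public LineVectorIterable(String text) {
        this.text = text;
    }

    @Override
    public Iterator<double[]> iterator() {
        return new LineVectorIterator(new BufferedReader(new StringReader(text)));
    }

    // Mimics Guava's AbstractIterator: computeNext() returns the next element,
    // or calls endOfData() once the input is exhausted.
    static class LineVectorIterator implements Iterator<double[]> {
        private final BufferedReader reader;
        private double[] next;
        private boolean done;

        LineVectorIterator(BufferedReader reader) {
            this.reader = reader;
        }

        private double[] endOfData() {
            done = true;
            return null;
        }

        // Reads one line and builds a dense array sized to the number of attributes.
        private double[] computeNext() {
            try {
                String line;
                do {
                    line = reader.readLine();
                    if (line == null) {
                        return endOfData();
                    }
                } while (line.trim().isEmpty()); // skip blank lines
                String[] fields = line.split(",");
                double[] values = new double[fields.length];
                for (int i = 0; i < fields.length; i++) {
                    values[i] = Double.parseDouble(fields[i].trim()); // assumes numeric fields
                }
                return values;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public boolean hasNext() {
            if (next == null && !done) {
                next = computeNext(); // look ahead one element
            }
            return next != null;
        }

        @Override
        public double[] next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            double[] result = next;
            next = null;
            return result;
        }
    }
}
```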

Can you compile the code?


Best,

Darius



2013/9/6 Vishal Danech <vishal.danech@gmail.com>

> Hi
>
> I have custom log data that contains the following details:
>
> 1) UserName
> 2) MachineId
> 3) DateTime
> 4) Data, which contains text (search terms, etc.)
>
> I would like to use this data to find out:
>      #) how much time users spend browsing, etc.
>      #) user-based search patterns
>
> The first problem can be addressed with a Hive query.
>
> For the second problem, I suppose clustering can be applied, so I have
> converted the data to vectors. I used dense vectors and applied the Canopy
> algorithm to them. I passed the output as input to the ClusterDump utility,
> but the output I got was not in a readable form. I figured out that I need
> to use named vectors so that the key can be displayed in the output. This is
> where I am stuck: how do I use NamedVector?
>
> I am performing the following steps to generate vectors:
>      #) Created a custom VectorIterable by implementing Iterable<Vector>.
>      #) Created a custom VectorIterator by extending AbstractIterator<Vector>.
>      #) A model class responsible for passing attribute values
> (username, data, etc.) to the custom VectorIterator.
>      #) The custom VectorIterator.computeNext() reads a line and creates a dense
> vector whose size equals the number of attributes in a row.
>
> Please let me know how to add NamedVector here so that I can get
> readable output from the ClusterDump utility.
>
> --
> Thanks and Regards
> Vishal Danech
>
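The idea behind NamedVector is to carry an identifier (here, the username) alongside each row's values so it survives clustering. The stdlib-only sketch below illustrates that idea under stated assumptions: `NamedRows` and `NamedRow` are hypothetical names, `NamedRow` is only a stand-in for Mahout's `org.apache.mahout.math.NamedVector`, and the line layout (username first, then numeric fields) is assumed. With Mahout on the classpath, each parsed row would instead become `new NamedVector(new DenseVector(values), name)`, wrapped in a `VectorWritable` when written out.

```java
import java.util.Arrays;

public class NamedRows {
    // Hypothetical stand-in for Mahout's NamedVector: a name plus dense values.
    // Mahout equivalent (not used here): new NamedVector(new DenseVector(values), name)
    public static final class NamedRow {
        private final String name;
        private final double[] values;

        NamedRow(String name, double[] values) {
            this.name = name;
            this.values = values;
        }

        public String getName() {
            return name;
        }

        // Returns a copy so callers cannot mutate the row.
        public double[] getValues() {
            return values.clone();
        }

        @Override
        public String toString() {
            return name + ":" + Arrays.toString(values);
        }
    }

    // Parses "username,v1,v2,..." into a NamedRow: the username becomes the name,
    // and the remaining (assumed numeric) fields become the vector values.
    public static NamedRow parse(String line) {
        String[] fields = line.split(",");
        double[] values = new double[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            values[i - 1] = Double.parseDouble(fields[i].trim());
        }
        return new NamedRow(fields[0].trim(), values);
    }
}
```

Keeping the username as a separate name, rather than as a column of the vector, means it plays no part in the distance calculations Canopy performs; only the numeric values do.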
