mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Classification on Techcrunch
Date Tue, 26 Jul 2011 21:20:57 GMT
Yep.

That sounds like a fine approach.

You should try several algorithms, but the basic text classification
approach should work reasonably well, especially if you include phrases and
are aggressive about getting rid of garbage text.
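
To make "include phrases" and "getting rid of garbage text" concrete, here is a
minimal sketch in plain Python rather than Mahout itself. It assumes a tiny
hand-written stop list, treats adjacent word pairs as phrases, and uses a
multinomial naive Bayes classifier with add-one smoothing. The category labels,
sample descriptions, and names such as STOP_WORDS, features, and NaiveBayes are
all hypothetical and only there for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop list; a real one would be much larger.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "for", "in",
              "is", "that", "with", "we", "our"}


def features(text):
    """Strip markup and stop words, then return unigrams plus bigram phrases."""
    text = re.sub(r"<[^>]+>", " ", text.lower())  # drop HTML tags
    tokens = [t for t in re.findall(r"[a-z][a-z0-9+#]*", text)
              if t not in STOP_WORDS and len(t) > 1]
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()               # documents seen per label
        self.term_counts = defaultdict(Counter)   # feature counts per label
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for f in features(text):
            self.term_counts[label][f] += 1
            self.vocab.add(f)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        feats = features(text)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for f in feats:
                score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data standing in for company description pages.
model = NaiveBayes()
model.train("<p>We build mobile payment processing for online merchants</p>", "fintech")
model.train("Peer to peer lending and credit scoring platform", "fintech")
model.train("Social network for sharing photos with friends", "social")
model.train("A messaging app that connects friends and family", "social")

print(model.classify("Payment processing platform for merchants"))
```

With real data you would want far more examples per category, a held-out test
set, and a comparison against other algorithms, as suggested above.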

On Tue, Jul 26, 2011 at 2:17 PM, Shrikar archak <shrikar84@gmail.com> wrote:

> Hi All,
> I am new to machine learning and wanted to know more about Mahout in
> general and how we can apply these algorithms to our applications.
>
> I wanted to try out this example:
>
> TechCrunch has a company database that also includes information about
> what each company does.
> I was wondering whether we could use Mahout's classification algorithms
> to take these info pages and classify the companies into different
> categories.
>
> Another idea would be to look at their job descriptions, find out what
> technologies they are using, and classify them on that basis.
>
> What would be the steps required to get this done?
> I tried out the Twenty Newsgroups example
> <https://cwiki.apache.org/confluence/display/MAHOUT/Twenty+Newsgroups>,
> in which case we need to train it. I assume we need to
> do something like that for the problem described above.
> Please let me know.
>
> Thanks,
> Shrikar
>
