mahout-user mailing list archives

From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: Twitter Classification
Date Thu, 21 Jan 2010 00:24:34 GMT
Right, I think this answers the previous questions? There are a
couple of main APIs a workbench could tie into: the Streaming
API and the older Search API:
http://apiwiki.twitter.com/Twitter-Search-API-Method%3A-search

Ted mentioned that simply playing with the data visually is
the best way to start. Perhaps we can build some helper tools?

As far as classification goes, it seems like plain search via
Twitter is going to become fairly useless quickly, so value-added
search, or perhaps personalized search via classification, could
be more handy. I could see various vertical web sites classifying
Tweets into categories based on their own custom-trained models.
So rather than a one-size-fits-all model, I'm thinking some easy
open source tools (like Mahout) will allow anyone to build many
different models to help organize a stream of Tweets. What
happens after that is part of the fun!

> 1. access to the data, although I'm sure the ASF could work
> something out here

I think we're providing software here; I can't see us storing
the data in ASF repositories. Mahout being on Hadoop is great
for archived Tweets, and some realtime algorithms could then be
useful for the streaming data.

> 2. training data. wouldn't you need a set of 'tweets'
> classified in some manner? or were you thinking of using a
> different data source to base it on?

It'd be nice to develop a workbench to easily build the training
set, and then allow easy retraining, which will probably need to
happen quite often with Twitter.

> Do you have any deeper thinkings about this topic?

We can try things out... I think Twitter poses some unique
challenges for machine learning. Ted, do you agree?


On Wed, Jan 20, 2010 at 1:10 PM, Hannes Carl Meyer
<hannescarl@googlemail.com> wrote:
> Hi Jason,
> to get access to the Twitter Data you could use the Twitter Streaming API:
> http://apiwiki.twitter.com/Streaming-API-Documentation
> Regards
> Hannes
>
> On Wed, Jan 20, 2010 at 10:02 PM, Ian Holsman <lists@holsman.net> wrote:
>
>> On 1/20/10 2:35 AM, Jason Rutherglen wrote:
>>
>>> We've got Newsgroup classification. I'm kind of interested in
>>> creating a Twitter classification system, or at least playing
>>> around with it. Also, as a relevant and growing large data
>>> set, Twitter seems to fit well with Hadoop-based machine
>>> learning algorithms... Just throwing it out into the wild!
>>>
>>>
>>>
>> Hi Jason.
>> I think the biggest issues here are twofold.
>>
>> 1. access to the data, although I'm sure the ASF could work something out
>> here
>> 2. training data. wouldn't you need a set of 'tweets' classified in some
>> manner? or were you thinking of using a different data source to base it on?
>>
>
