incubator-cassandra-user mailing list archives

From Matt Revelle <mreve...@gmail.com>
Subject Re: real-world dataset from social network?
Date Thu, 20 May 2010 17:27:31 GMT
It's unclear whether you're looking for data that can be stored in Cassandra or for an example of someone
using Cassandra to store a network; I'm assuming the former.

You will have a hard time finding a free social network dataset with relationships already well-defined.
I have seen crawls of Twitter before, but IIRC they sell for thousands of US dollars.

Try http://infochimps.org.

There's the Enron email dataset: http://www.cs.cmu.edu/~enron/

The reddit dataset is nice; maybe think beyond explicit connections and use voting commonality
as links between users? That dataset seems to meet your requirement of being sufficient to
reconstruct a network of users. You could have "friend" edges based on voting agreement and
"shared interest" edges based on voting on stories from the same subreddits.
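A minimal sketch of the two edge types, assuming each vote is a (user, link_id, direction) tuple with direction +1 or -1, and assuming a separate link-to-subreddit mapping is available (it may not be part of the votes dump itself). The function names friend_edges and interest_edges and the thresholds min_agree and min_shared are hypothetical, chosen only for illustration:

```python
from collections import defaultdict
from itertools import combinations


def friend_edges(votes, min_agree=2):
    """Hypothetical "friend" edges: user pairs that voted the same way
    on at least min_agree common links. Votes are assumed to be
    (user, link_id, direction) tuples with direction in {+1, -1}."""
    # Users in the same (link, direction) group agreed on that link.
    groups = defaultdict(set)
    for user, link, direction in votes:
        groups[(link, direction)].add(user)
    # Count agreements per unordered user pair (sorted so keys are stable).
    agree = defaultdict(int)
    for users in groups.values():
        for a, b in combinations(sorted(users), 2):
            agree[(a, b)] += 1
    return {pair: n for pair, n in agree.items() if n >= min_agree}


def interest_edges(votes, link_subreddit, min_shared=1):
    """Hypothetical "shared interest" edges: user pairs that voted in at
    least min_shared common subreddits. link_subreddit is an assumed
    dict mapping link_id -> subreddit name."""
    subs_by_user = defaultdict(set)
    for user, link, _direction in votes:
        sub = link_subreddit.get(link)
        if sub is not None:  # skip links with no known subreddit
            subs_by_user[user].add(sub)
    edges = {}
    # Quadratic in the number of users: fine for a sketch, not a full dump.
    for a, b in combinations(sorted(subs_by_user), 2):
        shared = len(subs_by_user[a] & subs_by_user[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    votes = [("alice", "l1", 1), ("bob", "l1", 1),
             ("alice", "l2", -1), ("bob", "l2", -1),
             ("carol", "l1", -1), ("carol", "l3", 1)]
    link_subreddit = {"l1": "programming", "l2": "programming", "l3": "science"}
    print(friend_edges(votes))
    print(interest_edges(votes, link_subreddit))
```

The edge weights (agreement count, shared-subreddit count) are kept so you can threshold the graph later; on a real dump you would likely want to avoid the all-pairs loop by only counting pairs that co-occur in a group.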

On May 20, 2010, at 1:09 PM, Valerio Schiavoni wrote:

> Not strictly Facebook.
> Any online social network is OK with me, as long as it has a reasonable number of users
> and is built on top of a schema-less storage system.
> 
> 
> Are you looking for Facebook stuff? Good luck getting a data set from any real-world
> model.
>  
> 
> Hello everyone,
> I'm a PhD student looking for a real-world dataset from any social network built on
> top of a schema-less storage system.
> The dataset should at least provide a means to reconstruct the graph of users.
> Because the dataset may contain sensitive information, it can be anonymized if required;
> that is not important for my research.
> 
> Someone on #cassandra provided some dataset of reddit votes : http://www.reddit.com/r/redditdev/comments/bubhl/csv_dump_of_reddit_voting_data/.
> This dataset is interesting, but it doesn't provide information about the graph of users.
> 

