spark-user mailing list archives

From Shilad Sen <s...@macalester.edu>
Subject Word2Vec with billion-word corpora
Date Wed, 13 May 2015 18:17:43 GMT
Hi all,

I'm experimenting with Spark's Word2Vec implementation on a relatively
large corpus (5B words, a vocabulary of 4M, 400-dimensional vectors). Has
anybody had success running it at this scale?
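
For a rough sense of the scale involved, here is a back-of-envelope
sketch of the embedding storage those numbers imply. It assumes 4-byte
floats and two weight matrices (input and output) of vocabulary size x
vector size, each held in one flat array. Those are assumptions about a
typical word2vec layout, not measurements of Spark's implementation, and
the function name is made up for illustration.

```python
# Back-of-envelope sizing for word2vec weight matrices.
# Assumptions (not taken from Spark itself): 4-byte floats, two matrices
# (input and output weights), each stored as one flat array.

JVM_MAX_ARRAY_LENGTH = 2**31 - 1  # largest index range of a single JVM array


def embedding_footprint(vocab_size, vector_size, bytes_per_value=4, num_matrices=2):
    """Return (elements per matrix, total bytes across all matrices)."""
    elements = vocab_size * vector_size
    total_bytes = elements * bytes_per_value * num_matrices
    return elements, total_bytes


# Figures from the question: 4M-word vocabulary, 400-dimensional vectors.
elements, total_bytes = embedding_footprint(4_000_000, 400)

print(f"elements per matrix: {elements:,}")
print(f"fits in one JVM array: {elements <= JVM_MAX_ARRAY_LENGTH}")
print(f"total for both matrices: {total_bytes / 1e9:.1f} GB")
```

Under those assumptions, each matrix has 1.6 billion entries, which is
below the JVM single-array limit but not by a wide margin, and the two
matrices together take about 12.8 GB before any per-task copies.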

Thanks in advance for your guidance!

-Shilad

-- 
Shilad W. Sen
Associate Professor
Mathematics, Statistics, and Computer Science Dept.
Macalester College
ssen@macalester.edu
http://www.shilad.com
https://www.linkedin.com/in/shilad
651-696-6273
