spark-user mailing list archives

From: Shilad Sen
Subject: Word2Vec with billion-word corpora
Date: Wed, 13 May 2015 18:17:43 GMT
Hi all,

I'm experimenting with Spark's Word2Vec implementation on a relatively
large corpus (5B words, a 4M-word vocabulary, and 400-dimensional vectors).
Has anybody had success running it at this scale?
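A rough sizing estimate helps frame what "this scale" means for the model itself. The sketch below assumes the model is held as dense 32-bit float matrices, one for the word vectors and one for the output weights, as in the original word2vec design. Whether a given Spark version stores it exactly this way is an assumption, not something stated here. The function name `model_footprint` is hypothetical. The estimate also ignores the 5B-word corpus and any per-executor copies of the model.

```python
# Back-of-the-envelope memory estimate for a dense Word2Vec model.
# Assumption: two dense float32 matrices of shape (vocab_size, vector_size),
# one for input word vectors and one for output weights. This mirrors the
# original word2vec layout and is not taken from Spark's source.

BYTES_PER_FLOAT32 = 4
JVM_MAX_ARRAY_LENGTH = 2**31 - 1  # upper bound on a Java array's length


def model_footprint(vocab_size: int, vector_size: int, num_matrices: int = 2):
    """Hypothetical helper: return (elements per matrix, bytes per matrix, total bytes)."""
    elements = vocab_size * vector_size
    per_matrix = elements * BYTES_PER_FLOAT32
    return elements, per_matrix, per_matrix * num_matrices


vocab_size = 4_000_000  # 4M-word vocabulary from the question
vector_size = 400       # 400-dimensional vectors from the question

elements, per_matrix, total = model_footprint(vocab_size, vector_size)
print(f"elements per matrix: {elements:,}")
print(f"GB per matrix:       {per_matrix / 1e9:.1f}")
print(f"GB for two matrices: {total / 1e9:.1f}")
print("fits in one JVM array:", elements <= JVM_MAX_ARRAY_LENGTH)
```

Under these assumptions, each matrix holds 1.6 billion floats (about 6.4 GB), or about 12.8 GB for both. That total is still below the per-array length limit of a single JVM array, but it is large enough that driver and executor memory would need careful tuning.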

Thanks in advance for your guidance!


Shilad W. Sen
Associate Professor
Mathematics, Statistics, and Computer Science Dept.
Macalester College
