mahout-dev mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Who owns mahout bucket on s3?
Date Sat, 27 Feb 2010 22:13:33 GMT
15GB of tokenized documents, not bad, not bad.  We're not going
to get a multi-billion entry matrix out of this though, are we?

  -jake

On Sat, Feb 27, 2010 at 2:06 PM, Robin Anil <robin.anil@gmail.com> wrote:

> Update:
>
> In 20 minutes the tokenization stage is complete, but it's not evident in the
> online UI.
> I found it by checking the S3 output folder:
>
> 2010-02-27 21:50  2696826329   s3://robinanil/wikipedia/tokenized-documents/part-00000
> 2010-02-27 21:52  2385184391   s3://robinanil/wikipedia/tokenized-documents/part-00001
> 2010-02-27 21:52  2458566158   s3://robinanil/wikipedia/tokenized-documents/part-00002
> 2010-02-27 21:53  2500213973   s3://robinanil/wikipedia/tokenized-documents/part-00003
> 2010-02-27 21:50  2533593862   s3://robinanil/wikipedia/tokenized-documents/part-00004
> 2010-02-27 21:54  3580695441   s3://robinanil/wikipedia/tokenized-documents/part-00005
> 2010-02-27 22:02           0   s3://robinanil/wikipedia/tokenized-documents_$folder$
> 2010-02-27 22:02           0   s3://robinanil/wikipedia/wordcount/subgrams/_temporary_$folder$
> 2010-02-27 22:02           0   s3://robinanil/wikipedia/wordcount/subgrams_$folder$
> 2010-02-27 22:02           0   s3://robinanil/wikipedia/wordcount_$folder$
>
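
For anyone checking the "15GB" figure against the listing, here is a minimal sketch that adds up the six part-file sizes. The byte counts are copied from the listing above. The variable names are only illustrative, and it assumes the sizes are in bytes, as S3 listings normally report them.

```python
# Sizes in bytes of the six tokenized-documents part files,
# copied from the S3 listing quoted above.
part_sizes = {
    "part-00000": 2696826329,
    "part-00001": 2385184391,
    "part-00002": 2458566158,
    "part-00003": 2500213973,
    "part-00004": 2533593862,
    "part-00005": 3580695441,
}

total_bytes = sum(part_sizes.values())
total_gb = total_bytes / 10**9   # decimal gigabytes
total_gib = total_bytes / 2**30  # binary gibibytes

print(f"{total_bytes} bytes = {total_gb:.2f} GB = {total_gib:.2f} GiB")
```

This comes to 16155080154 bytes, or roughly 16.16 GB in decimal units and 15.05 GiB in binary units, so the "15GB" estimate holds if the binary unit is meant.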
