mahout-dev mailing list archives

From Grant Ingersoll <>
Subject Re: Vectorization, dictionary size, OpenObjectIntHashMap and OOM
Date Wed, 07 Nov 2012 17:06:45 GMT
It's throwing the OOM in the config of the Reducer, so it's not likely the vector, but it could be.

Once we went back to unigrams, the OOM in that spot went away.

On Nov 7, 2012, at 12:00 PM, Robin Anil wrote:

> Not seen the code in a while but AFAIR the reducer is not loading any
> dictionary. We chunk the dictionary to create partial vectors. I think you
> just have a huge vector.
> On Nov 7, 2012 10:50 AM, "Sean Owen" <> wrote:
>> It's a trie? Yeah that could be a big win. It gets tricky with Unicode, but
>> I imagine there is a lot of gain even so.
>> "Bigrams over 11M terms" jumped out too as a place to start.
>> (I don't see any particular backwards compatibility issue with Lucene 3 to
>> even worry about.)
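
For anyone following the thread, the dictionary chunking Robin mentions can be sketched roughly as below. This is only an illustration of the general idea (split the term dictionary into chunks, build a partial vector per document per chunk, then merge the partials), not Mahout's actual code. All function names and the toy data are hypothetical, and sparse vectors are modeled as plain dicts.

```python
from collections import defaultdict


def chunk_dictionary(dictionary, chunk_size):
    """Yield sub-dictionaries holding at most chunk_size terms each."""
    items = sorted(dictionary.items(), key=lambda kv: kv[1])
    for start in range(0, len(items), chunk_size):
        yield dict(items[start:start + chunk_size])


def partial_vectors(docs, chunk):
    """One pass over the documents using a single dictionary chunk.

    Returns doc_id -> {term_index: count}, covering only terms in this chunk,
    so only chunk_size dictionary entries need to be in memory at a time.
    """
    out = {}
    for doc_id, tokens in docs.items():
        vec = defaultdict(int)
        for tok in tokens:
            idx = chunk.get(tok)
            if idx is not None:
                vec[idx] += 1
        if vec:
            out[doc_id] = dict(vec)
    return out


def merge_partials(partials):
    """Combine per-chunk partial vectors into one sparse vector per document."""
    merged = defaultdict(dict)
    for partial in partials:
        for doc_id, vec in partial.items():
            # Chunks cover disjoint term indices, so update() never overwrites.
            merged[doc_id].update(vec)
    return dict(merged)


def vectorize(docs, dictionary, chunk_size):
    """Hypothetical driver: chunk the dictionary, vectorize per chunk, merge."""
    chunks = chunk_dictionary(dictionary, chunk_size)
    return merge_partials(partial_vectors(docs, c) for c in chunks)


if __name__ == "__main__":
    docs = {"d1": ["a", "b", "a"], "d2": ["c"]}
    dictionary = {"a": 0, "b": 1, "c": 2}
    print(vectorize(docs, dictionary, chunk_size=2))
```

Note that chunking bounds the dictionary memory per pass but not the size of the merged vector, which is consistent with Robin's point that a very large vector could still be the culprit.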

Grant Ingersoll
