hadoop-common-user mailing list archives

From tim robertson <timrobertson...@gmail.com>
Subject Re: hadoop MapReduce and stop words
Date Sat, 16 May 2009 12:55:08 GMT
Perhaps some kind of in-memory index would be better than iterating over
an array?  A binary tree or similar, so that each lookup does not scan
the whole list.
I did something similar with polygon indexes and point data.  It
requires careful memory planning on the nodes if the indexes are large
(mine were several GB).
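
A minimal sketch of the idea, assuming the stop word list fits comfortably in
memory on each node. It swaps the binary tree for a hash set, which gives
average O(1) lookups (a TreeSet would give O(log n)). The class name
StopWordFilter and its methods are made up for illustration and are not part
of Hadoop; in a real job the set would be built once per mapper task rather
than once per record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: keeps the stop words in a hash set
// so each token lookup is O(1) on average instead of a scan over an array.
class StopWordFilter {
    private final Set<String> stopWords = new HashSet<>();

    StopWordFilter(Iterable<String> words) {
        // Normalise to lower case so lookups are case-insensitive.
        for (String w : words) {
            stopWords.add(w.toLowerCase(Locale.ROOT));
        }
    }

    boolean isStopWord(String token) {
        return stopWords.contains(token.toLowerCase(Locale.ROOT));
    }

    // Splits a line on whitespace and keeps only the tokens that are not stop words.
    List<String> filter(String line) {
        List<String> kept = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty() && !isStopWord(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        StopWordFilter f = new StopWordFilter(Arrays.asList("the", "a", "of"));
        System.out.println(f.filter("The index of a large polygon"));
        // prints [index, large, polygon]
    }
}
```

With n tokens and m stop words this is roughly O(n + m) work per task, against
O(n * m) for checking every token against an array.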

Just a thought,


On Sat, May 16, 2009 at 1:56 PM, PORTO aLET <portoalet@gmail.com> wrote:
> Hi,
> I am trying to add stop word filtering to hadoop map reduce, and later on,
> to hive.
> What is the accepted solution regarding stop words in hadoop?
> All I can think of is to load all the stop words into an array in the mapper,
> and then check each token against the stop words. (This would be O(n^2).)
> Regards
