hadoop-user mailing list archives

From Natalia Connolly <natalia.v.conno...@gmail.com>
Subject MapReduce for complex key/value pairs?
Date Tue, 08 Apr 2014 15:46:42 GMT
Dear All,

    I was wondering if the following is possible using MapReduce.

    I would like to create a job that loops over a set of documents,
tokenizes them into ngrams, and stores, for each ngram, not only its count
but also _which_ document(s) contained it.  In other words, the key would
be the ngram, and the value would be an integer (the count) _and_ an array
of document IDs.
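
    To make the shape of the data concrete, here is a rough sketch in plain
Java with no Hadoop dependency.  All names (NgramPostings, CountAndDocs,
index) are hypothetical.  The write/readFields pair mirrors the two method
signatures of Hadoop's org.apache.hadoop.io.Writable interface, on the
assumption that a custom value type would implement it; the index method is
only an in-memory stand-in for the map and reduce steps, not a real job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NgramPostings {

    /** Hypothetical value type: total count plus IDs of the documents containing the ngram. */
    public static class CountAndDocs {
        public int count;
        public final List<String> docIds = new ArrayList<>();

        // Same signature as Writable.write(DataOutput): count, list size, then each ID.
        public void write(DataOutput out) throws IOException {
            out.writeInt(count);
            out.writeInt(docIds.size());
            for (String id : docIds) {
                out.writeUTF(id);
            }
        }

        // Same signature as Writable.readFields(DataInput): reads fields in the order written.
        public void readFields(DataInput in) throws IOException {
            count = in.readInt();
            docIds.clear();
            int size = in.readInt();
            for (int i = 0; i < size; i++) {
                docIds.add(in.readUTF());
            }
        }
    }

    /** In-memory stand-in for map + reduce. docs maps document ID to document text. */
    public static Map<String, CountAndDocs> index(Map<String, String> docs, int n) {
        Map<String, CountAndDocs> result = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            String text = doc.getValue().trim().toLowerCase();
            if (text.isEmpty()) {
                continue; // nothing to tokenize
            }
            String[] tokens = text.split("\\s+");
            // "Map" step: emit every window of n consecutive tokens.
            for (int i = 0; i + n <= tokens.length; i++) {
                String ngram = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
                // "Reduce" step: accumulate the count and the distinct document IDs.
                CountAndDocs value = result.computeIfAbsent(ngram, k -> new CountAndDocs());
                value.count++;
                if (!value.docIds.contains(doc.getKey())) {
                    value.docIds.add(doc.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> docs = new TreeMap<>();
        docs.put("d1", "a b a b");
        docs.put("d2", "a b c");
        CountAndDocs original = index(docs, 2).get("a b");

        // Round-trip through bytes, as a framework would between map and reduce.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));
        CountAndDocs copy = new CountAndDocs();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(copy.count + " " + copy.docIds);
    }
}
```

    In an actual job, the mapper would presumably emit (ngram, document ID)
pairs and the reducer would build one such value per ngram; that wiring is
left out here.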

    Is this something that can be done?  Any pointers would be appreciated.

    I am using Java, btw.

   Thank you,

   Natalia Connolly
