hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: MapReduce for complex key/value pairs?
Date Tue, 08 Apr 2014 18:30:57 GMT
Yes, you can write custom Writable classes that define and serialise
your required data structure. If you have Hadoop: The Definitive
Guide, check out the "Serialization" section in the "Hadoop
I/O" chapter.
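
For illustration only, here is a minimal sketch of what such a value class
might look like for the ngram case. The class name NgramPostings, its fields
and the toBytes/fromBytes helpers are hypothetical, not taken from the book or
from Hadoop. To keep the sketch runnable without Hadoop on the classpath it
uses only java.io; in a real job you would declare it as
"implements org.apache.hadoop.io.Writable", whose two methods have the same
signatures as write and readFields below.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical value type: total count of an ngram plus the ids of the
// documents that contain it. In a Hadoop job, add
// "implements org.apache.hadoop.io.Writable" to the class declaration.
class NgramPostings {
    private int count;
    private final List<String> docIds = new ArrayList<>();

    // Record that docId contained the ngram the given number of times.
    void add(String docId, int occurrences) {
        count += occurrences;
        if (!docIds.contains(docId)) {
            docIds.add(docId);
        }
    }

    int getCount() { return count; }

    List<String> getDocIds() { return docIds; }

    // Serialise as: count, number of ids, then each id.
    public void write(DataOutput out) throws IOException {
        out.writeInt(count);
        out.writeInt(docIds.size());
        for (String id : docIds) {
            out.writeUTF(id);
        }
    }

    // Read the fields back in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        count = in.readInt();
        docIds.clear(); // the same instance may be reused for many records
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            docIds.add(in.readUTF());
        }
    }

    // Helpers for trying the round trip outside Hadoop (assumed, for testing).
    byte[] toBytes() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static NgramPostings fromBytes(byte[] data) {
        try {
            NgramPostings result = new NgramPostings();
            result.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The mapper could then emit the ngram as a Text key with one of these objects
as the value, and the reducer could merge the values for each key. A type used
only as a value needs just these two methods; one used as a key would also
have to be comparable (WritableComparable).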

On Tue, Apr 8, 2014 at 9:16 PM, Natalia Connolly
<natalia.v.connolly@gmail.com> wrote:
> Dear All,
>
>     I was wondering if the following is possible using MapReduce.
>
>     I would like to create a job that loops over a bunch of documents,
> tokenizes them into ngrams, and stores the ngrams and not only the counts of
> ngrams but also _which_ document(s) had this particular ngram.  In other
> words, the key would be the ngram but the value would be an integer (the
> count) _and_ an array of document id's.
>
>     Is this something that can be done?  Any pointers would be appreciated.
>
>     I am using Java, btw.
>
>    Thank you,
>
>    Natalia Connolly
>



-- 
Harsh J
