hadoop-hdfs-user mailing list archives

From parnab kumar <parnab.2...@gmail.com>
Subject How to design the mapper and reducer for the following problem
Date Fri, 14 Jun 2013 14:06:03 GMT
The input is a file in which each line corresponds to a document. Each
document is identified by a set of fingerprints (hashes). For example, a
line in the input file has the following form:

input:
---------------------
DOCID1   HASH1 HASH2 HASH3 HASH4
DOCID2   HASH5 HASH3 HASH1 HASH4

The MapReduce job should output each pair of DOCIDs that share at least a
threshold number of hashes in common.

output:
--------------------------
DOCID1 DOCID2
DOCID3 DOCID5
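
To make the requirement concrete, here is a minimal sketch of one common way
to structure it, simulated in plain Python rather than written against the
Hadoop API. It is an assumption, not something stated in the message: a first
pass inverts the input into hash -> DOCIDs and emits every pair of documents
that share a hash, and a second pass sums the counts per pair and keeps the
pairs that reach the threshold. The function names (map_invert, reduce_pairs,
similar_pairs) and the threshold value are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_invert(line):
    """Pass 1 map: emit (hash, doc_id) for each distinct fingerprint on a line."""
    doc_id, *hashes = line.split()
    for h in set(hashes):
        yield h, doc_id


def reduce_pairs(doc_ids):
    """Pass 1 reduce: emit ((doc_a, doc_b), 1) for each pair sharing this hash."""
    for a, b in combinations(sorted(set(doc_ids)), 2):
        yield (a, b), 1


def similar_pairs(lines, threshold):
    """Simulate both passes in memory and return pairs meeting the threshold."""
    # Shuffle for pass 1: group document ids by hash.
    by_hash = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        for h, doc_id in map_invert(line):
            by_hash[h].append(doc_id)

    # Pass 2: sum the 1s per pair, i.e. count the hashes each pair shares.
    shared = defaultdict(int)
    for doc_ids in by_hash.values():
        for pair, one in reduce_pairs(doc_ids):
            shared[pair] += one

    return sorted(pair for pair, count in shared.items() if count >= threshold)


if __name__ == "__main__":
    sample = [
        "DOCID1   HASH1 HASH2 HASH3 HASH4",
        "DOCID2   HASH5 HASH3 HASH1 HASH4",
    ]
    # Assumed threshold of 3; the two sample lines share HASH1, HASH3, HASH4.
    for a, b in similar_pairs(sample, threshold=3):
        print(a, b)  # prints "DOCID1 DOCID2"
```

One caveat with this layout: a hash that appears in many documents produces a
number of pairs quadratic in that document count, so very frequent hashes may
need to be filtered or handled separately on real data.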
