Use MultipleInputs with a different mapper for each input. Mapper 1
should be IdentityMapper; mapper 2 should output (key, value) pairs where
the value is a pseudo marker value (the same for all keys), which marks
the value as null/empty. In the reducer, output only the key/value pairs
for keys whose value list does not include the marker value.
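In your example, suppose we use # as the marker value (it has to be a
value that never occurs in file1; 1 would not work here, because 1 is
also a real value for keys 2 and 3). Then the output of mapper 2 will be:
4, #
2, #
and the reducer will get:
2, {1,3,5,#}
3, {1,2}
4, {7,9,#}
6, {3}
The reducer will then output:
3, 1
3, 2
6, 3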
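
The same flow can be tried out without a cluster. What follows is only a
rough plain-Java simulation of the two mappers, the shuffle, and the
reducer; it is not Hadoop code. The class name MarkerFilterSketch, its
method names, and the marker string are all made up for illustration, and
the marker is assumed never to occur as a real value in file1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the marker approach (no Hadoop dependency).
final class MarkerFilterSketch {
    // Hypothetical marker; assumed never to appear as a value in file1.
    static final String MARKER = "#";

    // "Mapper 1": identity, passes each "key,value" record of file1 through.
    static void mapFile1(String line, Map<String, List<String>> shuffle) {
        String[] kv = line.split(",", 2);
        shuffle.computeIfAbsent(kv[0].trim(), k -> new ArrayList<>()).add(kv[1].trim());
    }

    // "Mapper 2": emits (key, MARKER) for every key listed in file2.
    static void mapFile2(String line, Map<String, List<String>> shuffle) {
        shuffle.computeIfAbsent(line.trim(), k -> new ArrayList<>()).add(MARKER);
    }

    // "Reducer": skips the whole key if the marker is among its values,
    // otherwise emits every (key, value) pair unchanged.
    static void reduce(String key, List<String> values, List<String> out) {
        if (values.contains(MARKER)) {
            return;
        }
        for (String value : values) {
            out.add(key + "," + value);
        }
    }

    // Runs both "mappers", groups by key (the shuffle), then reduces.
    static List<String> run(List<String> file1, List<String> file2) {
        Map<String, List<String>> shuffle = new TreeMap<>();
        for (String line : file1) {
            mapFile1(line, shuffle);
        }
        for (String line : file2) {
            mapFile2(line, shuffle);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffle.entrySet()) {
            reduce(e.getKey(), e.getValue(), out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList(
                "2,1", "2,3", "2,5", "3,1", "3,2", "4,7", "4,9", "6,3");
        List<String> file2 = Arrays.asList("4", "2");
        for (String record : run(file1, file2)) {
            System.out.println(record);
        }
    }
}
```

In a real Hadoop reducer the values can only be iterated once, so you
would probably have to buffer the values of a key until you know whether
the marker is among them.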
Nir Zohar wrote:
> I would like your help with the below question.
> I have 2 files: file1 (key, value) and file2 (keys only), and I need to keep
> only the records from file1 whose keys are not in file2.
> 1. The output format is key,value, not only keys.
> 2. The key is not a primary key; hence it's not possible to do a join at
> the end.
>
> Can you assist?
> Example:
> file1:
> 2,1
> 2,3
> 2,5
> 3,1
> 3,2
> 4,7
> 4,9
> 6,3
> file2:
> 4
> 2
> Output:
> 3,1
> 3,2
> 6,3
