hadoop-hdfs-user mailing list archives

From Edi V <adityanarayan.vai...@gmail.com>
Subject Delete record pattern using MapReduce
Date Wed, 08 Mar 2017 19:51:45 GMT

I am new to HDFS and would like to know whether there is a design pattern
for deleting matching records from files stored in HDFS.
My use case is as follows:

My files are stored in HDFS and contain JSON records, each with two fields,
a and b. I need to delete from these files every record whose a or b value
matches an entry in a list of <a,b> values. That list is updated from time
to time.

So I have a list <a1,b1>, <a2,b2>, ..., <an,bn>. At the end of the day I
would run a job that deletes from the files every record whose a or b
value matches an entry in the list.
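
To make the requirement concrete, the filtering step could be sketched as
follows. This is only an illustration in plain Python, not a MapReduce job.
The function names are hypothetical, and it assumes one JSON record per
line, scalar a and b values, and a delete list small enough to hold in
memory. It also assumes a record is dropped when its a equals any listed a
or its b equals any listed b.

```python
import json


def build_delete_sets(pairs):
    """Split the <a, b> delete list into two lookup sets (hypothetical helper)."""
    a_values = {a for a, _ in pairs}
    b_values = {b for _, b in pairs}
    return a_values, b_values


def keep_record(line, a_values, b_values):
    """Return True when the record's a and b both miss the delete list."""
    record = json.loads(line)
    # Assumes a and b are scalars (hashable); a missing field never matches.
    return record.get("a") not in a_values and record.get("b") not in b_values


def filter_lines(lines, pairs):
    """Return only the lines that survive the delete list, skipping blanks."""
    a_values, b_values = build_delete_sets(pairs)
    return [
        line
        for line in lines
        if line.strip() and keep_record(line, a_values, b_values)
    ]
```

In this sketch nothing is deleted in place: the surviving records would be
written to a new file that then replaces the original.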

