hadoop-common-user mailing list archives

From Bharath Mundlapudi <bharathw...@yahoo.com>
Subject Re: Comparing two logs, finding missing records
Date Sun, 26 Jun 2011 22:12:35 GMT
If you have a SerDe or a Pig loader for your log format, Pig or Hive will
probably be the quicker solution, using a join on record_id.
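
For illustration only, the join in question amounts to an anti-join on
record_id: keep every id that occurs in the first log but never in the second
(in Hive this is commonly written as a LEFT OUTER JOIN filtered on IS NULL).
Below is a minimal plain-Python sketch of the same logic, laid out like a
reduce-side join. The log format is an assumption: the thread does not describe
it, so extract_record_id is a hypothetical helper that treats the first
tab-separated field as the record_id, and the sample lines are made up.

```python
from collections import defaultdict


def extract_record_id(line):
    # Hypothetical format: record_id is the first tab-separated field.
    return line.split("\t", 1)[0].strip()


def map_phase(lines, tag):
    # Emit (record_id, tag) pairs, as a mapper would for each input log.
    for line in lines:
        if line.strip():
            yield extract_record_id(line), tag


def find_missing(first_log, second_log):
    # "Shuffle": group the source tags by record_id.
    tags_by_id = defaultdict(set)
    for record_id, tag in map_phase(first_log, "first"):
        tags_by_id[record_id].add(tag)
    for record_id, tag in map_phase(second_log, "second"):
        tags_by_id[record_id].add(tag)
    # "Reduce": keep ids that were seen only in the first log.
    return sorted(rid for rid, tags in tags_by_id.items() if tags == {"first"})


# Made-up sample data, not taken from any real log.
first = ["r1\tlogin", "r2\tclick", "r3\tlogout"]
second = ["r1\tlogin", "r3\tlogout"]
print(find_missing(first, second))
```

On real data the grouping step is what the MapReduce shuffle (or the Pig/Hive
join) does for you, so nothing has to fit in memory on a single machine.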


From: Mark Kerzner <markkerzner@gmail.com>
To: Hadoop Discussion Group <core-user@hadoop.apache.org>
Sent: Saturday, June 25, 2011 9:39 PM
Subject: Comparing two logs, finding missing records


I have two logs that should contain the same record_ids; in other words, if a
record_id is found in the first log, it should also be found in the second
one. However, I suspect that the second log has been filtered, and I need to
find the missing records. Anything is allowed: a MapReduce job, Hive, Pig, or
even a NoSQL database.

Thank you.

It is also a good time to express my thanks to all the members of the group
who are always very helpful.
