hadoop-common-user mailing list archives

From Bharath Mundlapudi <bharathw...@yahoo.com>
Subject Re: Comparing two logs, finding missing records
Date Mon, 27 Jun 2011 01:04:28 GMT
SQL:

SELECT LOG1.* FROM LOG1 LEFT OUTER JOIN LOG2 ON LOG1.recordid = LOG2.recordid
WHERE LOG2.recordid IS NULL;


PIG:
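data = JOIN LOG1 BY recordid LEFT OUTER, LOG2 BY recordid;
-- rows with no match in LOG2 carry nulls in the LOG2 fields
missing = FILTER data BY LOG2::recordid IS NULL;
DUMP missing;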
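
To check the join logic on a small sample before running it on the cluster, here is a minimal sketch in Python using the standard-library sqlite3 module. The function name find_missing and the sample record ids are made up for illustration; only the table and column names (LOG1, LOG2, recordid) come from the queries in this message. A left outer join keeps every LOG1 row, and rows with no partner in LOG2 come back with NULL in the LOG2 columns, so filtering on that NULL leaves just the missing records.

```python
import sqlite3


def find_missing(log1_ids, log2_ids):
    """Return record ids that appear in LOG1 but not in LOG2 (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE LOG1 (recordid TEXT)")
    conn.execute("CREATE TABLE LOG2 (recordid TEXT)")
    conn.executemany("INSERT INTO LOG1 VALUES (?)", [(r,) for r in log1_ids])
    conn.executemany("INSERT INTO LOG2 VALUES (?)", [(r,) for r in log2_ids])
    # Left outer join keeps all LOG1 rows; unmatched ones have NULL on the LOG2 side.
    rows = conn.execute(
        "SELECT LOG1.recordid FROM LOG1 "
        "LEFT OUTER JOIN LOG2 ON LOG1.recordid = LOG2.recordid "
        "WHERE LOG2.recordid IS NULL "
        "ORDER BY LOG1.recordid"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Sample ids are invented for the example.
    print(find_missing(["a", "b", "c", "d"], ["a", "c"]))  # prints ['b', 'd']
```

This only illustrates the join-then-filter idea on in-memory data; for real logs the same pattern would run in Hive or Pig once the logs are loaded as tables or relations.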


If you need more Pig help, please post to the Pig mailing list.

-Bharath


________________________________
From: Mark Kerzner <markkerzner@gmail.com>
To: common-user@hadoop.apache.org; Bharath Mundlapudi <bharathwork@yahoo.com>
Sent: Sunday, June 26, 2011 5:50 PM
Subject: Re: Comparing two logs, finding missing records


Bharath,

What would a Pig query look like?

Thank you,
Mark


On Sun, Jun 26, 2011 at 5:12 PM, Bharath Mundlapudi <bharathwork@yahoo.com> wrote:

>If you have a SerDe or a Pig loader for your log format, Pig or Hive will probably be the
>quicker solution, using a join.
>
>-Bharath
>
>
>
>________________________________
>From: Mark Kerzner <markkerzner@gmail.com>
>To: Hadoop Discussion Group <core-user@hadoop.apache.org>
>Sent: Saturday, June 25, 2011 9:39 PM
>Subject: Comparing two logs, finding missing records
>
>
>Hi,
>
>I have two logs that should contain the same set of record_ids; in other
>words, if a record_id is found in the first log, it should also be found in
>the second one. However, I suspect that the second log has been filtered,
>and I need to find the missing records. Anything is allowed: a MapReduce
>job, Hive, Pig, or even a NoSQL database.
>
>Thank you.
>
>It is also a good time to express my thanks to all the members of the group
>who are always very helpful.
>
>Sincerely,
>Mark