hbase-user mailing list archives

From Dave Latham <lat...@davelink.net>
Subject Re: intersection of row ids
Date Fri, 11 Mar 2011 17:23:56 GMT
If the row ids are ordered the same way in both tables and both tables are
of roughly the same order of magnitude in size, I would recommend opening a
scanner on each table, comparing the current row of each scanner, and
advancing whichever scanner is behind.  Whenever you hit a match, output it
and advance both scanners.
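A minimal sketch of that merge, in Java since the HBase client is Java. To keep it runnable without a cluster, plain sorted iterators stand in for the two scanners (an assumption: with real HBase you would walk two ResultScanners and compare row keys as unsigned bytes, e.g. with Arrays::compareUnsigned on Java 9+). The class name MergeIntersect is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: merge-style intersection of two sorted row-id streams.
final class MergeIntersect {
    // Both iterators must yield row ids in the same order that cmp defines,
    // just as two HBase scanners return rows in sorted row-key order.
    static <K> List<K> intersectSorted(Iterator<K> left, Iterator<K> right,
                                       Comparator<? super K> cmp) {
        List<K> out = new ArrayList<>();
        if (!left.hasNext() || !right.hasNext()) return out;
        K a = left.next();
        K b = right.next();
        while (true) {
            int c = cmp.compare(a, b);
            if (c == 0) {
                // Match: output it and advance both "scanners".
                out.add(a);
                if (!left.hasNext() || !right.hasNext()) break;
                a = left.next();
                b = right.next();
            } else if (c < 0) {
                // Left is behind: advance only the left side.
                if (!left.hasNext()) break;
                a = left.next();
            } else {
                // Right is behind: advance only the right side.
                if (!right.hasNext()) break;
                b = right.next();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tableA = List.of("1", "2", "3");
        List<String> tableB = List.of("2", "3", "4");
        // prints [2, 3]
        System.out.println(intersectSorted(tableA.iterator(), tableB.iterator(),
                Comparator.naturalOrder()));
    }
}
```

Each side is read once, so the cost is one pass over both tables and no per-row lookups, which is why this fits best when the two tables are of similar size.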

If you need to do it faster, you can move the same approach into an MR job,
where you use TableInputFormat for one scanner and open the other one
manually in each Mapper.
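A rough, single-process sketch of the MR variant, assuming each map task covers one contiguous row range [startRow, stopRow) of the first table and that its manually opened scanner on the second table is bounded to that same range, so every split can be merged independently. NavigableSet ranges stand in for the split and the bounded scan; the names SplitIntersect, mapSplit and splitStarts are hypothetical, and with real HBase the bounds would come from the mapper's TableSplit (getStartRow()/getEndRow()).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical simulation of the per-Mapper merge; no Hadoop or HBase involved.
final class SplitIntersect {
    // Rows in [startRow, stopRow); a null stopRow means "to the end of the table".
    static NavigableSet<String> range(NavigableSet<String> table, String startRow, String stopRow) {
        return stopRow == null ? table.tailSet(startRow, true)
                               : table.subSet(startRow, true, stopRow, false);
    }

    // One "map task": merge its split of A with a scan of B bounded to the same range.
    static List<String> mapSplit(NavigableSet<String> a, NavigableSet<String> b,
                                 String startRow, String stopRow) {
        List<String> out = new ArrayList<>();
        Iterator<String> itA = range(a, startRow, stopRow).iterator();
        Iterator<String> itB = range(b, startRow, stopRow).iterator();
        String x = itA.hasNext() ? itA.next() : null; // null = scanner exhausted
        String y = itB.hasNext() ? itB.next() : null;
        while (x != null && y != null) {
            int c = x.compareTo(y);
            if (c == 0) {
                out.add(x);
                x = itA.hasNext() ? itA.next() : null;
                y = itB.hasNext() ? itB.next() : null;
            } else if (c < 0) {
                x = itA.hasNext() ? itA.next() : null;
            } else {
                y = itB.hasNext() ? itB.next() : null;
            }
        }
        return out;
    }

    // splitStarts must be sorted; here the "tasks" simply run one after another.
    static List<String> intersect(NavigableSet<String> a, NavigableSet<String> b,
                                  List<String> splitStarts) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < splitStarts.size(); i++) {
            String stop = i + 1 < splitStarts.size() ? splitStarts.get(i + 1) : null;
            out.addAll(mapSplit(a, b, splitStarts.get(i), stop));
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> a = new TreeSet<>(List.of("1", "2", "3", "6", "7"));
        NavigableSet<String> b = new TreeSet<>(List.of("2", "3", "4", "7", "8"));
        // Two splits: ["", "5") and ["5", end). prints [2, 3, 7]
        System.out.println(intersect(a, b, List.of("", "5")));
    }
}
```

Because the splits do not overlap and each is sorted, concatenating the per-split outputs gives the same result as one merge over the whole tables; the speedup comes from running the splits in parallel.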

If one table is orders of magnitude smaller than the other, or the row ids
are formatted differently and not ordered the same way in each table, then
scan the smaller table and, for each row, issue a Get against the larger
table to check whether that row exists there.
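A minimal sketch of the scan-and-get approach. An Iterable stands in for the scan of the small table, and a Predicate stands in for the point lookup against the large table (an assumption for illustration; with the HBase client this would be a Get per row, e.g. via the table's exists(Get) check). The class name ScanAndGet is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper: scan the small side, point-check each row on the large side.
final class ScanAndGet {
    // smallTableRows: row ids from scanning the small table, in any order.
    // existsInLarge: stand-in for issuing a Get against the large table.
    static List<String> intersect(Iterable<String> smallTableRows,
                                  Predicate<String> existsInLarge) {
        List<String> out = new ArrayList<>();
        for (String row : smallTableRows) {
            if (existsInLarge.test(row)) {
                out.add(row); // present in both tables
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> largeTable = Set.of("2", "3", "4");
        // Order need not match the large table's order. prints [3, 2]
        System.out.println(intersect(List.of("3", "1", "2"), largeTable::contains));
    }
}
```

No ordering is assumed on either side, so this also works when the ids are formatted differently, provided each one can be converted to the large table's row-key format before the lookup. Passing the negated check (existsInLarge.negate()) instead yields the row ids present in the small table but not in the large one.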

Dave

On Thu, Mar 10, 2011 at 8:08 PM, Vishal Kapoor
<vishal.kapoor.in@gmail.com> wrote:

> Friends,
> how do I best compute the intersection of sets of row ids?
> suppose I have two tables with similar row ids
> how can I get the row ids present in one and not in the other?
> do things get better if I have the row ids as values in some qualifier, or
> as the qualifiers themselves?
> I hope the question is not too confusing...
>
> intersection of {1, 2, 3} and {2, 3, 4} is {2, 3}.
> while {1,2,3} are row ids from one table, {2,3,4} may come from another
> table as qualifiers in some row.
>
> thanks,
> Vishal
>
