accumulo-user mailing list archives

From Billie Rinaldi <bil...@apache.org>
Subject Re: Using Accumulo as input to a MapReduce job frequently hangs due to lost Zookeeper connection
Date Thu, 11 Oct 2012 18:57:09 GMT
On Wed, Oct 10, 2012 at 7:22 AM, ameet kini <ameetkini@gmail.com> wrote:

> I have a related problem where I need to do a 1-1 join (every row in
> table A joins with a unique row in table B and vice versa). My join
> key is the row id of the table. In the past, I've used Hadoop's
> CompositeInputFormat to do a map-side join over data in HDFS
> (described here:
> http://www.congiu.com/joins-in-hadoop-using-compositeinputformat/). My
> tables in Accumulo seem to fit the eligibility criteria of
> CompositeInputFormat: both tables are sorted by the join key, since
> the join key is the row id in my case, and the tables are partitioned
> the same way (i.e., same split points).
>
> Has anyone tried using CompositeInputFormat over Accumulo tables? Is
> it possible to configure CompositeInputFormat with
> AccumuloInputFormat?
>

I haven't tried it.  If you do, let us know how it works out.

Billie
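
For readers following the thread: the eligibility criteria quoted above (both inputs sorted by the join key and split at the same points) are what let a map-side join run as a single merge pass over each pair of matching partitions. Below is a minimal sketch of that merge step in plain Python, assuming unique row ids on both sides. It does not use the Hadoop or Accumulo APIs, the function and variable names are hypothetical, and it says nothing about whether CompositeInputFormat can actually be configured with AccumuloInputFormat.

```python
def merge_join(a, b):
    """Join two iterables of (row_id, value) pairs.

    Assumes each input is sorted by row_id and row_ids are unique,
    mirroring the 1-1 join described in the question.
    """
    it_a, it_b = iter(a), iter(b)
    rec_a = next(it_a, None)
    rec_b = next(it_b, None)
    while rec_a is not None and rec_b is not None:
        if rec_a[0] == rec_b[0]:
            # Matching join key: emit one joined record and advance both sides.
            yield rec_a[0], rec_a[1], rec_b[1]
            rec_a = next(it_a, None)
            rec_b = next(it_b, None)
        elif rec_a[0] < rec_b[0]:
            # Key present only in a (so far): advance a.
            rec_a = next(it_a, None)
        else:
            # Key present only in b (so far): advance b.
            rec_b = next(it_b, None)


# Hypothetical sample data standing in for one partition of each table.
table_a = [("row1", "a1"), ("row2", "a2"), ("row4", "a4")]
table_b = [("row1", "b1"), ("row3", "b3"), ("row4", "b4")]
joined = list(merge_join(table_a, table_b))
```

Because each input is read once and in order, no shuffle or reduce phase is needed, which is the appeal of a map-side join when the sorting and partitioning conditions hold.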


>
> Thanks,
> Ameet
>
>
> On Tue, Aug 21, 2012 at 8:23 AM, Keith Turner <keith@deenlo.com> wrote:
> > Yeah, that would certainly work.
> >
> > You could run two map-only jobs (they could run concurrently): a job that
> > reads D1 and writes to Table3, and a job that reads D2 and writes to
> > Table3.  A MapReduce job may be faster, unless you want the final result
> > in Accumulo, in which case this approach may be faster.  The two jobs
> > could also produce files to bulk import into Table3.
> >
> > Keith
> >
> > On Mon, Aug 20, 2012 at 8:26 PM, David Medinets
> > <david.medinets@gmail.com> wrote:
> >> Can you use a new table to join and then scan the new table? Use the
> >> foreign key as the rowid. Basically create your own materialized view.
>
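
The materialized-view idea in the quoted messages can be illustrated with a small in-memory model: two independent writers put their records into one table keyed by the join key, and a sorted scan then sees both sides of each row together. This is only a sketch under stated assumptions. A dict stands in for Table3, the column names "d1" and "d2" and all function names are hypothetical, and no Accumulo client calls are shown.

```python
from collections import defaultdict


def write_side(table, records, column):
    """Write (join_key, value) records into the shared table under one column.

    Stands in for one of the two map-only jobs; the join key is the row id.
    """
    for key, value in records:
        table[key][column] = value


def scan(table):
    """Yield (row_id, columns) in sorted row order, like a table scan."""
    for row in sorted(table):
        yield row, dict(table[row])


# A dict of row_id -> {column: value} stands in for Table3.
table3 = defaultdict(dict)

# The two writers are independent, so their order and input ordering do not matter.
write_side(table3, [("k1", "x"), ("k2", "y")], "d1")
write_side(table3, [("k2", "q"), ("k1", "p")], "d2")

rows = list(scan(table3))
```

The join happens implicitly at write time: records sharing a key land in the same row, so reading the joined result is just a scan.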
