accumulo-user mailing list archives

From Billie J Rinaldi <billie.j.rina...@ugov.gov>
Subject Re: Querying Accumulo From Inside Mapper
Date Tue, 17 Apr 2012 14:06:06 GMT
On Tuesday, April 17, 2012 8:49:45 AM, "David Medinets" <david.medinets@gmail.com> wrote:
> I am reading from a text file of linked IDs but I want to store the
> lookup values inside Accumulo.
> 
> RDB FOO
> ------
> FOO_ID <-- this is the autoincrement key
> ALT_ID <-- this is the natural key
> NAME
> AGE
> 
> RDB BAR
> ------
> BAR_ID <-- this is the autoincrement key
> TAG <-- zero or more per person
> 
> RDB LINK
> ------
> FOO_ID
> BAR_ID
> 
> * RDB is relational database table.
> 
> Inside Accumulo, I want to use the ALT_ID as the row id because there
> is other data that uses it which will also be stored in the row. I
> will process the FOO text file first to result in:
> 
> FOO
> -------
> ALT_ID NAME XXX
> ALT_ID AGE XXX
> FOO_ID ALT_ID XXXX
> 
> Can I write to two Accumulo tables using one mapper? If I can, then I
> can store the FOO_ID/ALT_ID record in a separate table.

Yes.  AccumuloOutputFormat is parameterized by <Text,Mutation>, where the Text key is
the name of the table to write to.
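
To illustrate, here is a rough, library-free sketch of how one FOO record could fan out to two tables. It only models the (table name, cell) pairs; in a real mapper each pair would presumably become a context.write(new Text(tableName), mutation) call. The class name, the table names, and the tab-separated input layout (FOO_ID, ALT_ID, NAME, AGE) are assumptions made for this example, not anything from the original question.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns one FOO line into cells addressed to two tables. */
final class FooRecordRouter {

    /** One cell plus the name of the table it is destined for. */
    static final class Cell {
        final String table;
        final String row;
        final String column;
        final String value;

        Cell(String table, String row, String column, String value) {
            this.table = table;
            this.row = row;
            this.column = column;
            this.value = value;
        }

        @Override
        public String toString() {
            return table + ":" + row + " " + column + " " + value;
        }
    }

    // Assumed table names, chosen only for this sketch.
    static final String FOO_TABLE = "FOO";
    static final String LOOKUP_TABLE = "FOO_ID_LOOKUP";

    /** Assumes a tab-separated line: FOO_ID, ALT_ID, NAME, AGE. */
    static List<Cell> route(String line) {
        String[] f = line.split("\t", -1);
        if (f.length != 4) {
            throw new IllegalArgumentException("expected 4 fields, got " + f.length);
        }
        String fooId = f[0];
        String altId = f[1];

        List<Cell> out = new ArrayList<>();
        // Main table: keyed by the natural key (ALT_ID).
        out.add(new Cell(FOO_TABLE, altId, "NAME", f[2]));
        out.add(new Cell(FOO_TABLE, altId, "AGE", f[3]));
        // Separate lookup table: FOO_ID -> ALT_ID, used later when processing LINK.
        out.add(new Cell(LOOKUP_TABLE, fooId, "ALT_ID", altId));
        return out;
    }
}
```

Keeping the FOO_ID/ALT_ID mapping in its own table also makes the cleanup mentioned later in the thread simpler, since the whole lookup table can be dropped at once.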

> Processing the BAR text file provides:
> 
> BAR
> ------
> BAR_ID TAG XXXX
> 
> Then when I process the LINK table, I can query the FOO table to find
> the ALT_ID. And query the BAR table to find the tag. Then combine the
> information for the mutation:
> 
> FOO
> ------
> ALT_ID TAG XXX
> 
> Is there a best practice to query from inside a mapper?

Just make sure to do the Accumulo setup in the Mapper setup method.  You'll
probably want to look at the InputFormatBase to see how it passes the
configuration information.
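
As a rough sketch of that shape (not real Accumulo code): the lookups are acquired once in a setup step and reused for every LINK record. The Lookup interface and the LinkJoiner class are hypothetical stand-ins; in an actual job the lookups would be backed by Accumulo reads created in Mapper.setup(), and the tab-separated LINK layout (FOO_ID, BAR_ID) is an assumption.

```java
/** Hypothetical model of a mapper that joins LINK records against two lookups. */
final class LinkJoiner {

    /** Stand-in for a keyed read; a real job would back this with an Accumulo scan. */
    interface Lookup {
        String get(String key);
    }

    private Lookup fooIdToAltId;
    private Lookup barIdToTag;

    /** Plays the role of Mapper.setup(): acquire the lookups once per task. */
    void setup(Lookup fooIdToAltId, Lookup barIdToTag) {
        this.fooIdToAltId = fooIdToAltId;
        this.barIdToTag = barIdToTag;
    }

    /**
     * Plays the role of Mapper.map(). Assumes a tab-separated LINK line:
     * FOO_ID, BAR_ID. Returns "ALT_ID TAG value", or null if either lookup misses.
     */
    String map(String linkLine) {
        String[] f = linkLine.split("\t", -1);
        if (f.length != 2) {
            throw new IllegalArgumentException("expected 2 fields, got " + f.length);
        }
        String altId = fooIdToAltId.get(f[0]);
        String tag = barIdToTag.get(f[1]);
        if (altId == null || tag == null) {
            return null; // unmatched link; the caller decides whether to skip or count it
        }
        return altId + " TAG " + tag;
    }
}
```

The point of the split is that anything expensive (connections, scanners) lives in setup, while map only performs keyed reads.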

Billie


> At the end of the work, I can delete the ALT_ID column (or table).
> 
> I know that this work is trivial using SQL, but <sigh> that's not an
> option.
