hadoop-common-user mailing list archives

From Kevin Peterson <kpeter...@biz360.com>
Subject Re: DBOutputFormat and auto-generated keys
Date Tue, 27 Jan 2009 21:56:21 GMT
On Mon, Jan 26, 2009 at 5:40 PM, Vadim Zaliva <krokodil@gmail.com> wrote:

> Is it possible to obtain auto-generated IDs when writing data using
> DBOutputFormat? For example, is it possible to write a Mapper which
> stores records in the DB and returns the auto-generated IDs of these
> records?


> which I would like to store in normalized form in two tables. The first
> table will store keys (strings). Each key will have a unique int id
> auto-generated by MySQL. The second table will have (key_id, value)
> pairs, key_id being a foreign key pointing to the first table.

A mapper has to have one output format, and that output format can't pass
any data back into the map, so that approach won't work. DBOutputFormat
doesn't provide any way to return the generated keys either.

If you wanted to add this kind of functionality, you would need to write
your own output format that is aware of your foreign keys, and it probably
wouldn't look much like DBOutputFormat. It would quickly get very
complicated.
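To give a sense of what such an output format would have to do for every record, here is a minimal sketch. It is plain Python rather than Hadoop Java, using the standard-library sqlite3 module as a stand-in for MySQL, and the class name `ForeignKeyAwareWriter` and the tables `keys_tbl` / `pairs_tbl` are hypothetical. It is not part of DBOutputFormat or any Hadoop API; it only shows the per-record work: look up or insert the key, read back its auto-generated id, and only then write the child row.

```python
import sqlite3


class ForeignKeyAwareWriter:
    """Hypothetical record writer that normalizes (key, value) records into two tables."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.cache: dict[str, int] = {}  # key -> generated id, avoids repeated lookups
        conn.execute(
            "CREATE TABLE IF NOT EXISTS keys_tbl ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT UNIQUE NOT NULL)"
        )
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pairs_tbl ("
            "key_id INTEGER NOT NULL REFERENCES keys_tbl(id), value TEXT)"
        )

    def _key_id(self, key: str) -> int:
        if key in self.cache:
            return self.cache[key]
        row = self.conn.execute(
            "SELECT id FROM keys_tbl WHERE name = ?", (key,)
        ).fetchone()
        if row is None:
            cur = self.conn.execute("INSERT INTO keys_tbl (name) VALUES (?)", (key,))
            key_id = cur.lastrowid  # auto-generated id of the new parent row
        else:
            key_id = row[0]
        self.cache[key] = key_id
        return key_id

    def write(self, key: str, value: str) -> None:
        # The parent row must exist, and its id be known, before the child row is written.
        self.conn.execute(
            "INSERT INTO pairs_tbl (key_id, value) VALUES (?, ?)",
            (self._key_id(key), value),
        )

    def close(self) -> None:
        self.conn.commit()
```

Even this ignores concurrency: parallel map tasks that see the same new key at the same time would race on the UNIQUE constraint, and each task's cache would be stale with respect to the others.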

One possibility that comes to mind is writing a "HibernateOutputFormat" or
similar, which would give you a way to express the relationships between
tables, so that your only task would be hooking up your persistence logic
to a Hadoop output format.

I had a similar problem with writing out reports to be used by a Rails app,
and solved it by restructuring things so that I didn't need to write to two
tables from the same map task.
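One possible restructuring along those lines (an assumption; the exact approach used for the Rails reports isn't described here) is to split the work into two passes, each writing to a single table: the first writes only the distinct keys and lets the database assign ids, and the second reads those ids back and writes only the (key_id, value) rows. The sketch below simulates the two passes in plain Python with the standard-library sqlite3 module standing in for MySQL; the function and table names are hypothetical, and `INSERT OR IGNORE` is SQLite syntax (MySQL spells it `INSERT IGNORE`).

```python
import sqlite3


def phase1_write_keys(conn: sqlite3.Connection, records: list[tuple[str, str]]) -> None:
    """First pass: write only the parent table (distinct keys); the DB assigns ids."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keys_tbl ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT UNIQUE NOT NULL)"
    )
    distinct = sorted({key for key, _ in records})  # analogous to reducing over keys
    conn.executemany(
        "INSERT OR IGNORE INTO keys_tbl (name) VALUES (?)", [(k,) for k in distinct]
    )
    conn.commit()


def phase2_write_pairs(conn: sqlite3.Connection, records: list[tuple[str, str]]) -> None:
    """Second pass: read back the generated ids, then write only the child table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pairs_tbl ("
        "key_id INTEGER NOT NULL REFERENCES keys_tbl(id), value TEXT)"
    )
    ids = dict(conn.execute("SELECT name, id FROM keys_tbl"))  # name -> generated id
    conn.executemany(
        "INSERT INTO pairs_tbl (key_id, value) VALUES (?, ?)",
        [(ids[key], value) for key, value in records],
    )
    conn.commit()
```

Because each pass touches only one table, neither needs to receive generated ids from the other while it is running; the id lookup happens once, between the passes.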
