incubator-crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <>
Subject [jira] [Updated] (CRUNCH-127) Allow multiple HBaseTargets in a single pipeline
Date Tue, 18 Dec 2012 22:40:12 GMT


Micah Whitacre updated CRUNCH-127:

    Attachment: CRUNCH-127_itest.patch

I wrote up an itest that I thought would demonstrate writing to two tables successfully.  I
haven't gotten it to execute successfully. (I do have your patch applied locally.)  The message
indicates that the Job is failing, but I haven't dug into why just yet.

Is that how you anticipated the consumers using the multiple outputs?  Or did I do something

It'd be nice if we could actually hide the multi-table support from consumers.  From a consumer
API perspective, I would hope to do seemingly independent writes anywhere I want along the
pipeline, while implementation-wise they would be aggregated to use the
HBaseMultiTableTarget if necessary.

If there was a method on the ToHBase class like:

  PCollection<Put> puts = ...;
  ToHBase.write(pipeline, "tableName", puts);

This would essentially hide the conversion to PTable<ImmutableBytesWritable, Put>, which
seems to be the same code everywhere.  The difficulty with the above is that ToHBase would have
to track internal state for the target and the union of the collections.
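
To make the state-tracking concern a bit more concrete, here is a rough sketch in plain Java
with no Crunch or HBase types at all.  FakePut, PendingWrites, needsMultiTableTarget and
pendingFor are names I made up for illustration; none of them exist in the patch or in
Crunch.  It only shows the bookkeeping: writes are grouped by table name, repeated writes to
the same table are unioned, and a flag says when more than one table is involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiTableWriteSketch {

    /** Stand-in for an HBase Put; only the row key matters for this sketch. */
    static final class FakePut {
        final String rowKey;
        FakePut(String rowKey) { this.rowKey = rowKey; }
    }

    /** Hypothetical bookkeeping a ToHBase-style helper would need per pipeline. */
    static final class PendingWrites {
        // Table name -> union of every collection written to that table so far.
        private final Map<String, List<FakePut>> byTable = new LinkedHashMap<>();

        /** Records a write; repeated writes to the same table are unioned. */
        void write(String tableName, List<FakePut> puts) {
            byTable.computeIfAbsent(tableName, k -> new ArrayList<>()).addAll(puts);
        }

        /** True once more than one distinct table has been written to. */
        boolean needsMultiTableTarget() {
            return byTable.size() > 1;
        }

        /** Number of puts currently queued for the given table (0 if none). */
        int pendingFor(String tableName) {
            return byTable.getOrDefault(tableName, Collections.emptyList()).size();
        }
    }

    public static void main(String[] args) {
        PendingWrites writes = new PendingWrites();
        writes.write("tableA", List.of(new FakePut("r1"), new FakePut("r2")));
        writes.write("tableB", List.of(new FakePut("r3")));
        writes.write("tableA", List.of(new FakePut("r4")));

        System.out.println("tableA puts: " + writes.pendingFor("tableA"));
        System.out.println("tableB puts: " + writes.pendingFor("tableB"));
        System.out.println("multi-table target needed: " + writes.needsMultiTableTarget());
    }
}
```

In a real implementation this state would presumably have to hang off the pipeline rather than
a static helper, which is part of why hiding it behind HBaseTarget might be cleaner.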

Or if this could be hidden behind HBaseTarget itself that would be nice.  Just throwing out
ideas and will hopefully get some time to play with the implementation.

Also, is the intention that a single pipeline would only ever use either the HBaseMultiTableTarget
or HBaseTarget?  Or would it be acceptable to use them together?
> Allow multiple HBaseTargets in a single pipeline
> ------------------------------------------------
>                 Key: CRUNCH-127
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Micah Whitacre
>            Assignee: Josh Wills
>         Attachments: CRUNCH-127_itest.patch, CRUNCH-127.patch
> Currently when a pipeline contains writes to multiple HBaseTargets, all puts are being
> sent to the first configured HBaseTarget ignoring the second one and causing issues if the
> columns are not the same.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators