phoenix-dev mailing list archives

From "Geoffrey Jacoby (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-4344) MapReduce Delete Support
Date Fri, 03 Nov 2017 01:00:08 GMT


Geoffrey Jacoby commented on PHOENIX-4344:

I don't see how Option 1 is problematic for indexes on non-PK columns: it internally
uses the Phoenix JDBC API, and therefore goes through all the same index-handling logic that a
point-delete query issued outside of MapReduce would.

Let's say that I have a table ENTITY_HISTORY with a compound primary key (Key1, Key2). 

I create my MapReduce job with a query like "DELETE FROM ENTITY_HISTORY WHERE Key1 > 'aaa'"

That delete would be converted to a select, and the MapReduce job would iterate row by row
over the result set. For each row, a new DELETE would be built from that row's full PK,
e.g. "DELETE FROM ENTITY_HISTORY WHERE Key1 = 'foo' AND Key2 = 'bar'", and executed over a
PhoenixConnection (probably with some kind of commit batching).
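The row-by-row rewrite described above can be sketched in plain Java. Everything here is a hypothetical illustration, not Phoenix API: the class and method names are made up, and real code would bind parameters with a PreparedStatement over a PhoenixConnection rather than concatenating SQL strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the per-row rewrite: each row returned by the
// SELECT is turned into a point delete on the full PK, and commits happen
// once per batch. None of these names are Phoenix APIs; real code would
// use "DELETE ... WHERE Key1 = ? AND Key2 = ?" with bind variables on a
// PhoenixConnection.
public class PointDeleteSketch {

    // Build the point delete for one row's PK (string concatenation is
    // for illustration only; real code would use bind variables).
    static String buildPointDelete(String key1, String key2) {
        return "DELETE FROM ENTITY_HISTORY WHERE Key1 = '" + key1
                + "' AND Key2 = '" + key2 + "'";
    }

    // One point delete per row of the result set; each pkRow is
    // { Key1 value, Key2 value }.
    static List<String> toPointDeletes(List<String[]> pkRows) {
        List<String> deletes = new ArrayList<>();
        for (String[] pk : pkRows) {
            deletes.add(buildPointDelete(pk[0], pk[1]));
        }
        return deletes;
    }

    // How many commits the batching would issue: one per full batch,
    // plus one for any trailing partial batch.
    static int commitCount(int rowCount, int batchSize) {
        return (rowCount + batchSize - 1) / batchSize;
    }
}
```

The point of the sketch is that each emitted statement carries the complete primary key, so it routes through the normal point-delete path and picks up index maintenance for free; the batching only bounds memory and round trips.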

I'm somewhat concerned about the perf, but the correctness seems sound to me -- am I missing
an issue? 

> MapReduce Delete Support
> ------------------------
>                 Key: PHOENIX-4344
>                 URL:
>             Project: Phoenix
>          Issue Type: New Feature
>    Affects Versions: 4.12.0
>            Reporter: Geoffrey Jacoby
>            Assignee: Geoffrey Jacoby
>            Priority: Major
> Phoenix already has the ability to use MapReduce for asynchronous handling of long-running
> SELECTs. It would be really useful to have this capability for long-running DELETEs, particularly
> of tables with indexes where using HBase's own MapReduce integration would be prohibitively
