phoenix-dev mailing list archives

From "Geoffrey Jacoby (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-4344) MapReduce Delete Support
Date Thu, 02 Nov 2017 22:53:00 GMT


Geoffrey Jacoby commented on PHOENIX-4344:

Some thoughts, [~jamestaylor]

I want this to be usable for generic DELETE queries without the need for a hand-written DBWritable implementation.

MapReduce goes record by record (one map() call per input row) rather than by Mapper task/Scan, so
while the client would be issuing a broad DELETE query, the mapper itself would either be:

1. Issuing point Phoenix DELETE queries on the complete primary key, derived from a SELECT
that the MapReduce job is iterating over
(Mapper<NullWritable, DBWritable, NullWritable, NullWritable>)
2. Issuing HBase Delete mutations directly to several HTables (the data table and its index
tables) via MultiHFileOutputFormat, derived from a DELETE that the MapReduce job is iterating over
(Mapper<NullWritable, DBWritable, ImmutableBytesWritable, Delete>)
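A minimal sketch of what Option 1 could look like per row in plain JDBC, assuming the row from the SELECT has already been read into a column-to-value map and that the table's full primary key column list is known up front. PointDeleteBuilder and its methods are hypothetical names used for illustration, not Phoenix APIs; only standard java.sql calls are used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Phoenix) for Option 1: turn one row of the
 * SELECT the job iterates over into a point DELETE on the complete primary key.
 */
final class PointDeleteBuilder {
    private final String tableName;
    private final List<String> pkColumns; // full PK, in declared order

    PointDeleteBuilder(String tableName, List<String> pkColumns) {
        if (pkColumns.isEmpty()) {
            throw new IllegalArgumentException("a point DELETE needs the full primary key");
        }
        this.tableName = tableName;
        this.pkColumns = Collections.unmodifiableList(new ArrayList<>(pkColumns));
    }

    /** Parameterized DML, e.g. DELETE FROM T WHERE K1 = ? AND K2 = ? */
    String sql() {
        StringBuilder sb = new StringBuilder("DELETE FROM ").append(tableName).append(" WHERE ");
        for (int i = 0; i < pkColumns.size(); i++) {
            if (i > 0) {
                sb.append(" AND ");
            }
            sb.append(pkColumns.get(i)).append(" = ?");
        }
        return sb.toString();
    }

    /** PK values for one row, in the same order as the ? placeholders. */
    List<Object> bindValues(Map<String, Object> row) {
        List<Object> values = new ArrayList<>(pkColumns.size());
        for (String column : pkColumns) {
            if (!row.containsKey(column)) {
                throw new IllegalArgumentException("row is missing PK column " + column);
            }
            values.add(row.get(column));
        }
        return values;
    }

    /**
     * What a mapper's map() call might do per row. Phoenix connections do not
     * auto-commit by default, so the caller is expected to commit, e.g. every N rows.
     */
    int deleteRow(Connection conn, Map<String, Object> row) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql())) {
            List<Object> values = bindValues(row);
            for (int i = 0; i < values.size(); i++) {
                ps.setObject(i + 1, values.get(i)); // JDBC parameters are 1-based
            }
            return ps.executeUpdate();
        }
    }
}
```

A real mapper would more likely prepare the statement once in setup() and reuse it across map() calls; preparing per row here only keeps the sketch short.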

FormatToBytesWritableMapper relies heavily on a LineParser interface, and the only implementations
appear to be CsvLineParser, JsonLineParser, and RegexLineParser. That means that in either
case the complete row key would have to be built by a new ResultSetLineParser that takes
a ResultSet and parses it into an intermediate form suitable for building either DELETE DML
(Option 1) or Delete mutations (Option 2). The former would just need to grab the row key components,
while the latter would potentially need every column, because an index can be defined on any column.
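A minimal sketch of what the proposed ResultSetLineParser might do, assuming each row has already been copied into an ordered column-to-value map (a real parser would fill it from java.sql.ResultSet via getMetaData() and getObject(int)). The class ResultSetRowSketch, its Mode enum, and indexRowKey are hypothetical. The index key is modeled only loosely, as the indexed column values followed by any data PK columns not already indexed, ignoring salting and byte encoding; it is there to show why Option 2 cannot get by with the PK alone.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the proposed ResultSetLineParser's core logic (not Phoenix code). */
final class ResultSetRowSketch {
    /** PK_ONLY feeds Option 1 (DELETE DML); ALL_COLUMNS feeds Option 2 (Delete mutations). */
    enum Mode { PK_ONLY, ALL_COLUMNS }

    /** Reduce one row (column label -> value, in select order) to the intermediate form. */
    static Map<String, Object> parse(Map<String, Object> row, List<String> pkColumns, Mode mode) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (mode == Mode.ALL_COLUMNS || pkColumns.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    /**
     * Simplified index row key: indexed column values, then any data PK columns not
     * already indexed. Values stay as objects; real keys are encoded bytes.
     */
    static List<Object> indexRowKey(Map<String, Object> parsed, List<String> indexedColumns,
                                    List<String> pkColumns) {
        List<Object> key = new ArrayList<>();
        for (String column : indexedColumns) {
            key.add(require(parsed, column));
        }
        for (String column : pkColumns) {
            if (!indexedColumns.contains(column)) {
                key.add(require(parsed, column));
            }
        }
        return key;
    }

    private static Object require(Map<String, Object> parsed, String column) {
        if (!parsed.containsKey(column)) {
            throw new IllegalStateException("column " + column + " was not parsed; use ALL_COLUMNS");
        }
        return parsed.get(column);
    }
}
```

With Mode.PK_ONLY, asking for the key of an index on a non-PK column such as a hypothetical EMAIL column fails, which is exactly why the Delete-mutation route needs the whole row.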

Also, either way, we need a concrete, general-purpose implementation of the DBWritable interface.
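A sketch of a generalized DBWritable, assuming Hadoop's org.apache.hadoop.mapreduce.lib.db.DBWritable, whose two methods are write(PreparedStatement) and readFields(ResultSet). To stay compilable without Hadoop on the classpath, the class below only mirrors those signatures; in a real job it would declare `implements DBWritable`. The name GenericRowWritable and its columns() accessor are hypothetical. It copies every column of the current row by label, so one class can serve any SELECT instead of a hand-written class per table.

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical generic row holder. Mirrors DBWritable's methods; a real version
 * would add "implements org.apache.hadoop.mapreduce.lib.db.DBWritable".
 */
final class GenericRowWritable {
    // Column label -> value, kept in select order.
    private final Map<String, Object> columns = new LinkedHashMap<>();

    /** Mirrors DBWritable.readFields: copy every column of the current row, keyed by label. */
    public void readFields(ResultSet resultSet) throws SQLException {
        ResultSetMetaData md = resultSet.getMetaData();
        columns.clear();
        for (int i = 1; i <= md.getColumnCount(); i++) { // JDBC columns are 1-based
            columns.put(md.getColumnLabel(i), resultSet.getObject(i));
        }
    }

    /** Mirrors DBWritable.write: bind each value, in column order, to the next ? placeholder. */
    public void write(PreparedStatement statement) throws SQLException {
        int index = 1;
        for (Object value : columns.values()) {
            statement.setObject(index++, value);
        }
    }

    /** Read-only view for mapper code, e.g. to pick out PK components. */
    public Map<String, Object> columns() {
        return Collections.unmodifiableMap(columns);
    }
}
```

One design caveat: write() binds values in select order, so any statement it is used with must list its placeholders in that same column order.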

Option 1 seems considerably simpler and higher-level, while Option 2 seems more efficient.

> MapReduce Delete Support
> ------------------------
>                 Key: PHOENIX-4344
>                 URL:
>             Project: Phoenix
>          Issue Type: New Feature
>    Affects Versions: 4.12.0
>            Reporter: Geoffrey Jacoby
>            Assignee: Geoffrey Jacoby
>            Priority: Major
> Phoenix already has the ability to use MapReduce for asynchronous handling of long-running
> SELECTs. It would be really useful to have this capability for long-running DELETEs, particularly
> of tables with indexes where using HBase's own MapReduce integration would be prohibitively

This message was sent by Atlassian JIRA
