phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PHOENIX-1619) Read-only/mapped views directly on HBase tables do not maintain secondary indexes
Date Sun, 01 Feb 2015 21:43:34 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300747#comment-14300747
] 

James Taylor edited comment on PHOENIX-1619 at 2/1/15 9:43 PM:
---------------------------------------------------------------

There are a few options, ordered from easiest to hardest:

1. Write out the time series data using Phoenix APIs into a Phoenix table. There's very little
overhead in using these APIs, so perf should be the same. In this case, you could define the
indexes up front and they'd be automatically maintained. Sounds like for your use case you'd
want to use immutable indexes (i.e. set IMMUTABLE_ROWS=true on your table).
2. Write out the time series using HBase APIs, but write them in a Phoenix-compliant manner
to a Phoenix table (i.e. create a Phoenix TABLE instead of a VIEW in this case). Since you're
able to create a view on top of the table, seems like you're almost there. You'd just need to
write out an "empty" KeyValue to each row. This would involve adding a 0:_0 KeyValue (column
family 0, column qualifier _0) with an empty byte array value.
    a. Either use the new facility to create immutable LOCAL INDEX(es) over your TABLE (4.2+
only). In this case, you need to add some attributes to your mutations when you write
them. See MutationState.commit(); in particular you'd need to set these two attributes:
PhoenixIndexCodec.INDEX_UUID and PhoenixIndexCodec.INDEX_MD. The first one groups Puts/Deletes
in a batch together, while the second one describes the structure of the index. If these attributes
are set correctly, the coprocessors on the back end will maintain your index for you even
if you're using HBase APIs (Puts & Deletes only).
    b. Or if you're using immutable indexes, then you could define the index up front on the
TABLE. In this case, you'd need to generate the index table mutations yourself when you write
the data. It's possible you could hook into or mimic the code in MutationState.addRowMutations().
This solution would be somewhat brittle, as that code has changed and is likely to change
more.

3. If you don't want to touch your writing code, you could do 2b yourself, independent of
writing the table data. In this case, you'd want to track the timestamp for the last data
table rows you've written and do an HBase scan for rows newer than this timestamp. You could
probably leverage the existing Phoenix code as described in 2b.
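
For option 1, a minimal sketch of what writing through the Phoenix JDBC driver might look like. The table, column, and index names are hypothetical, and the JDBC URL passed in (something like jdbc:phoenix:<zookeeper quorum>) depends on your cluster. Only java.sql is used, so it compiles without Phoenix, but it needs the Phoenix client jar on the classpath to actually connect.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

class PhoenixImmutableWrite {
    // Hypothetical schema: all names below are illustrative, not from the issue.
    static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS TS_METRICS ("
        + " host VARCHAR NOT NULL,"
        + " event_time BIGINT NOT NULL,"
        + " cpu_load DOUBLE"
        + " CONSTRAINT pk PRIMARY KEY (host, event_time))"
        + " IMMUTABLE_ROWS=true";

    // Index defined up front; Phoenix maintains it for writes that go through its APIs.
    static final String CREATE_INDEX =
        "CREATE INDEX IF NOT EXISTS TS_METRICS_LOAD_IDX ON TS_METRICS (cpu_load)";

    static final String UPSERT = "UPSERT INTO TS_METRICS VALUES (?, ?, ?)";

    // Creates the table and index if needed, then writes one sample row.
    static void writeSample(String jdbcUrl) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            try (Statement stmt = conn.createStatement()) {
                stmt.execute(CREATE_TABLE);
                stmt.execute(CREATE_INDEX);
            }
            try (PreparedStatement ps = conn.prepareStatement(UPSERT)) {
                ps.setString(1, "host-1");
                ps.setLong(2, System.currentTimeMillis());
                ps.setDouble(3, 0.42);
                ps.executeUpdate();
            }
            // Phoenix connections do not auto-commit by default.
            conn.commit();
        }
    }
}
```

Calling PhoenixImmutableWrite.writeSample(url) against a live cluster would write the row to both the data table and the index table.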

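For option 2, the "empty" KeyValue can be modeled without any HBase dependency. This is only a sketch: the class and method names are hypothetical, and cells are represented as a plain map keyed by "family:qualifier". With the HBase client you would instead add the same family, qualifier, and zero-length value to each Put you write (Put.addColumn in HBase 1.0+, Put.add in earlier releases).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EmptyKeyValueSketch {
    // Per the description above: family "0", qualifier "_0", zero-length value.
    static final String EMPTY_COLUMN = "0:_0";
    static final byte[] EMPTY_VALUE = new byte[0];

    /**
     * Returns a copy of a row's cells ("family:qualifier" -> value)
     * with the empty KeyValue added. The input map is not modified.
     */
    static Map<String, byte[]> withEmptyKeyValue(Map<String, byte[]> cells) {
        Map<String, byte[]> out = new LinkedHashMap<>(cells);
        out.put(EMPTY_COLUMN, EMPTY_VALUE);
        return out;
    }
}
```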

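For option 3, the timestamp tracking could look roughly like the sketch below. It is an in-memory model with hypothetical names (IncrementalIndexer, Row), not Phoenix or HBase code. Against a real table, the filter in nextBatch would be an HBase Scan restricted with Scan.setTimeRange(minStamp, maxStamp), and the high-water mark would need to be stored somewhere durable.

```java
import java.util.ArrayList;
import java.util.List;

class IncrementalIndexer {
    /** Stand-in for a data table row; only the fields the sketch needs. */
    static final class Row {
        final String key;
        final long timestamp;

        Row(String key, long timestamp) {
            this.key = key;
            this.timestamp = timestamp;
        }
    }

    // Timestamp of the newest data row already indexed (the high-water mark).
    private long lastIndexedTs = 0L;

    /** Returns rows newer than the high-water mark, then advances the mark. */
    List<Row> nextBatch(List<Row> dataTable) {
        List<Row> batch = new ArrayList<>();
        long maxSeen = lastIndexedTs;
        for (Row r : dataTable) {
            if (r.timestamp > lastIndexedTs) {
                batch.add(r);
                maxSeen = Math.max(maxSeen, r.timestamp);
            }
        }
        lastIndexedTs = maxSeen;
        return batch;
    }
}
```

Each returned batch is what you'd turn into index table mutations. Note that a row arriving late with a timestamp at or below the mark would be skipped by this scheme, so it fits best when timestamps only move forward.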


> Read-only/mapped views directly on HBase tables do not maintain secondary indexes
> ---------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1619
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1619
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: James Taylor
>
> A read-only/mapped view does not maintain its secondary indexes. This is by design currently,
> as the Phoenix APIs are being bypassed so there's not much Phoenix can do. However, it would
> be possible for a client to push the same metadata that Phoenix does through the HBase API
> for this to occur. Phoenix may be able to provide some APIs to make this easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
