phoenix-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-2417) Compress memory used by row key byte[] of guideposts
Date Fri, 15 Jan 2016 17:30:39 GMT


ASF GitHub Bot commented on PHOENIX-2417:

Github user JamesRTaylor commented on a diff in the pull request:
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/compile/ ---
    @@ -237,7 +237,9 @@ public void initializeScan(Scan scan) {
             return temp;
    -    public Scan intersectScan(Scan scan, final byte[] originalStartKey, final byte[]
originalStopKey, final int keyOffset, boolean crossesRegionBoundary) {
    +    public Scan intersectScan(Scan scan, final ImmutableBytesWritable originalStartKeyPtr,
final ImmutableBytesWritable originalStopKeyPtr, final int keyOffset, boolean crossesRegionBoundary)
    +        byte[] originalStartKey= originalStartKeyPtr.get();
    +        byte[] originalStopKey= originalStopKeyPtr.get();
    --- End diff --
    You can't treat an ImmutableBytesWritable the same way as the byte[] it replaces, because
an ImmutableBytesWritable also carries an offset and a length. By calling get() and using the
result directly, you're assuming that the offset is 0 and that the length equals the length of
the backing byte[]. Instead, you'd want to change the type and adjust the code as necessary:
        ImmutableBytesWritable originalStartKey = originalStartKeyPtr;
        ImmutableBytesWritable originalStopKey = originalStopKeyPtr;
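
To make the offset/length point concrete, here is a minimal sketch. BytesPtr is a hypothetical stand-in for ImmutableBytesWritable, reduced to get(), getOffset() and getLength() so the example runs without HBase on the classpath; naiveKey and windowedKey are illustrative names and are not part of the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for ImmutableBytesWritable: a view (offset + length)
// over a backing array that may be larger than the key itself.
final class BytesPtr {
    private final byte[] bytes;
    private final int offset;
    private final int length;

    BytesPtr(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    byte[] get() { return bytes; }       // the whole backing array
    int getOffset() { return offset; }   // where the key starts
    int getLength() { return length; }   // how many bytes belong to the key
}

final class OffsetAwareKeyDemo {
    // Mirrors the pattern in the diff: silently assumes offset == 0
    // and length == backing array length.
    static byte[] naiveKey(BytesPtr ptr) {
        return ptr.get();
    }

    // Honors the window; copying is one option, passing the pointer
    // through unchanged (as suggested above) avoids the copy.
    static byte[] windowedKey(BytesPtr ptr) {
        int start = ptr.getOffset();
        return Arrays.copyOfRange(ptr.get(), start, start + ptr.getLength());
    }

    public static void main(String[] args) {
        byte[] backing = "abcdef".getBytes(StandardCharsets.US_ASCII);
        BytesPtr ptr = new BytesPtr(backing, 2, 3); // the key is the 3 bytes at index 2
        System.out.println(new String(naiveKey(ptr), StandardCharsets.US_ASCII));
        System.out.println(new String(windowedKey(ptr), StandardCharsets.US_ASCII));
    }
}
```

The two results differ whenever the pointer does not cover the entire backing array, which is the case the review comment warns about.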

> Compress memory used by row key byte[] of guideposts
> ----------------------------------------------------
>                 Key: PHOENIX-2417
>                 URL:
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>             Fix For: 4.7.0
>         Attachments: PHOENIX-2417.patch, PHOENIX-2417_encoder.diff, PHOENIX-2417_v2_wip.patch
> We've found that smaller guideposts are better in terms of minimizing any increase in
latency for point scans. However, this significantly increases the amount of memory used when
caching the guideposts on the client. Guideposts are equidistant row keys in the form of raw byte[]
which are likely to have a large percentage of their leading bytes in common (as they're stored
in sorted order). We should use a simple compression technique to mitigate this. I noticed
that Apache Parquet has a run length encoding - perhaps we can use that.
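
As a rough illustration of the shared-leading-bytes idea, here is a sketch of prefix (delta) encoding over sorted keys. This is one possible simple technique, swapped in for the run length encoding mentioned above, and is not necessarily what the attached patches do. The class name PrefixEncodedKeys and the layout (key count, then per key: shared-prefix length, suffix length, suffix bytes, with lengths as variable-length ints) are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class PrefixEncodedKeys {
    // Encodes sorted keys as: count, then (sharedLen, suffixLen, suffix) per key.
    static byte[] encode(List<byte[]> sortedKeys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarInt(out, sortedKeys.size());
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            // Count the leading bytes this key shares with the previous one.
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            writeVarInt(out, shared);
            writeVarInt(out, key.length - shared);
            out.write(key, shared, key.length - shared); // store only the suffix
            prev = key;
        }
        return out.toByteArray();
    }

    // Rebuilds every key by reusing the shared prefix of its predecessor.
    static List<byte[]> decode(byte[] encoded) {
        ByteBuffer in = ByteBuffer.wrap(encoded);
        int count = readVarInt(in);
        List<byte[]> keys = new ArrayList<>(count);
        byte[] prev = new byte[0];
        for (int i = 0; i < count; i++) {
            int shared = readVarInt(in);
            int suffixLen = readVarInt(in);
            byte[] key = new byte[shared + suffixLen];
            System.arraycopy(prev, 0, key, 0, shared);
            in.get(key, shared, suffixLen);
            keys.add(key);
            prev = key;
        }
        return keys;
    }

    // Unsigned variable-length int: 7 bits per byte, high bit marks continuation.
    private static void writeVarInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    private static int readVarInt(ByteBuffer in) {
        int result = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }
}
```

The saving depends on how long the shared prefixes are relative to the two length fields stored per key. A sequential decoder like this one also has to walk the keys in order, which fits guideposts that are scanned in sorted order but would need an index for random access.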
