phoenix-dev mailing list archives

From "Samarth Jain (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-1779) Parallelize fetching of next batch of records for scans corresponding to queries with no order by
Date Fri, 03 Apr 2015 20:46:52 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated PHOENIX-1779:
----------------------------------
    Attachment: wipwithsplits.patch

Parking the updated patch that handles split failures. All existing tests pass with the force_row_key_order
config set to true.

> Parallelize fetching of next batch of records for scans corresponding to queries with no order by
> --------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1779
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1779
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>         Attachments: wip.patch, wipwithsplits.patch
>
>
> Today in Phoenix we parallelize only the first execution of scans, i.e. we load only the
> first batch of records, up to the scan's cache size, in parallel. Loading of subsequent
> batches of records in scanners is essentially serial. This could be improved, especially
> for queries that do not need any kind of merge sort on the client, including the ones with
> no order by clause. It could also potentially improve the performance of UPSERT SELECT
> statements that load data from one table and insert into another. One such use case is
> creating immutable indexes for tables that already have data. It could also potentially
> improve the performance of our MapReduce solution for bulk loading data by improving the
> speed of the loading/mapping phase.
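
The idea in the description above can be illustrated with a minimal sketch. This is not the attached patch and does not use Phoenix or HBase APIs. It assumes each scan is modeled as a plain `Iterator<String>`, that one batch stands in for one round trip of up to the scan's cache size, and that the query needs no client-side merge sort, so batches can simply be appended as they arrive. The names `ParallelBatchFetchSketch`, `nextBatch`, and `fetchAll` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch: every batch of every scan is fetched in parallel,
// not only the first one.
class ParallelBatchFetchSketch {

    // Pulls up to batchSize rows from one scanner; stands in for one fetch round trip.
    static List<String> nextBatch(Iterator<String> scanner, int batchSize) {
        List<String> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && scanner.hasNext()) {
            batch.add(scanner.next());
        }
        return batch;
    }

    // Drains all scanners. In each round, one fetch task per still-active scanner
    // is submitted to the pool, so the batches of that round load concurrently.
    static List<String> fetchAll(List<Iterator<String>> scanners, int batchSize,
                                 ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<String> rows = new ArrayList<>();
        List<Iterator<String>> active = new ArrayList<>(scanners);
        while (!active.isEmpty()) {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (Iterator<String> scanner : active) {
                Callable<List<String>> task = () -> nextBatch(scanner, batchSize);
                pending.add(pool.submit(task));
            }
            List<Iterator<String>> stillActive = new ArrayList<>();
            for (int i = 0; i < active.size(); i++) {
                List<String> batch = pending.get(i).get(); // wait for this scanner's batch
                rows.addAll(batch);                        // no ordering needed: just append
                if (batch.size() == batchSize) {           // a short batch means the scan is done
                    stillActive.add(active.get(i));
                }
            }
            active = stillActive;
        }
        return rows;
    }
}
```

Each scanner is touched by at most one task per round, so the iterators need no extra synchronization in this sketch. A query that requires row key order would instead have to merge the batches rather than append them.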



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
