drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5429) Improve query performance for MapR DB JSON Tables
Date Tue, 02 May 2017 21:35:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993815#comment-15993815 ]

ASF GitHub Bot commented on DRILL-5429:
---------------------------------------

Github user gparai commented on a diff in the pull request:

    https://github.com/apache/drill/pull/817#discussion_r114430230
  
    --- Diff: contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableGroupScan.java ---
    @@ -100,30 +104,46 @@ public GroupScan clone(List<SchemaPath> columns) {
         return newScan;
       }
     
    +  public JsonTableGroupScan clone(JsonScanSpec scanSpec) {
    +    JsonTableGroupScan newScan = new JsonTableGroupScan(this);
    +    newScan.scanSpec = scanSpec;
    +    newScan.computeRegionsToScan();
    +    return newScan;
    +  }
    +
    +  private void computeRegionsToScan() {
    +    boolean foundStartRegion = false;
    +
    +    regionsToScan = new TreeMap<TabletFragmentInfo, String>();
    +    for (TabletInfo tabletInfo : tabletInfos) {
    +      TabletInfoImpl tabletInfoImpl = (TabletInfoImpl) tabletInfo;
    +      if (!foundStartRegion && !isNullOrEmpty(scanSpec.getStartRow()) && !tabletInfoImpl.containsRow(scanSpec.getStartRow())) {
    +        continue;
    +      }
    +      foundStartRegion = true;
    +      regionsToScan.put(new TabletFragmentInfo(tabletInfoImpl), tabletInfo.getLocations()[0]);
    +      if (!isNullOrEmpty(scanSpec.getStopRow()) && tabletInfoImpl.containsRow(scanSpec.getStopRow())) {
    +        break;
    +      }
    +    }
    +  }
    +
       private void init() {
         logger.debug("Getting tablet locations");
         try {
           Configuration conf = new Configuration();
    -      Table t = MapRDB.getTable(scanSpec.getTableName());
    -      TabletInfo[] tabletInfos = t.getTabletInfos(scanSpec.getCondition());
    -      tableStats = new MapRDBTableStats(conf, scanSpec.getTableName());
     
    -      boolean foundStartRegion = false;
    -      regionsToScan = new TreeMap<TabletFragmentInfo, String>();
    +      // Fetch table and tabletInfo only once and cache.
    +      table = MapRDB.getTable(scanSpec.getTableName());
    +      tabletInfos = table.getTabletInfos(scanSpec.getCondition());
    +
    +      // Calculate totalRowCount for the table
    --- End diff --
    
    Please add a comment explaining why totalRowCount is computed this way:
    `totalRowCount += tabletInfo.getEstimatedNumRows();`
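
A minimal sketch of how the requested comment might read, assuming the total is obtained by summing the per-tablet estimates as in the quoted line. `EstimatedTablet`, `RowCountSketch`, and `tablet(...)` are hypothetical stand-ins so the example compiles on its own; only the method name `getEstimatedNumRows()` is taken from the diff, and the rationale in the comment is taken from the issue description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the MapR-DB TabletInfo type. Only the single
// method quoted in the review comment is modeled here.
interface EstimatedTablet {
  long getEstimatedNumRows();
}

class RowCountSketch {
  // Hypothetical factory used only to build sample data for this sketch.
  static EstimatedTablet tablet(final long rows) {
    return () -> rows;
  }

  static long estimateTotalRowCount(List<? extends EstimatedTablet> tablets) {
    // Fetching table-level stats is an expensive operation, so the total is
    // derived from the tablet infos that were already retrieved: each tablet
    // reports an estimated row count, and the table total is their sum.
    // The result is therefore an estimate, not an exact count.
    long totalRowCount = 0;
    for (EstimatedTablet tabletInfo : tablets) {
      totalRowCount += tabletInfo.getEstimatedNumRows();
    }
    return totalRowCount;
  }

  public static void main(String[] args) {
    List<EstimatedTablet> tablets = Arrays.asList(tablet(100L), tablet(250L), tablet(50L));
    System.out.println(estimateTotalRowCount(tablets));
  }
}
```

Because the loop only touches objects that are already in memory, the sum costs no additional call to the DB server.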


> Improve query performance for MapR DB JSON Tables
> -------------------------------------------------
>
>                 Key: DRILL-5429
>                 URL: https://issues.apache.org/jira/browse/DRILL-5429
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization, Storage - MapRDB
>    Affects Versions: 1.10.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>             Fix For: 1.11.0
>
>
> For MapR DB JSON Tables, cache (per query) and reuse table and tabletInfo instead of
> fetching the same information multiple times from the DB server. Also, getting tableStats
> is an expensive operation. We can avoid it and get the total rowCount from tabletInfo
> instead.
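
The fetch-once-and-reuse idea in the description can be illustrated with a small memoizing wrapper. This is a sketch of the general technique, not the approach taken in the pull request (which, per the diff, keeps the table and tablet infos in fields of the group scan). `CachedLookup` and `CachedLookupDemo` are hypothetical names, and the string array merely stands in for the data returned by the DB server.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Drill or MapR-DB: wraps an expensive
// lookup so that it runs at most once and its result is reused afterwards.
class CachedLookup<T> implements Supplier<T> {
  private final Supplier<T> loader;
  private T value;
  private boolean loaded;

  CachedLookup(Supplier<T> loader) {
    this.loader = loader;
  }

  @Override
  public synchronized T get() {
    if (!loaded) {
      value = loader.get(); // the only place the expensive call happens
      loaded = true;
    }
    return value;
  }
}

class CachedLookupDemo {
  public static void main(String[] args) {
    final int[] fetches = {0};
    CachedLookup<String[]> tabletInfos = new CachedLookup<>(() -> {
      fetches[0]++; // stands in for one round trip to the DB server
      return new String[] {"tablet-0", "tablet-1"};
    });

    // Several consumers within the same query ask for the same information.
    tabletInfos.get();
    tabletInfos.get();
    System.out.println("fetches = " + fetches[0]);
  }
}
```

Scoping one such wrapper to a single query would keep the cached data from going stale across queries, which matches the "per query" wording above.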



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
