drill-issues mailing list archives

From "Abhishek Girish (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-5105) Query time increases exponentially with increasing nested levels
Date Mon, 05 Dec 2016 04:33:58 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Girish updated DRILL-5105:
-----------------------------------
    Description: 
The time taken to query a JSON dataset depends on the number of nested levels within the dataset.
Increasing the complexity of the dataset at each level further increases the execution time.

Tabulated below are cached query execution times for a simple select * query over two simple
forms of JSON datasets:

|| No. Levels || Time (s) Dataset 1 || Time (s) Dataset 2 ||
| 1  | 0.22    | 0.27           |
| 2  | 0.23    | 0.25           |
| 4  | 0.24    | 0.22           |
| 8  | 0.22    | 0.23           |
| 16 | 0.34    | 0.48           |
| 24 | 25.76   | 72.51          |
| 26 | 103.48  | 289.6          |
| 28 | 336.12  | 1151.94        |
| 30 | 1342.22 | 4611.19        |
| 32 | 5360.2  | Expected: ~20k |

The above table lists query times for 20 different JSON files, 10 belonging to dataset 1 and
10 belonging to dataset 2. Each file has 1 record, but the number of nested levels within it
varies as given in the "No. Levels" column.
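
The issue does not say how the times were collected. One possible way to time such a query is through Drill's REST API, sketched below. The host, the file path, the use of the `dfs` storage plugin, and the reading of "cached" as "timing a repeated run" are all assumptions made for illustration.

```python
import json
import time
import urllib.request

# Assumption: a Drillbit web endpoint is reachable here (8047 is Drill's default web port).
DRILL_URL = "http://localhost:8047/query.json"


def build_payload(json_path: str) -> dict:
    """Build the REST request body for a select * over one JSON file."""
    # Assumption: the file is readable through the `dfs` storage plugin.
    return {"queryType": "SQL", "query": f"SELECT * FROM dfs.`{json_path}`"}


def time_query(json_path: str, runs: int = 2) -> float:
    """Run the query `runs` times and return the wall-clock seconds of the last run."""
    body = json.dumps(build_payload(json_path)).encode("utf-8")
    elapsed = 0.0
    for _ in range(runs):
        request = urllib.request.Request(
            DRILL_URL, data=body, headers={"Content-Type": "application/json"}
        )
        start = time.perf_counter()
        with urllib.request.urlopen(request) as response:
            response.read()  # drain the result so the whole query is timed
        elapsed = time.perf_counter() - start
    return elapsed
```

Calling `time_query("/tmp/nested_24.json")` (a hypothetical path) would return the duration of the second run in seconds; it needs a running Drill cluster.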

It appears that, at the deeper nesting levels, the query time almost doubles with each additional
nested level (the rows from 24 levels onward are two levels apart, which translates to an
almost 4x increase between consecutive rows).
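
The growth rate can be checked directly from the tabulated numbers. The sketch below takes the measured rows from 24 levels onward, computes the ratio between consecutive rows, and converts it to a per-level factor by taking the root over the level gap. The function name is illustrative only.

```python
# Query times in seconds, copied from the table in this issue, keyed by nesting level.
dataset1 = {24: 25.76, 26: 103.48, 28: 336.12, 30: 1342.22, 32: 5360.2}
dataset2 = {24: 72.51, 26: 289.6, 28: 1151.94, 30: 4611.19}


def per_level_factors(times: dict) -> list:
    """Return (from_level, to_level, row_ratio, per_level_factor) for consecutive rows."""
    levels = sorted(times)
    results = []
    for low, high in zip(levels, levels[1:]):
        ratio = times[high] / times[low]
        # Spread the row-to-row ratio evenly over the levels between the two rows.
        factor = ratio ** (1.0 / (high - low))
        results.append((low, high, ratio, factor))
    return results


if __name__ == "__main__":
    for name, times in (("Dataset 1", dataset1), ("Dataset 2", dataset2)):
        for low, high, ratio, factor in per_level_factors(times):
            print(f"{name}: {low}->{high} levels: {ratio:.2f}x per row, {factor:.2f}x per level")
```

For these rows the per-level factor comes out between roughly 1.8 and 2.0, consistent with the time nearly doubling per added level.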

The two representative datasets below showcase simple JSON structures with nested
levels.

Structure of Dataset 1:
{code}
{
  "level1": {
    "field1": "a",
    "level2": {
      "field1": "b",
      ...
    }
  }
}
{code}

Structure of Dataset 2:
{code}
{
  "level1": {
    "field1": "a",
    "field2": {
      "nfield1": true,
      "nfield2": 1.1
    },
    "level2": {
      "field1": "b",
      "field2": {
        "nfield1": false,
        "nfield2": 2.2
      },
      ...
    }
  }
}
{code}
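
To reproduce files of this shape at an arbitrary depth, a small generator along the following lines may help. It is only a sketch: the original files are not attached here, so the field values and the assumption that the elided levels continue the same pattern are guesses made for illustration.

```python
import json


def build_nested(depth: int, with_extra_fields: bool = False) -> dict:
    """Build one record with `depth` nested levels, constructed innermost level first."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    inner = None
    for level in range(depth, 0, -1):
        node = {"field1": f"value{level}"}  # placeholder value (assumption)
        if with_extra_fields:
            # Dataset 2 shape: a small nested object at every level (values assumed).
            node["field2"] = {"nfield1": level % 2 == 1, "nfield2": round(level * 1.1, 1)}
        if inner is not None:
            node.update(inner)  # attach the deeper level built in the previous step
        inner = {f"level{level}": node}
    return inner


if __name__ == "__main__":
    # Print a 2-level sample of each shape.
    print(json.dumps(build_nested(2), indent=2))
    print(json.dumps(build_nested(2, with_extra_fields=True), indent=2))
```

`build_nested(24)` gives a Dataset 1 style record with 24 levels, and `build_nested(24, with_extra_fields=True)` gives the Dataset 2 style; writing the `json.dumps` output to a file yields a single-record dataset.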





> Query time increases exponentially with increasing nested levels
> ----------------------------------------------------------------
>
>                 Key: DRILL-5105
>                 URL: https://issues.apache.org/jira/browse/DRILL-5105
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.9.0
>         Environment: 3 Node Cluster with default memory and configurations. 
>            Reporter: Abhishek Girish
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
