drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Khurram Faraaz <kfar...@maprtech.com>
Subject [DISCUSS] Querying nested data from JSON with and without json_all_text_mode results in errors
Date Thu, 10 Sep 2015 18:46:48 GMT
Hi,


Querying nested data from a JSON file with and with out setting
store.json.all_text_mode
results in errors.


Data that was used in the test is available here -
https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5

There is an option (on the top right) to Export as JSON.

Drill commit ID : 0686bc23 (on master)


[root@centos-01 ~]# hadoop fs -ls /tmp/rows.json

-rwxr-xr-x   3 root root  156417523 2015-09-10 17:18 /tmp/rows.json


[root@centos-01 food_inspection_data]# hadoop fs -put rows.json /tmp

[root@centos-01 food_inspection_data]# cd /opt/mapr/drill/drill-1.2.0/bin

[root@centos-01 bin]# ./sqlline -u "jdbc:drill:schema=dfs.tmp"

apache drill 1.0.0

"a little sql for your nosql"


Query 1)


0: jdbc:drill:schema=dfs.tmp> select * from `rows.json`;

Error: DATA_READ ERROR: You tried to start when you are using a ValueWriter
of type NullableVarCharWriterImpl.


File  /tmp/rows.json

Record  1

Line  1023

Column  29

Field  smallest

Fragment 0:0


[Error Id: 0844933a-ed61-4137-9970-203576d5770d on centos-02.qa.lab:31010]
(state=,code=0)


*-----*


Query 2) Drill seems to be checking for correctness of data much deeper in
the nesting. Note that I do a count over field* meta, *and field*
"smallest" *reported in error is much deeper inside* meta, *
meta.view.columns.cachedContents.smallest


0: jdbc:drill:schema=dfs.tmp> select count(meta) from `rows.json`;

Error: DATA_READ ERROR: You tried to start when you are using a ValueWriter
of type NullableVarCharWriterImpl.


File  /tmp/rows.json

Record  1

Line  1023

Column  29

Field  smallest

Fragment 0:0


[Error Id: 66e7f461-6671-4862-bde5-685c8c2304c9 on centos-04.qa.lab:31010]
(state=,code=0)


Explain plan for Query (2)


0: jdbc:drill:schema=dfs.tmp> explain plan for select count(meta) from
`rows.json`;

*+------+------+*

*| **text** | **json** |*

*+------+------+*

*| *00-00    Screen

00-01      Project(EXPR$0=[$0])

00-02        StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])

00-03          StreamAgg(group=[{}], EXPR$0=[COUNT($0)])

00-04            Scan(groupscan=[EasyGroupScan
[selectionRoot=maprfs:/tmp/rows.json, numFiles=1, columns=[`meta`],
files=[maprfs:///tmp/rows.json]]])


Query 3)


0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view) from `rows.json` t;

Error: DATA_READ ERROR: You tried to start when you are using a ValueWriter
of type NullableVarCharWriterImpl.


File  /tmp/rows.json

Record  1

Line  1023

Column  29

Field  smallest

Fragment 0:0


[Error Id: 55ea650c-9587-42d8-aec9-0f5a4b77ee4c on centos-04.qa.lab:31010]
(state=,code=0)


Query 4) - The count returned is wrong!


0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view.id) from `rows.json`
t;

*+---------+*

*| **EXPR$0 ** |*

*+---------+*

*| *1      * |*

*+---------+*

1 row selected (1.57 seconds)

-------------------

The below queries were run after json all text mode configuration was
enabled.

0: jdbc:drill:schema=dfs.tmp> alter session set
`store.json.all_text_mode`=true;

*+-------+------------------------------------+*

*| ** ok  ** | **             summary              ** |*

*+-------+------------------------------------+*

*| *true * | *store.json.all_text_mode updated. * |*

*+-------+------------------------------------+*

1 row selected (0.218 seconds)


0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view) from `rows.json` t;

Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
are using a ValueWriter of type NullableVarCharWriterImpl.


File  /tmp/rows.json

Record  1

Fragment 0:0


[Error Id: 5eeee0f3-69fd-445c-a149-a3aa2f18d1cb on centos-04.qa.lab:31010]
(state=,code=0)


COUNT returned in below query is wrong.


0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view.id) from `rows.json`
t;

+---------+

| EXPR$0  |

+---------+

| 1       |

+---------+

1 row selected (0.948 seconds)


Query to perform count over nested data.


0: jdbc:drill:schema=dfs.tmp> select
count(t.meta.view.columns.cachedContents.smallest) from `rows.json` t;

Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
are using a ValueWriter of type NullableVarCharWriterImpl.


File  /tmp/rows.json

Record  1

Fragment 0:0


[Error Id: 4ce0d5fe-abd7-47d9-8106-269fe37409e6 on centos-04.qa.lab:31010]
(state=,code=0)


0: jdbc:drill:schema=dfs.tmp> select
t.meta.view.columns.cachedContents.smallest from `rows.json` t;

Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
are using a ValueWriter of type NullableVarCharWriterImpl.


File  /tmp/rows.json

Record  1

Fragment 0:0


[Error Id: b2514414-af51-46c5-938d-ff792e6a8a8c on centos-04.qa.lab:31010]
(state=,code=0)

0: jdbc:drill:schema=dfs.tmp> select * from `rows.json`;

Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
are using a ValueWriter of type NullableVarCharWriterImpl.


File  /tmp/rows.json

Record  1

Fragment 0:0


[Error Id: 3615e088-64cc-4cba-8cfe-602563f4e16c on centos-04.qa.lab:31010]
(state=,code=0)

These are the occorrences of field named *smallest* in the JSON file

Administrators-MacBook-Pro:Downloads kfaraaz$ grep "smallest" rows.json

          "smallest" : "44247",

          "smallest" : "'C' A CATERING AND EVENT COMPANY",

          "smallest" : "'C' A CATERING AND EVENT COMPANY",

          "smallest" : "0",

          "smallest" : "(convenience store)",

          "smallest" : "All",

          "smallest" : " ",

          "smallest" : "alsip",

          "smallest" : "IL",

          "smallest" : "60007",

          "smallest" : "2010-01-04T00:00:00",

          "smallest" : "1315 license reinspection",

          "smallest" : "Business Not Located",

          "smallest" : "1. SOURCE SOUND CONDITION, NO SPOILAGE, FOODS
PROPERLY LABELED, SHELLFISH TAGS IN PLACE - Comments: ",

          "smallest" : "41.64467013219805",

          "smallest" : "-87.91442843927047",

          "smallest" : {

, [ 101045, "180563EF-F00C-41F9-A0C2-7B2100AD2DED", 101045, 1441883778,
"386464", 1441883778, "386464", null, "343208", "SUBWAY # 26817-309 W.
MONROE", "SUBWAY", "1519940", "Restaurant", "Risk 1 (High)", "309 W MONROE
ST ", "CHICAGO", "IL", "60606", "2010-07-13T00:00:00", "Canvass", "Pass",
"32. FOOD AND NON-FOOD CONTACT SURFACES PROPERLY DESIGNED, CONSTRUCTED AND
MAINTAINED - Comments: All food and non-food contact equipment and utensils
shall be smooth, easily cleanable, and durable, and shall be in good
repair. A few of the smallest cutting boards, with deep/dark grooves must
be sanded/bleached- to make smooth/easily cleanable. | 33. FOOD AND
NON-FOOD CONTACT EQUIPMENT UTENSILS CLEAN, FREE OF ABRASIVE DETERGENTS -
Comments: All food and non-food contact surfaces of equipment and all food
storage utensils shall be thoroughly cleaned and sanitized daily. Interior
of ice chute at soda dispensers need cleaning. | 34. FLOORS: CONSTRUCTED
PER CODE, CLEANED, GOOD REPAIR, COVING INSTALLED, DUST-LESS CLEANING
METHODS USED - Comments: The floors shall be constructed per code, be
smooth and easily cleaned, and be kept clean and in good repair. Floors of
walk-in cooler (around corners) need cleaning; baseboard at the outside
base of the walk-in cooler door is in poor repair, must fix.",
"41.88052826494154", "-87.63556938091433", [ null, "41.88052826494154",
"-87.63556938091433", null, false ] ]

Thanks,
Khurram

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message