drill-dev mailing list archives

From "Khurram Faraaz (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-2806) Querying data from compressed csv file returns nulls and unreadable data
Date Thu, 16 Apr 2015 20:51:59 GMT
Khurram Faraaz created DRILL-2806:
-------------------------------------

             Summary: Querying data from compressed csv file returns nulls and unreadable data
                 Key: DRILL-2806
                 URL: https://issues.apache.org/jira/browse/DRILL-2806
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Text & CSV
    Affects Versions: 0.9.0
         Environment: 9d92b8e319f2d46e8659d903d355450e15946533 | DRILL-2580: Exit early from HashJoinBatch if build side is empty | 26.03.2015
            Reporter: Khurram Faraaz
            Assignee: Steven Phillips


Projecting columns from a compressed CSV data file returns unreadable data and nulls in the
query results. Querying the same CSV file in uncompressed form returns correct, readable data
with no nulls. The test was performed on a 4-node cluster running CentOS.

{code}

0: jdbc:drill:> select columns[0], columns[1], columns[2], columns[3], columns[4], columns[5],
columns[6], columns[7] from `deletions-00000-of-00020.tgz` limit 10;
+------------+------------+------------+------------+------------+------------+------------+------------+
|   EXPR$0   |   EXPR$1   |   EXPR$2   |   EXPR$3   |   EXPR$4   |   EXPR$5   |   EXPR$6   |   EXPR$7   |
+------------+------------+------------+------------+------------+------------+------------+------------+
| 0U[ˮȑ|axaR)ﺫ=鲍i̊HDJ|?3̑$%Q$%
                                                TdfD8'2i$E^/Y}C'>|/7
                                                                                  H1o0! |
0g TMUܸW`ʙ&T
                                                                                         
                                      \uXپN|2I~Y 0RAX6UaXe+ow*]s | null       | null    
  | null       | null       | null       | null       |
| oM.ڻU/ | ̼\
                           )qwda7((
                                               	y[) | 9>^0>WM[{r]iE$ze&!EküIfa
| null       | null       | null       | null       | null       |
| SRΠ     | null       | null       | null       | null       | null       | null      
| null       |
| 6imJ\f_dYڿ]%ln3IaE*BGA-a$j:M!Uc)ﶘD~wUx0ɼgme]ӘcQ*pk$%\2ER-)(ÈxTn?SϓxeҜݠºI|'(Cni
s | null       | null       | null       | null       | null       | null       | null   
   |
| bxΜkr4ü_nIxl_s`vN	ó.$OL7Eބyڗia;Pu$M!AoCӦnlS-`ۢ+o~>%wzcgwtMge7"lMgZ=WྃgMRX1"a
| X=Rd.fab{t{
                                                                                         
                                                                                         
   A!t
                                                                                         
                                                                                         
           1$ڧw-0EXURg
                                                                                         
                                                                                         
                                  p	#qzߤ΢gWMem{=z{
                                                                                         
                                                                                         
                                                                eiA]^ | null       | null
      | null       | null       | null       | null       |
| ֌        | null       | null       | null       | null       | null       | null      
| null       |
| !{1H*m71`˰]oZ | 𾳔] &f4Z)4SP7Rm4^5WWXȧ<p.́3L
                                                                                      q%|WL-p[
| null       | null       | null       | null       | null       | null       |
| dqyd\K#"ԁ@ | null       | null       | null       | null       | null       | null    
  | null       |
| [GԊKFlɢ(ZK8h#D/[(U=_8ΏE%
                                                           [;
                                                              w}Fr`#Xk
                                                                              lT'15:y
                                                                                         
     ņPz(-ȓ񆹞Cs)1v	 | null       | null       | null       | null       | null       |
null       | null       |
| LyPO|Ώ(+n+H]
                         Ņ2?糩s/_ l
                                            +ӯb	 | null       | null       | null       |
null       | null       | null       | null       |
+------------+------------+------------+------------+------------+------------+------------+------------+
10 rows selected (0.176 seconds)

0: jdbc:drill:> select columns[0], columns[1], columns[2], columns[3], columns[4], columns[5],
columns[6], columns[7] from `deletions/deletions-00000-of-00020.csv` limit 10;
+------------+------------+------------+------------+------------+------------+------------+------------+
|   EXPR$0   |   EXPR$1   |   EXPR$2   |   EXPR$3   |   EXPR$4   |   EXPR$5   |   EXPR$6   |   EXPR$7   |
+------------+------------+------------+------------+------------+------------+------------+------------+
| 1354980518007 | /user/mwcl_musicbrainz | 1356247116000 | /user/google_gardener | /m/0nj707g | /music/track_contribution/contributor | /m/09xmq3  | en         |
| 1359609261000 | /user/ahsan2002us | 1359697206000 | /user/mjsigua | /m/0q47ym9 | /common/topic/description | Afrosheen CEO is the fictional character from the 2003 film The Watermelon Heist. | en         |
| 1258294630005 | /user/book_bot | 1260214155000 | /user/book_bot | /m/08g19rh | /book/book_edition/book | /m/04sty07 | en         |
| 1260232964000 | /user/book_bot | 1360880749000 | /user/turtlewax_bot | /m/0872_f2 | /book/book_edition/book | /m/069_gyc | en         |
| 1320298552000 | /user/gardening_bot | 1358083965004 | /user/googlebot | /m/01dy3t2 | /type/object/type | /music/single | en         |
| 1360430129006 | /user/mwcl_musicbrainz | 1362830875001 | /user/mwcl_musicbrainz | /m/0qm1x62 | /music/release_track/release | /m/0ql38vr | en         |
| 1269251105000 | /user/mwcl_images | 1336539194001 | /user/gardening_bot | /m/06w7yw7 | /common/topic/image | /m/0bcncxt | en         |
| 1225386250001 | /user/mwcl_images | 1336080683003 | /user/gardening_bot | /m/04sb526 | /common/licensed_object/license | /m/02x6b   | en         |
| 1286991487000 | /user/mw_template_bot | 1362532733000 | /user/wikipedia_facts | /m/0dgs170 | /people/person/date_of_birth | 1975       | en         |
| 1258986090000 | /user/book_bot | 1260138587000 | /user/book_bot | /m/08r_m33 | /book/book_edition/book | /m/04sty07 | en         |
+------------+------------+------------+------------+------------+------------+------------+------------+
10 rows selected (0.25 seconds)

Details of the files (compressed and uncompressed)

[root@centos-01 ~]# hadoop fs -ls /tmp/deletions-00000-of-00020.tgz
-rwxr-xr-x   3 root root  111364147 2015-04-16 20:35 /tmp/deletions-00000-of-00020.tgz
[root@centos-01 ~]# hadoop fs -ls /tmp/deletions/deletions-00000-of-00020.csv
-rwxr-xr-x   3 root root  395624293 2015-04-14 18:10 /tmp/deletions/deletions-00000-of-00020.csv

{code}
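
The unreadable values above would be consistent with the bytes of the .tgz file being split into fields without first being decompressed and unpacked, although this report does not confirm that as the cause. The Python sketch below only illustrates that effect on a small in-memory archive. It is not Drill code, and it assumes the .tgz extension means a gzip-compressed tar archive. All function names are hypothetical, and the sample row reuses the first three values of the uncompressed output.

```python
from __future__ import annotations

import io
import tarfile

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def make_tgz(member_name: str, csv_bytes: bytes) -> bytes:
    """Build a small gzip-compressed tar archive in memory (stand-in data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(csv_bytes)
        tar.addfile(info, io.BytesIO(csv_bytes))
    return buf.getvalue()


def looks_gzipped(raw: bytes) -> bool:
    """Return True if the data starts with the gzip magic number."""
    return raw[:2] == GZIP_MAGIC


def first_row_naive(raw: bytes) -> list[str]:
    """Split the first line on commas without any decompression,
    as a reader unaware of the compression would."""
    first_line = raw.split(b"\n", 1)[0]
    return first_line.decode("utf-8", errors="replace").split(",")


def first_row_from_tgz(raw: bytes) -> list[str]:
    """Decompress, unpack the first tar member, then split its first line."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as tar:
        member = tar.next()
        data = tar.extractfile(member).read()
    return data.split(b"\n", 1)[0].decode("utf-8").split(",")


if __name__ == "__main__":
    # Sample row: first three values from the uncompressed query output.
    csv_bytes = b"1354980518007,/user/mwcl_musicbrainz,1356247116000\n"
    archive = make_tgz("deletions.csv", csv_bytes)
    print("gzip magic present:", looks_gzipped(archive))
    print("naive read:        ", first_row_naive(archive))
    print("unpacked read:     ", first_row_from_tgz(archive))
```

Note that removing the gzip layer alone would still leave the tar header in front of the CSV data, so a plain .csv.gz file may behave differently from a .tgz archive.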



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)