drill-issues mailing list archives

From "Steven Phillips (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-2806) Querying data from compressed csv file returns nulls and unreadable data
Date Thu, 16 Apr 2015 21:50:59 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498792#comment-14498792 ]

Steven Phillips commented on DRILL-2806:
----------------------------------------

That list has nothing to do with whether Drill can query or decompress a file. It is simply
the list of file extensions the file system uses to determine whether or not to apply MapR's
native filesystem compression.

.tgz is not a compression codec that Drill understands or can work with.

The only compression codecs that work with Drill out of the box are gz and bz2. Additional
codecs can be added by including the relevant libraries in the Drill classpath.
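
One possible workaround, sketched here as an illustration rather than as anything Drill
provides: since gz works out of the box, the files inside the .tgz archive can be re-packed
as individual .gz files before querying. The function name and the paths below are
hypothetical, and the sketch assumes the archive holds plain CSV files.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def repack_tgz_members_as_gz(tgz_path, out_dir):
    """Write each regular file inside a .tgz archive as its own .gz file.

    Hypothetical helper, not part of Drill. A member named data.csv
    becomes out_dir/data.csv.gz.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(tgz_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            source = archive.extractfile(member)
            # Keep only the base name and append .gz so the extension names the codec.
            target = out_dir / (Path(member.name).name + ".gz")
            with source, gzip.open(target, "wb") as compressed:
                shutil.copyfileobj(source, compressed)
            written.append(target)
    return written


if __name__ == "__main__":
    # Hypothetical local paths; adjust to wherever the archive actually lives.
    for path in repack_tgz_members_as_gz("deletions-00000-of-00020.tgz", "deletions_gz"):
        print(path)
```

The resulting .csv.gz files would then be copied to the file system Drill reads from and
queried in place of the .tgz file.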

> Querying data from compressed csv file returns nulls and unreadable data
> ------------------------------------------------------------------------
>
>                 Key: DRILL-2806
>                 URL: https://issues.apache.org/jira/browse/DRILL-2806
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 0.9.0
>         Environment: 9d92b8e319f2d46e8659d903d355450e15946533 | DRILL-2580: Exit early from HashJoinBatch if build side is empty | 26.03.2015
>            Reporter: Khurram Faraaz
>            Assignee: Steven Phillips
>
> Projecting columns from a compressed CSV data file returns unreadable data and nulls in
> the query results. Querying the same CSV file in uncompressed form returns correct,
> readable results with no nulls. The test was performed on a 4-node cluster on CentOS.
> {code}
> 0: jdbc:drill:> select columns[0], columns[1], columns[2], columns[3], columns[4], columns[5], columns[6], columns[7] from `deletions-00000-of-00020.tgz` limit 10;
> +------------+------------+------------+------------+------------+------------+------------+------------+
> |   EXPR$0   |   EXPR$1   |   EXPR$2   |   EXPR$3   |   EXPR$4   |   EXPR$5   |   EXPR$6   |   EXPR$7   |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> | 0U[ˮȑ|axaR)ﺫ=鲍i̊HDJ|?3̑$%Q$%
>                                                 TdfD8'2i$E^/Y}C'>|/7
>                                                                                   H1o0!
| 0g TMUܸW`ʙ&T
>                                                                                     
                                           \uXپN|2I~Y 0RAX6UaXe+ow*]s | null       | null
      | null       | null       | null       | null       |
> | oM.ڻU/ | ̼\
>                            )qwda7((
>                                                	y[) | 9>^0>WM[{r]iE$ze&!EküIfa
| null       | null       | null       | null       | null       |
> | SRΠ     | null       | null       | null       | null       | null       | null 
     | null       |
> | 6imJ\f_dYڿ]%ln3IaE*BGA-a$j:M!Uc)ﶘD~wUx0ɼgme]ӘcQ*pk$%\2ER-)(ÈxTn?SϓxeҜݠºI|'(Cni
s | null       | null       | null       | null       | null       | null       | null   
   |
> | bxΜkr4ü_nIxl_s`vN	ó.$OL7Eބyڗia;Pu$M!AoCӦnlS-`ۢ+o~>%wzcgwtMge7"lMgZ=WྃgMRX1"a
| X=Rd.fab{t{
>                                                                                     
                                                                                         
        A!t
>                                                                                     
                                                                                         
                1$ڧw-0EXURg
>                                                                                     
                                                                                         
                                       p	#qzߤ΢gWMem{=z{
>                                                                                     
                                                                                         
                                                                     eiA]^ | null       |
null       | null       | null       | null       | null       |
> | ֌        | null       | null       | null       | null       | null       | null 
     | null       |
> | !{1H*m71`˰]oZ | 𾳔] &f4Z)4SP7Rm4^5WWXȧ<p.́3L
>                                                                                     
 q%|WL-p[ | null       | null       | null       | null       | null       | null       |
> | dqyd\K#"ԁ@ | null       | null       | null       | null       | null       | null
      | null       |
> | [GԊKFlɢ(ZK8h#D/[(U=_8ΏE%
>                                                            [;
>                                                               w}Fr`#Xk
>                                                                               lT'15:y
>                                                                                     
          ņPz(-ȓ񆹞Cs)1v	 | null       | null       | null       | null       | null  
    | null       | null       |
> | LyPO|Ώ(+n+H]
>                          Ņ2?糩s/_ l
>                                             +ӯb	 | null       | null       | null  
    | null       | null       | null       | null       |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> 10 rows selected (0.176 seconds)
> 0: jdbc:drill:> select columns[0], columns[1], columns[2], columns[3], columns[4], columns[5], columns[6], columns[7] from `deletions/deletions-00000-of-00020.csv` limit 10;
> +------------+------------+------------+------------+------------+------------+------------+------------+
> |   EXPR$0   |   EXPR$1   |   EXPR$2   |   EXPR$3   |   EXPR$4   |   EXPR$5   |   EXPR$6   |   EXPR$7   |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> | 1354980518007 | /user/mwcl_musicbrainz | 1356247116000 | /user/google_gardener | /m/0nj707g | /music/track_contribution/contributor | /m/09xmq3  | en         |
> | 1359609261000 | /user/ahsan2002us | 1359697206000 | /user/mjsigua | /m/0q47ym9 | /common/topic/description | Afrosheen CEO is the fictional character from the 2003 film The Watermelon Heist. | en         |
> | 1258294630005 | /user/book_bot | 1260214155000 | /user/book_bot | /m/08g19rh | /book/book_edition/book | /m/04sty07 | en         |
> | 1260232964000 | /user/book_bot | 1360880749000 | /user/turtlewax_bot | /m/0872_f2 | /book/book_edition/book | /m/069_gyc | en         |
> | 1320298552000 | /user/gardening_bot | 1358083965004 | /user/googlebot | /m/01dy3t2 | /type/object/type | /music/single | en         |
> | 1360430129006 | /user/mwcl_musicbrainz | 1362830875001 | /user/mwcl_musicbrainz | /m/0qm1x62 | /music/release_track/release | /m/0ql38vr | en         |
> | 1269251105000 | /user/mwcl_images | 1336539194001 | /user/gardening_bot | /m/06w7yw7 | /common/topic/image | /m/0bcncxt | en         |
> | 1225386250001 | /user/mwcl_images | 1336080683003 | /user/gardening_bot | /m/04sb526 | /common/licensed_object/license | /m/02x6b   | en         |
> | 1286991487000 | /user/mw_template_bot | 1362532733000 | /user/wikipedia_facts | /m/0dgs170 | /people/person/date_of_birth | 1975       | en         |
> | 1258986090000 | /user/book_bot | 1260138587000 | /user/book_bot | /m/08r_m33 | /book/book_edition/book | /m/04sty07 | en         |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> 10 rows selected (0.25 seconds)
> Details of the files (compressed and uncompressed)
> [root@centos-01 ~]# hadoop fs -ls /tmp/deletions-00000-of-00020.tgz
> -rwxr-xr-x   3 root root  111364147 2015-04-16 20:35 /tmp/deletions-00000-of-00020.tgz
> [root@centos-01 ~]# hadoop fs -ls /tmp/deletions/deletions-00000-of-00020.csv
> -rwxr-xr-x   3 root root  395624293 2015-04-14 18:10 /tmp/deletions/deletions-00000-of-00020.csv
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
