hudi-commits mailing list archives

From "Andrew Wong (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HUDI-614) EMR Presto cannot read Hudi tables
Date Tue, 25 Feb 2020 22:30:00 GMT

     [ https://issues.apache.org/jira/browse/HUDI-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wong updated HUDI-614:
-----------------------------
    Description: 
Original issue: [https://github.com/apache/incubator-hudi/issues/1329]

Code I tried: [https://gist.github.com/popart/c16a4661528fe1819aa63a1fed351e0c], based on
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi-work-with-dataset.html.

 

In AWS EMR 5.28.0 & 5.28.0, attempting to query a Hudi table from Presto results in the
error: {{Could not find partitionDepth in partition metafile}}. In the Docker environment,
Presto works fine.

cc: [~bhasudha]

  was:
Original issue: [https://github.com/apache/incubator-hudi/issues/1329]

I made a non-partitioned Hudi table using Spark. I was able to query it with Spark & Hive,
but when I tried querying it with Presto, I received the error {{Could not find partitionDepth
in partition metafile}}.

I attempted this task using emr-5.28.0 in AWS. I tried using the built-in spark-shell with
both Amazon's /usr/lib/hudi/hudi-spark-bundle.jar (following [https://aws.amazon.com/blogs/aws/new-insert-update-delete-data-on-s3-with-amazon-emr-and-apache-hudi/]) and
the org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating jar (following [https://hudi.apache.org/docs/quick-start-guide.html]).

I used NonpartitionedKeyGenerator & NonPartitionedExtractor in my write options, according
to [https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-HowdoIuseDeltaStreamerorSparkDataSourceAPItowritetoaNon-partitionedHudidataset?].
You can see my code in the github issue linked above.
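
For readers without access to the gist, the write options described above can be sketched roughly as follows. This is an illustration, not the reporter's code: the helper name, its parameters, and the sample table and field names are hypothetical, and the package of NonpartitionedKeyGenerator is assumed to be org.apache.hudi.keygen (it may differ between Hudi releases, so it is left configurable).

```python
def nonpartitioned_write_options(table_name, record_key, precombine_field,
                                 keygen_package="org.apache.hudi.keygen"):
    """Build Hudi datasource options for a non-partitioned table.

    Hypothetical helper for illustration. keygen_package is an assumption:
    check the class location in the Hudi bundle you actually run against.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Key generator that produces an empty partition path.
        "hoodie.datasource.write.keygenerator.class":
            keygen_package + ".NonpartitionedKeyGenerator",
        # Hive sync settings so the table is registered without partitions.
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.table": table_name,
        "hoodie.datasource.hive_sync.partition_extractor_class":
            "org.apache.hudi.hive.NonPartitionedExtractor",
    }


# Hypothetical table and field names, for illustration only.
opts = nonpartitioned_write_options("my_table", "id", "ts")
for key in sorted(opts):
    print(key, "=", opts[key])
```

A dictionary like this would typically be handed to the Spark DataFrame writer's options before saving to the table path.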

In both cases I see the .hoodie_partition_metadata file was created in the table path in S3.
Querying the table worked in spark-shell & hive-cli, but attempting to query the table
in presto-cli resulted in the error, "Could not find partitionDepth in partition metafile".
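
One way to narrow this down is to inspect the metafile directly. The sketch below assumes that .hoodie_partition_metadata is a Java-properties-style text file containing a partitionDepth key; that layout is an assumption, not something confirmed in this report. The function names and the sample file content are hypothetical.

```python
def parse_partition_metafile(text):
    """Parse simple key=value lines, skipping blanks and comment lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def partition_depth(text):
    """Return partitionDepth as an int, or raise if the key is missing."""
    props = parse_partition_metafile(text)
    if "partitionDepth" not in props:
        raise KeyError("Could not find partitionDepth in partition metafile")
    return int(props["partitionDepth"])


# Hypothetical file content; download the real file from the table path to compare.
sample = "#partition metadata\ncommitTime=20200225223000\npartitionDepth=0\n"
print(partition_depth(sample))
```

If the key is present in the file stored in S3, the error is more likely to come from how the reader locates or opens the file than from the write itself.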

Please look into the bug or check the documentation. If there is a problem with the EMR install
I can contact the AWS team responsible.

cc: [~bhasudha]


> EMR Presto cannot read Hudi tables
> ----------------------------------
>
>                 Key: HUDI-614
>                 URL: https://issues.apache.org/jira/browse/HUDI-614
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>    Affects Versions: 0.5.0, 0.5.1
>            Reporter: Andrew Wong
>            Priority: Major
>
> Original issue: [https://github.com/apache/incubator-hudi/issues/1329]
> Code I tried: [https://gist.github.com/popart/c16a4661528fe1819aa63a1fed351e0c], based
> on https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi-work-with-dataset.html.
>  
> In AWS EMR 5.28.0 & 5.28.0, attempting to query a Hudi table from Presto results
> in the error: {{Could not find partitionDepth in partition metafile}}. In the Docker
> environment, Presto works fine.
> cc: [~bhasudha]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
