hudi-commits mailing list archives

From "Balaji Varadarajan (Jira)" <>
Subject [jira] [Commented] (HUDI-628) MultiPartKeysValueExtractor does not work with
Date Sat, 22 Feb 2020 00:26:00 GMT


Balaji Varadarajan commented on HUDI-628:

@Andrew Wong, this is expected when you use MultiPartKeysValueExtractor, because it splits the partition path on "/".
For "americas/brazil/sao_paulo", you might want to configure 3 partition fields (continent, country, city).
If you want to treat the path as one field, you can simply add a new implementation of PartitionValueExtractor
and plug it in.
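
A minimal sketch of such an implementation, assuming the extractor only needs to implement `extractPartitionValuesInPath` (the method named in the issue description). The interface is redeclared locally as a stand-in so the snippet compiles on its own; in a real build you would implement Hudi's own `PartitionValueExtractor` instead. The class name `SinglePartKeyValueExtractor` is hypothetical, and whether Hive handles a partition value containing "/" the way you expect is not verified here.

```java
import java.util.Collections;
import java.util.List;

// Local stand-in for Hudi's PartitionValueExtractor interface (assumption:
// a single method that maps a partition path to its list of values).
interface PartitionValueExtractor {
    List<String> extractPartitionValuesInPath(String partitionPath);
}

// Hypothetical extractor that treats the whole partition path as ONE value,
// so a single partition field such as "partitionpath" matches it.
class SinglePartKeyValueExtractor implements PartitionValueExtractor {
    @Override
    public List<String> extractPartitionValuesInPath(String partitionPath) {
        // No splitting on "/": "americas/brazil/sao_paulo" stays one value.
        return Collections.singletonList(partitionPath);
    }
}
```

With this in place, the class would be passed to --partition-value-extractor instead of MultiPartKeysValueExtractor, keeping --partitioned-by partitionpath unchanged.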

> MultiPartKeysValueExtractor does not work with
> ---------------------------------------------------------------
>                 Key: HUDI-628
>                 URL:
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>            Reporter: Andrew Wong
>            Assignee: Balaji Varadarajan
>            Priority: Major
>         Attachments: stack_trace.txt
> The [] example data has a column
`partitionpath` which holds values like `americas/brazil/sao_paulo`. Using the docker environment's
spark-shell, you can change the basePath from the quickstart to save to hdfs://user/hive/warehouse/hudi_trips_cow
and write the table. Then you can see the folder in the HDFS browser, similar to the stock_ticks_cow
folder created in the docker demo.
> However, if you try to use to sync the table to Hive, you get the error:
"java.lang.IllegalArgumentException: Partition key parts [partitionpath] does not match with
partition values [americas, brazil, sao_paulo]. Check partition strategy. "
> {quote}{{/var/hoodie/ws/hudi-hive/ --jdbc-url jdbc:hive2://hiveserver:10000
--user hive --pass hive --partitioned-by partitionpath --partition-value-extractor org.apache.hudi.hive.MultiPartKeysValueExtractor
-MultiPartKeysValueExtractor -base-path /user/hive/warehouse/hudi_trips_cow --database default
--table hudi_trips_cow}}
> {quote}
> This error is thrown in `HoodieHiveClient.getPartitionClause`, which uses `extractPartitionValuesInPath`
to get a list of partitionValues. The problem is that it compares the number of partitionValues
to the number of partition fields. In this example, there is only 1 partition field, "partitionpath",
which is split into 3 partitionValues. Thus the check fails and throws the exception.
> See []
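
The mismatch described above can be reproduced with a simplified sketch. This is not the actual Hudi source: `PartitionCheckSketch`, `splitBySlash`, and `checkPartitionClause` are hypothetical names that only mimic the described behaviour (split the path on "/", then compare the number of values against the number of partition fields).

```java
import java.util.Arrays;
import java.util.List;

class PartitionCheckSketch {
    // Mimics the described MultiPartKeysValueExtractor behaviour: split on "/".
    static List<String> splitBySlash(String partitionPath) {
        return Arrays.asList(partitionPath.split("/"));
    }

    // Mimics the described size check: field count must equal value count.
    static void checkPartitionClause(List<String> partitionFields, String partitionPath) {
        List<String> partitionValues = splitBySlash(partitionPath);
        if (partitionFields.size() != partitionValues.size()) {
            throw new IllegalArgumentException("Partition key parts " + partitionFields
                + " does not match with partition values " + partitionValues
                + ". Check partition strategy. ");
        }
    }
}
```

Calling `checkPartitionClause` with the single field "partitionpath" and the path "americas/brazil/sao_paulo" throws, while passing the three fields continent, country, city for the same path does not.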
