hadoop-common-issues mailing list archives

From "Vineet Singh (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-15260) Hive queries on tez are overwriting records on azure wasb storage.
Date Fri, 23 Feb 2018 14:16:00 GMT
Vineet Singh created HADOOP-15260:

             Summary: Hive queries on tez are overwriting records on azure wasb storage.
                 Key: HADOOP-15260
                 URL: https://issues.apache.org/jira/browse/HADOOP-15260
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/azure
    Affects Versions: 2.7.3
         Environment: This scenario occurs on HDP 2.5 (Hadoop 2.7.3) with WASB storage on the
Microsoft Azure platform.
The same query yields correct results on an on-premise HDP 2.5 (Hadoop 2.7.3) cluster using regular HDFS.
            Reporter: Vineet Singh
         Attachments: On Premise Cluster.JPG, azure cloud.JPG, sample_query.txt

When multiple Hive queries combined with UNION are run on Tez (see the attached example), the
output written under a given mapper task number is overwritten by the next query in the union.
As the Azure screenshot shows, the directories /1, 2, ..., 100 are overwritten again and again,
because mappers with the same task numbers keep writing into the same directories.

On the on-premise Hadoop 2.7.3 cluster, however, the directories are created as 1_copy_0,1_copy_2
and so on. Because copies are created, the data is not overwritten.
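
The difference between the two behaviours can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Hive's or the WASB driver's actual code: the functions place_output and simulate are hypothetical, the destination is modelled as a plain dictionary, and the copy counter is assumed to start at 0 to match the 1_copy_0 name seen above.

```python
def place_output(existing, task_id, use_copy_suffix):
    """Pick the destination name for one task's output (hypothetical helper)."""
    name = str(task_id)
    # Overwrite mode, or no collision yet: reuse the bare task number.
    if not use_copy_suffix or name not in existing:
        return name
    # Copy mode: find the first unused "<task>_copy_<n>" name.
    n = 0  # assumption: counter starts at 0
    while f"{task_id}_copy_{n}" in existing:
        n += 1
    return f"{task_id}_copy_{n}"


def simulate(num_queries, tasks_per_query, use_copy_suffix):
    """Model several unioned queries writing into one destination directory."""
    dest = {}
    for q in range(num_queries):
        for t in range(1, tasks_per_query + 1):
            name = place_output(dest, t, use_copy_suffix)
            dest[name] = f"query{q}-task{t}"  # a later write replaces an earlier one
    return dest


overwritten = simulate(3, 2, use_copy_suffix=False)
preserved = simulate(3, 2, use_copy_suffix=True)
print(sorted(overwritten))
print(sorted(preserved))
```

In the first case only the last query's output survives under each task number, which corresponds to what is reported on Azure; in the second, every query's output keeps its own name, as on the on-premise cluster.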

A typical run unions 600-1000 queries together.

