hadoop-hive-dev mailing list archives

From Amareshwari Sri Ramadasu <amar...@yahoo-inc.com>
Subject Filter Operator applied twice on a where clause?
Date Thu, 12 Aug 2010 09:31:52 GMT
Hi,

I see that when a query has a WHERE clause, the FilterOperator is applied twice. Can you tell
me why this is done?
The second operator always seems to filter zero rows.

EXPLAIN output for a query with a WHERE clause:
hive> explain select * from input1 where input1.key != 10;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF input1)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (!= (. (TOK_TABLE_OR_COL input1) key) 10))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        input1
          TableScan
            alias: input1
            Filter Operator
              predicate:
                  expr: (key <> 10)
                  type: boolean
              Filter Operator
                predicate:
                    expr: (key <> 10)
                    type: boolean
                Select Operator
                  expressions:
                        expr: key
                        type: int
                        expr: value
                        type: int
                  outputColumnNames: _col0, _col1
                  File Output Operator
                    compressed: false
                    GlobalTableId: 0
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1

I see the same in the mapper logs: the first FilterOperator does the filtering, and the
second operator always filters zero rows.

2010-08-12 14:33:22,149 INFO ExecMapper:
<MAP>Id =5
  <Children>
    <TS>Id =0
      <Children>
        <FIL>Id =1
          <Children>
            <FIL>Id =2
              <Children>
                <SEL>Id =3
                  <Children>
                    <FS>Id =4
                      <Parent>Id = 3 null<\Parent>
                    <\FS>
                  <\Children>
                  <Parent>Id = 2 null<\Parent>
                <\SEL>
              <\Children>
              <Parent>Id = 1 null<\Parent>
            <\FIL>
          <\Children>
          <Parent>Id = 0 null<\Parent>
        <\FIL>
      <\Children>
      <Parent>Id = 5 null<\Parent>
    <\TS>
  <\Children>
<\MAP>

2010-08-12 14:33:22,272 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 forwarding 1 rows
2010-08-12 14:33:22,272 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
2010-08-12 14:33:22,450 INFO ExecMapper: ExecMapper: processing 1 rows: used memory = 4417072
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 forwarded 1 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 1 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 forwarded 0 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:1
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 forwarded 0 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 forwarded 0 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 forwarded 0 rows
2010-08-12 14:33:22,451 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-12_14-33-14_470_1825337114959896683/_tmp.-ext-10001/000000_0
2010-08-12 14:33:22,451 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-12_14-33-14_470_1825337114959896683/_tmp.-ext-10001/_tmp.000000_0
2010-08-12 14:33:22,454 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-12_14-33-14_470_1825337114959896683/_tmp.-ext-10001/000000_0
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 Close done
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 Close done
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 Close done
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 Close done
2010-08-12 14:33:22,485 INFO ExecMapper: ExecMapper: processed 1 rows: used memory = 5135888
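
To make the observation concrete, here is a minimal sketch of two chained filters that
evaluate the same deterministic predicate. It is a plain Python simulation, not Hive code:
CountingFilter, key_not_10 and the sample rows are hypothetical names and data made up for
illustration. Under that assumption, every row that reaches the second filter has already
satisfied the predicate, so its FILTERED count can only be zero.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class CountingFilter:
    """Hypothetical stand-in for a filter operator (not Hive's FilterOperator)."""
    name: str
    predicate: Callable[[dict], bool]
    passed: int = 0
    filtered: int = 0

    def process(self, rows: Iterable[dict]) -> Iterator[dict]:
        # Forward rows that satisfy the predicate and count the rest as filtered.
        for row in rows:
            if self.predicate(row):
                self.passed += 1
                yield row
            else:
                self.filtered += 1


def key_not_10(row: dict) -> bool:
    # Same condition as the WHERE clause in the query: key != 10.
    return row["key"] != 10


# Made-up sample rows; the real table contents are not shown in this message.
rows = [{"key": 10, "value": 1}, {"key": 7, "value": 2}, {"key": 10, "value": 3}]

first = CountingFilter("FIL 1", key_not_10)
second = CountingFilter("FIL 2", key_not_10)

# Chain the operators the way the plan nests them: FIL 1 feeds FIL 2.
result = list(second.process(first.process(rows)))

for op in (first, second):
    print(f"{op.name}: PASSED:{op.passed} FILTERED:{op.filtered}")
```

With these sample rows the first filter drops the two rows with key 10 and the second one
drops nothing, which matches the FILTERED:0 count seen for operator 2 in the log.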

Thanks
Amareshwari
