hadoop-hive-dev mailing list archives

From "He Yongqiang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-477) Some optimization thoughts for Hive
Date Thu, 07 May 2009 11:11:30 GMT

     [ https://issues.apache.org/jira/browse/HIVE-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-477:
------------------------------

    Description: 
Before we start working on HIVE-461, I have been doing some profiling of Hive. Here are
some thoughts on possible improvements:

Minor:
1) Add a new HiveText class to replace Text. It can avoid a byte copy when initializing
LazyString. I have written a draft, and it shows a ~1% performance gain.
2) Change StructObjectInspector's
    {noformat}
     public List<Object> getStructFieldsDataAsList(Object data);
    {noformat}
to
    {noformat}
     public Object[] getStructFieldsDataAsArray(Object data);
    {noformat}

In my profiling test this showed some performance gain, but in actual execution it did not.
Either way, returning a Java array should reduce the GC burden of collecting ArrayList objects.
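
To make the allocation difference in item 2 concrete, here is a minimal sketch. It assumes a row is already held as an Object[]; the class StructFieldsSketch is hypothetical and is not Hive's StructObjectInspector, it only borrows the two method names from the proposal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a struct inspector; NOT Hive's actual StructObjectInspector.
// Assumption: the row object passed in is an Object[] of field values.
final class StructFieldsSketch {

    // List-returning variant: builds a fresh ArrayList on every call,
    // which becomes garbage as soon as the caller is done with the row.
    static List<Object> getStructFieldsDataAsList(Object data) {
        Object[] fields = (Object[]) data;
        return new ArrayList<>(Arrays.asList(fields));
    }

    // Array-returning variant: hands back the row's existing field array,
    // so no new object is allocated per call.
    static Object[] getStructFieldsDataAsArray(Object data) {
        return (Object[]) data;
    }

    public static void main(String[] args) {
        Object row = new Object[] {"alice", 42, 3.5};
        List<Object> asList = getStructFieldsDataAsList(row);
        Object[] asArray = getStructFieldsDataAsArray(row);
        System.out.println(asList.size() + " fields via list, " + asArray.length + " via array");
        System.out.println("array is the row itself: " + (asArray == row));
    }
}
```

The trade-off in this sketch is that the array variant exposes the row's backing storage, so callers must treat the returned array as read-only.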

Not so minor:
3) Move FileSinkOperator's Writer into a separate thread, with a producer-consumer array as
the bridge between the operators' thread and the writer thread.
4) The operator stack is fairly deep. To reduce instruction cache misses and make better use
of the data cache, I suggest letting Hive's operators process an array of rows at a time
instead of only one row at a time.
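
Items 3 and 4 could be combined roughly as in the sketch below: the operator thread hands whole batches of rows to a dedicated writer thread through a bounded queue. This is only an illustration under stated assumptions. BatchedSinkSketch, processBatch, the queue capacity of 16, and the in-memory list standing in for the output file are all hypothetical and are not part of Hive's FileSinkOperator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch; names are illustrative, NOT Hive's FileSinkOperator API.
final class BatchedSinkSketch {
    // Sentinel batch that tells the writer thread to stop (compared by identity).
    private static final Object[][] END = new Object[0][];

    // Bounded queue: the operator thread blocks in put() when the writer falls behind.
    private final BlockingQueue<Object[][]> queue = new ArrayBlockingQueue<>(16);
    // Stands in for the output file; only read after the writer thread has finished.
    private final List<String> written = new ArrayList<>();
    private final Thread writer = new Thread(this::drain, "sink-writer");

    void start() {
        writer.start();
    }

    // Called on the operator thread with an array of rows instead of a single row.
    void processBatch(Object[][] rows) throws InterruptedException {
        queue.put(rows);
    }

    // Signals end of input, waits for the writer to finish, and returns what it wrote.
    List<String> close() throws InterruptedException {
        queue.put(END);
        writer.join();
        return written;
    }

    // Runs on the writer thread: serializes each row of each batch in arrival order.
    private void drain() {
        try {
            for (Object[][] batch = queue.take(); batch != END; batch = queue.take()) {
                for (Object[] row : batch) {
                    written.add(Arrays.toString(row));
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BatchedSinkSketch sink = new BatchedSinkSketch();
        sink.start();
        sink.processBatch(new Object[][] {{"a", 1}, {"b", 2}});
        sink.processBatch(new Object[][] {{"c", 3}});
        System.out.println(sink.close());
    }
}
```

Passing batches rather than single rows also amortizes the queue synchronization cost, since each put() and take() covers many rows.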

  was:
Before we can start working on Hive-461. I am doing some profiling for hive. And here are
some thoughts for improvements:

minor :
1) add a new HiveText to replace Text. It can avoid byte copy when init LazyString. I have
done a draft one, it shows  ~1% performance gains.
2) let StructObjectInspector's 
    {noformat}
     public List<Object> getStructFieldsDataAsList(Object data);
    {noformat}
to be 
    {noformat}
     public Object[] getStructFieldsDataAsArray(Object data);
    {noformat}

In my profile, it shows some performance gains. but in actual execution it did not. Anyway,
let it return java array will reduce gc's burden of collection ArrayList

not so minor:
3) split FileSinkOperator's Writer into another Thread. Adding a producer-consumer array as
the bridge between the Operators thread and the Writer thread.
4) the operator stack is kind of deep. In order to avoid instruction cache, and increase the
efficiency data cache. I suggest to let Hive's operator can process an array of rows instead
of processing only one row at a time.


> Some optimization thoughts for Hive
> -----------------------------------
>
>                 Key: HIVE-477
>                 URL: https://issues.apache.org/jira/browse/HIVE-477
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>
> Before we start working on HIVE-461, I have been doing some profiling of Hive. Here
> are some thoughts on possible improvements:
> Minor:
> 1) Add a new HiveText class to replace Text. It can avoid a byte copy when initializing
> LazyString. I have written a draft, and it shows a ~1% performance gain.
> 2) Change StructObjectInspector's
>     {noformat}
>      public List<Object> getStructFieldsDataAsList(Object data);
>     {noformat}
> to
>     {noformat}
>      public Object[] getStructFieldsDataAsArray(Object data);
>     {noformat}
> In my profiling test this showed some performance gain, but in actual execution it did
> not. Either way, returning a Java array should reduce the GC burden of collecting
> ArrayList objects.
> Not so minor:
> 3) Move FileSinkOperator's Writer into a separate thread, with a producer-consumer array
> as the bridge between the operators' thread and the writer thread.
> 4) The operator stack is fairly deep. To reduce instruction cache misses and make better
> use of the data cache, I suggest letting Hive's operators process an array of rows at a
> time instead of only one row at a time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

