impala-dev mailing list archives

From funes79 <...@git.apache.org>
Subject [GitHub] incubator-impala pull request #1: Branch 2.8.0
Date Fri, 27 Jan 2017 14:45:33 GMT
GitHub user funes79 opened a pull request:

    https://github.com/apache/incubator-impala/pull/1

    Branch 2.8.0

    I would like to register my first pull request for Impala. We have been using it in production
for almost 3 years.
    I would like to suggest an improvement to the behaviour of COMPUTE INCREMENTAL STATS.
    We have a very large table, initially migrated from another cluster, and we had to create
stats on the table. COMPUTE INCREMENTAL STATS failed (was skipped) after 4 hours, and by that
time, judging by HDFS reads, almost 90% of the table had been scanned. Unfortunately Impala did not
store the partition statistics (daily partitions), so when I checked the stats they showed
false everywhere. The performance of compute stats is also very poor: it looks like it scans
the table partition by partition, and if a partition is small (on one node) the other nodes
stay idle.
    Two improvements I would suggest:
     - write the calculated stats immediately after each partition's stats are gathered
     - if the table has a large number of partitions (3 years, 1000 partitions), scan at least
as many partitions in parallel as there are Impala daemons configured.
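
    The two suggestions above can be approximated from the client side. The following is only a rough sketch, not a description of how Impala works internally. It assumes a hypothetical `run_sql` callable that executes one statement against Impala, a hypothetical partition column with string values, and that Impala tolerates several COMPUTE INCREMENTAL STATS statements running against the same table at once, which has not been verified here. The only Impala feature it relies on is the PARTITION clause of COMPUTE INCREMENTAL STATS.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def partition_statements(table, partition_col, partition_values):
    """Build one COMPUTE INCREMENTAL STATS statement per partition.

    partition_col and partition_values are illustrative; the real table's
    partition column name and value format are not given in this message.
    """
    return [
        f"COMPUTE INCREMENTAL STATS {table} PARTITION ({partition_col}='{value}')"
        for value in partition_values
    ]


def compute_stats_in_parallel(statements, run_sql, num_daemons):
    """Run the statements with at most num_daemons in flight at a time.

    run_sql is a hypothetical callable that executes a single statement.
    Because every partition is its own statement, a failure in one of them
    does not throw away the work already finished for the others.
    """
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=num_daemons) as pool:
        futures = {pool.submit(run_sql, stmt): stmt for stmt in statements}
        for future in as_completed(futures):
            stmt = futures[future]
            try:
                future.result()
                done.append(stmt)
            except Exception:
                # Record the failure and keep going with the other partitions.
                failed.append(stmt)
    return done, failed
```

    The statements in `failed` can simply be retried later, since the partitions in `done` do not need to be scanned again.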
    
    Thanks

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-impala branch-2.8.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-impala/pull/1.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1
    
----
commit 2423d23f8a84f4b38d2250ae0598207aeda243b2
Author: Jim Apple <jbapple-impala@apache.org>
Date:   2017-01-06T23:53:24Z

    Update VERSION to begin release candidate testing
    
    Change-Id: I0fcec577babba0929600d540936bb154a42dee50

commit 95e9479c12a3ba6fdfed25ae88467c8ba4622ad2
Author: Jim Apple <jbapple-impala@apache.org>
Date:   2017-01-05T16:19:28Z

    Add disclaimer to docs: Cloudera-specific info still present.
    
    While we are working on excising it, we don't want users to be
    confused about what the manual is intended to describe.
    
    Change-Id: I7740189fd7ff7f22d8471f037e190d9923521936
    Reviewed-on: http://gerrit.cloudera.org:8080/5610
    Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
    Tested-by: Impala Public Jenkins

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
