impala-dev mailing list archives

From "Quanlong Huang" <huang_quanl...@126.com>
Subject Re:Re: Load metadata exactly in need
Date Tue, 12 Sep 2017 05:21:34 GMT
Hi Dimitris,


Thanks for your quick reply!


IMPALA-3127 is a great ticket, but it still shows no progress and has no assignee. Is it tracked
in your internal Jira?


I hope this can be done soon, since some users may choose Presto over Impala because of these
usability issues.


Thanks
Quanlong

At 2017-09-12 12:17:23, "Dimitris Tsirogiannis" <dtsirogiannis@cloudera.com> wrote:
>Hi Quanlong,
>
>You're right. The catalog needs to handle metadata at a finer granularity.
>We are actively looking into the options you mentioned as well as other
>related changes (see IMPALA-3234 and IMPALA-3127) to improve the
>performance and scalability of metadata management.
>
>Thanks
>Dimitris
>
>On Mon, Sep 11, 2017 at 8:51 PM, Quanlong Huang <huang_quanlong@126.com>
>wrote:
>
>> Hi all,
>>
>>
>> Currently, if a "describe" statement hits an incomplete table, the impalad
>> sends an RPC request to the catalogd to load the metadata of this
>> table. This takes a long time for tables with many partitions and many
>> files. However, to serve the "describe" statement, we only need the
>> metadata in the Hive MetaStore. In my experiments (with
>> load_catalog_in_background=false), it takes hours to describe a large
>> table. This statement is pretty cheap in Hive or Presto. Users may worry
>> about whether Impala is set up correctly.
>>
>>
>> Can we add a more fine-grained strategy for loading the metadata? For
>> queries that hit just one partition of a huge table, we don't need to load
>> all the file descriptors either. For example, we could have more levels to
>> trigger metadata loading:
>> Level1. Load metadata from Hive MetaStore
>> Level2. Load file descriptors of given partitions
>> Level3. Load all file descriptors
>>
>>
>> Then we can serve the following scenario better:
>> 1. describe a large table
>> 2. run a query on one or several partitions of this table (each partition
>> has few files)
>>
>>
>> Has this been discussed before?
>>
>>
>> Thanks
>> Quanlong
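
For illustration, the three loading levels proposed in the original message could be sketched roughly as below. This is only a toy model under stated assumptions, not Impala's catalog code: the names LoadLevel, TableMeta, ensure and file_listings are hypothetical, and the "file listing" is simulated by a counter rather than a real file-system or Hive MetaStore call.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class LoadLevel(IntEnum):
    """Hypothetical levels mirroring Level1..Level3 in the proposal."""
    SCHEMA = 1      # Level1: metadata from Hive MetaStore only
    PARTITIONS = 2  # Level2: file descriptors of the given partitions
    ALL_FILES = 3   # Level3: file descriptors of every partition


@dataclass
class TableMeta:
    """Toy stand-in for a catalog table entry (not Impala's real class)."""
    name: str
    all_partitions: list
    schema_loaded: bool = False
    files: dict = field(default_factory=dict)  # partition -> file names
    file_listings: int = 0  # number of simulated per-partition listings

    def ensure(self, level, partitions=()):
        """Load only what the requested level needs; skip what is cached."""
        if not self.schema_loaded:
            # Assumed to be cheap: one metastore round trip, no file listing.
            self.schema_loaded = True
        if level == LoadLevel.PARTITIONS:
            wanted = list(partitions)
        elif level == LoadLevel.ALL_FILES:
            wanted = list(self.all_partitions)
        else:
            wanted = []
        for part in wanted:
            if part not in self.all_partitions:
                raise ValueError(f"unknown partition: {part}")
            if part not in self.files:
                # Simulated listing; a real loader would list the
                # partition directory here.
                self.files[part] = [f"{part}/file0"]
                self.file_listings += 1


table = TableMeta("big_table", [f"day={d}" for d in range(1000)])
table.ensure(LoadLevel.SCHEMA)                   # "describe big_table"
table.ensure(LoadLevel.PARTITIONS, ["day=7"])    # query on one partition
print(table.file_listings)
```

In this sketch, a "describe" triggers no file listings at all, and a query on one partition lists only that partition, which is the scenario the proposal aims to serve better.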