From: GitBox
To: commits@hudi.apache.org
Reply-To: dev@hudi.apache.org
Subject: [GitHub] [incubator-hudi] bhasudha commented on a change in pull request #1333: [HUDI-589][DOCS] Fix querying_data page
Date: Sun, 23 Feb 2020 07:32:54 -0000

bhasudha commented on a change in pull request #1333: [HUDI-589][DOCS] Fix querying_data page
URL: https://github.com/apache/incubator-hudi/pull/1333#discussion_r382972257

##########
File path: docs/_docs/2_3_querying_data.md
##########
@@ -145,8 +161,13 @@ Additionally, `HoodieReadClient` offers the following functionality using Hudi's
 | filterExists() | Filter out already existing records from the provided RDD[HoodieRecord]. Useful for de-duplication |
 | checkExists(keys) | Check if the provided keys exist in a Hudi table |
+### Read optimized query
+
+For read optimized queries, either Hive SerDe can be used by turning off convertMetastoreParquet as described above or Spark's built in support can be leveraged.
+If using spark's built in support, additionally a path filter needs to be pushed into sparkContext as described earlier.
 ## Presto
-Presto is a popular query engine, providing interactive query performance. Presto currently supports only read optimized queries on Hudi tables.
-This requires the `hudi-presto-bundle` jar to be placed into `/plugin/hive-hadoop2/`, across the installation.
+Presto is a popular query engine, providing interactive query performance. Presto currently supports snapshot queries on
+COPY_On_WRITE and read optimized queries on MERGE_ON_READ Hudi tables. This requires the `hudi-presto-bundle` jar

Review comment:
will fix

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services