From: Nirmala S
Subject: Caching layer using arrow
Date: Thu, 21 Mar 2019 20:16:15 +0530
To: user@arrow.apache.org

Hi,

I am trying to build a caching layer using Arrow on top of ORC files. The application will ask the cache for a column of data (which can be of any data type, fixed or variable length), and the cache needs to check its metadata to see whether the column is already present. If it is, the cache can return the data to the application. If not, the data needs to be fetched from the ORC files, cached, and then returned to the application. The application is multi-threaded, is written in C++, and has a read-only workload.

Given this, what is the best way to maintain the metadata and the data in Arrow? Is there any established good practice?

If the cache size is smaller than the ORC file size, should I implement logic to swap data out using an algorithm such as LRU, or is this already present in Arrow?

Thanks in advance,
Nirmala
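
To make the question concrete, the lookup-then-load flow with LRU eviction could be sketched roughly as below. This is only an illustration under assumptions: `ColumnCache`, `Loader`, `Column`, and the byte-based capacity are hypothetical names, `std::vector<char>` stands in for whatever Arrow array type would actually be cached, and none of this is part of the Arrow API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cached column; a real cache would more likely
// hold a shared pointer to an immutable Arrow array.
using Column = std::vector<char>;
using ColumnPtr = std::shared_ptr<const Column>;

// Hypothetical thread-safe LRU cache keyed by column name.
class ColumnCache {
 public:
  // Called on a miss to fetch the column (e.g. by reading the ORC file).
  using Loader = std::function<ColumnPtr(const std::string&)>;

  ColumnCache(std::size_t capacity_bytes, Loader loader)
      : capacity_bytes_(capacity_bytes), loader_(std::move(loader)) {
    assert(capacity_bytes_ > 0 && loader_);
  }

  // Returns the cached column, loading and caching it on a miss.
  ColumnPtr Get(const std::string& name) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = index_.find(name);
    if (it != index_.end()) {
      // Hit: move the entry to the front (most recently used).
      lru_.splice(lru_.begin(), lru_, it->second);
      return it->second->second;
    }
    // Miss: load while holding the lock. Simple, but it serializes loads;
    // a production cache might load outside the lock instead.
    ColumnPtr column = loader_(name);
    if (!column) return column;  // loader failed; cache nothing
    lru_.emplace_front(name, column);
    index_[name] = lru_.begin();
    used_bytes_ += column->size();
    Evict();
    return column;
  }

  // Metadata check only: reports presence without touching LRU order.
  bool Contains(const std::string& name) const {
    std::lock_guard<std::mutex> lock(mutex_);
    return index_.count(name) != 0;
  }

 private:
  using Entry = std::pair<std::string, ColumnPtr>;

  // Drops least recently used entries until the cache fits its budget,
  // always keeping the newest entry.
  void Evict() {
    while (used_bytes_ > capacity_bytes_ && lru_.size() > 1) {
      const Entry& victim = lru_.back();
      used_bytes_ -= victim.second->size();
      index_.erase(victim.first);
      lru_.pop_back();
    }
  }

  std::size_t capacity_bytes_;
  std::size_t used_bytes_ = 0;
  Loader loader_;
  mutable std::mutex mutex_;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

Because entries are handed out as `std::shared_ptr<const Column>`, a reader thread that already holds a column can keep using it after the cache evicts it, which seems to fit the read-only workload described above.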