From: Michael Froh
Date: Wed, 3 Jul 2019 13:14:40 -0700
Subject: Lucene Index Cloud Replication
To: java-user@lucene.apache.org

Hi there,

I was talking with Varun at Berlin Buzzwords a couple of weeks ago about storing and retrieving Lucene indexes in S3, and realized that "uploading a Lucene directory to the cloud and downloading it on other machines" is a pretty common problem and one that's surprisingly easy to do poorly. In my current job, I'm on my third team that needed to do this.

In my experience, there are three main pieces that need to be implemented:

1. Uploading/downloading individual files (i.e. the blob store), which can be eventually consistent if you write once.
2. Describing the metadata for a specific commit point (basically what the Replicator module does with the "Revision" class). In particular, a downloader should be able to tell reliably whether it already has specific files (and so doesn't need to download them again).
3. Sharing metadata with some degree of consistency, so that multiple writers don't clobber each other's metadata, and so that readers can discover the metadata for the latest commit/revision and trust that they'll (eventually) be able to download the relevant files.

I'd like to share what I've got for 1 and 3, based on S3 and DynamoDB, but I'd like to do it with interfaces that lend themselves to other implementations for blob and metadata storage.

Is it worth opening a Jira issue for this? Is this something that would benefit the Lucene community?

Thanks,
Michael Froh
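
To make the three pieces above concrete, here is a minimal sketch of how they might look as storage-agnostic interfaces. It is not the code referred to in the message, and none of it is Lucene, S3, or DynamoDB API: `BlobStore`, `FileInfo`, `Revision`, `MetadataStore`, the in-memory implementations, the `name-checksum` blob key, and the "generation 0 means nothing published yet" convention are all hypothetical names and assumptions chosen for illustration. The in-memory classes only stand in for a blob service and a metadata table that supports conditional writes. It assumes Java 16 or later (for records).

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Demo entry point. Every type in this file is hypothetical, not a Lucene API. */
class ReplicationSketchDemo {
    public static void main(String[] args) {
        BlobStore blobs = new InMemoryBlobStore();
        MetadataStore meta = new InMemoryMetadataStore();

        // Writer: upload the files first, then publish the revision that names them.
        blobs.putIfAbsent("_0.cfs-abc", new byte[] {1, 2, 3});
        Revision r1 = new Revision(1, Map.of("_0.cfs", new FileInfo(3, "abc")));
        System.out.println("published: " + meta.publish(0, r1));

        // Reader: discover the latest revision, then fetch only what is missing locally.
        Revision latest = meta.latest().orElseThrow();
        System.out.println("to download: " + latest.missingFrom(Map.of()));
    }
}

/** Piece 1: write-once blob storage (assumed shape; in-memory stand-in below). */
interface BlobStore {
    /** Stores data under key unless the key already exists; returns true if stored. */
    boolean putIfAbsent(String key, byte[] data);

    Optional<byte[]> get(String key);
}

/** Piece 2: per-file metadata that lets a downloader skip files it already has. */
record FileInfo(long length, String checksum) {}

/** Piece 2: one commit point, described as file name -> FileInfo. */
record Revision(long generation, Map<String, FileInfo> files) {
    /** Names of files in this revision that are absent from, or differ in, the local copy. */
    Set<String> missingFrom(Map<String, FileInfo> local) {
        Set<String> missing = new TreeSet<>();
        for (Map.Entry<String, FileInfo> e : files.entrySet()) {
            if (!e.getValue().equals(local.get(e.getKey()))) {
                missing.add(e.getKey());
            }
        }
        return missing;
    }
}

/** Piece 3: metadata sharing with a conditional write, so writers cannot clobber each other. */
interface MetadataStore {
    /** Publishes next only if the latest generation is expectedGeneration (0 = none yet). */
    boolean publish(long expectedGeneration, Revision next);

    Optional<Revision> latest();
}

final class InMemoryBlobStore implements BlobStore {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    @Override
    public boolean putIfAbsent(String key, byte[] data) {
        return blobs.putIfAbsent(key, data.clone()) == null;
    }

    @Override
    public Optional<byte[]> get(String key) {
        byte[] data = blobs.get(key);
        return data == null ? Optional.empty() : Optional.of(data.clone());
    }
}

final class InMemoryMetadataStore implements MetadataStore {
    private Revision current; // guarded by "this"

    @Override
    public synchronized boolean publish(long expectedGeneration, Revision next) {
        long actual = (current == null) ? 0 : current.generation();
        if (actual != expectedGeneration) {
            return false; // another writer published first; caller should re-read and retry
        }
        current = next;
        return true;
    }

    @Override
    public synchronized Optional<Revision> latest() {
        return Optional.ofNullable(current);
    }
}
```

The ordering in `main` is the point of the design: blobs are written once and before the metadata that references them, so a reader that sees a revision can expect its files to become available even if the blob store is only eventually consistent. The compare-on-generation check in `publish` is one way to get the "writers don't clobber each other" property; a real metadata backend would need an equivalent conditional update.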