From: Erick Erickson
Date: Wed, 22 Nov 2017 17:13:53 -0800
Subject: Re: Solr on HDFS vs local storage - Benchmarking
To: solr-user

bq: We also had an HDFS setup already so it looked like a good option to
not lose data. Earlier we had a few cases where we lost the machines so
HDFS looked safer for that.

Right, that's one of the places where using HDFS to back Solr makes a
lot of sense. The other approach is to just have replicas for each shard
distributed across different physical machines. But whatever works is
fine. And there are a bunch of parameters you can tune both on HDFS and
for local file systems, so "it's more an art than a science".

bq: Frequent adds with commits, which is likely not good in general
anyway, does look quite a bit slower than local storage so far.

I think you can go a long way towards fixing this by doing some
autowarming. I wouldn't want to open a new searcher every second and do
much autowarming over HDFS, but if you can stand less frequent commits
(say every minute?) you might be able to smooth out the performance....

Best,
Erick

On Wed, Nov 22, 2017 at 11:31 AM, Hendrik Haddorp wrote:
> We actually use no auto warming. Our collections are pretty small and the
> query performance is not really a problem so far. We are using lots of
> collections and most Solr caches seem to be per core and not global so we
> also have a problem with caching. I have to test the HDFS cache some more as
> that should work across collections.
>
> We also had an HDFS setup already so it looked like a good option to not
> lose data. Earlier we had a few cases where we lost the machines so HDFS
> looked safer for that.
>
> I would expect that the HDFS performance is also quite good if you have lots
> of document adds and not so frequent commits.
> Frequent adds with commits,
> which is likely not good in general anyway, does look quite a bit slower
> than local storage so far. As we didn't see that in our earlier tests, which
> were more query focused, I said it largely depends on what you are doing.
>
> Hendrik
>
> On 22.11.2017 18:41, Erick Erickson wrote:
>>
>> In my experience, for relatively static indexes the performance is
>> roughly similar. Once the data is read from whatever data source, it's
>> in memory; where the data came from is (largely) secondary in
>> importance.
>>
>> In cases where there's a lot of I/O I expect HDFS to be slower; this
>> fits Hendrik's observation: "We now had a pattern with lots of small
>> updates and commits and that seems to be quite a bit slower". He's
>> merging segments and (presumably) autowarming frequently, implying
>> lots of I/O, and HDFS adds an extra layer.
>>
>> Personally I'd use whichever is most convenient and see if the
>> performance was "good enough". I wouldn't recommend _installing_ HDFS
>> just to use it with Solr; why add another complication? If you need
>> the redundancy, add replicas. If you already have the HDFS
>> infrastructure in place and using HDFS is easier than local storage,
>> feel free....
>>
>> Best,
>> Erick
>>
>>
>> On Wed, Nov 22, 2017 at 8:06 AM, Greenhorn Techie wrote:
>>>
>>> Hendrik,
>>>
>>> Thanks for your response.
>>>
>>> Regarding "But this seems to greatly depend on what your setup looks like
>>> and what actions you perform." May I know which factors influence this
>>> and what considerations should be taken into account?
>>>
>>> Thanks
>>>
>>> On Wed, 22 Nov 2017 at 14:16 Hendrik Haddorp wrote:
>>>
>>>> We did some testing and the performance was strangely even better with
>>>> HDFS than with the local file system. But this seems to greatly
>>>> depend on what your setup looks like and what actions you perform.
>>>> We now
>>>> had a pattern with lots of small updates and commits and that seems to be
>>>> quite a bit slower. We are about to do performance testing on that now.
>>>>
>>>> The reason we switched to HDFS was largely connected to us using Docker
>>>> and Marathon/Mesos. With HDFS the data is in a shared file system and
>>>> thus it is possible to move the replica to a different instance on a
>>>> different host.
>>>>
>>>> regards,
>>>> Hendrik
>>>>
>>>> On 22.11.2017 14:59, Greenhorn Techie wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Good Afternoon!!
>>>>>
>>>>> While the discussion around issues related to "Solr on HDFS" is live, I
>>>>> would like to understand if anyone has done any performance
>>>>> benchmarking for both Solr indexing and search, comparing HDFS and the
>>>>> local file system.
>>>>>
>>>>> Also, from experience, what would the community folks suggest? Solr on
>>>>> local file system or Solr on HDFS? Has anyone done a comparative study
>>>>> of these choices?
>>>>>
>>>>> Thanks
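
For readers who want to try the suggestion made in the reply at the top of this thread (less frequent commits, say every minute, combined with some autowarming), a solrconfig.xml fragment might look roughly like the sketch below. Nothing in it comes from the thread itself: the 60-second soft commit interval mirrors the "every minute" remark, while the 15-second hard commit interval, the cache sizes, and the autowarmCount values are placeholder assumptions to be tuned per installation. The cache class names are the ones shipped with Solr releases of that period; newer releases use different cache implementations.

```xml
<!-- Illustrative values only; none of these numbers are from the thread. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flush to stable storage regularly, but do not open
       a new searcher, so no autowarming is triggered here. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: make new documents searchable about once a minute
       instead of after every small update. -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>

<query>
  <!-- A modest autowarmCount carries some entries over to each new
       searcher; larger values mean more I/O per commit. -->
  <filterCache class="solr.FastLRUCache"
               size="512" initialSize="512" autowarmCount="32"/>
  <queryResultCache class="solr.LRUCache"
                    size="512" initialSize="512" autowarmCount="32"/>
</query>
```

With settings along these lines, clients would send updates without an explicit commit and let Solr decide when to open a new searcher. Whether this actually smooths out performance over HDFS would still have to be measured on the setup in question.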