From: David Hastings
Date: Tue, 20 Feb 2018 10:36:10 -0500
Subject: Re: storing large text fields in a database? (instead of inside index)
To: solr-user@lucene.apache.org

It really depends on what you consider too large, and why the size is a big issue. Most replication runs at about 100 MB/second, give or take, so replicating a 300GB index takes only an hour or two.

What I do for this purpose is store my text in a separate index altogether and call on that core for highlighting. In my use case, the primary index with no stored text is around 300GB and replicates as needed, while the full-text indexes with stored text total around 500GB and replicate nonstop. All searching goes against the primary index, and for highlighting I call on the full-text indexes, which have a stupid simple schema. This has worked pretty well for me, at least.

On Tue, Feb 20, 2018 at 10:27 AM, Roman Chyla wrote:
> Hello,
>
> We have a use case of a very large index (slave-master; for unrelated
> reasons the search cannot work in the cloud mode) - one of the fields is a
> very large text, stored mostly for highlighting. To cut down the index size
> (for purposes of replication/scaling) I thought I could try to save it in a
> database - and not in the index.
>
> Lucene has codecs - one of the methods is for 'stored field', so that seems
> like a natural path for me.
>
> However, I'd expect somebody else before had a similar problem. I googled
> and couldn't find any solutions. Using the codecs seems really good thing
> for this particular problem, am I missing something? Is there a better way
> to cut down on index size? (besides solr cloud/sharding, compression)
>
> Thank you,
>
> Roman
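
The two-core arrangement described in the reply (search one core that has no stored text, then ask a second core for highlights) might look roughly like the sketch below. It is only an illustration under stated assumptions: the core names `primary` and `fulltext`, the host and port, and the field names `id` and `text` are all hypothetical, and restricting the second query with an `fq` on the matching ids is one plausible way to wire it up, not necessarily what the author does. The request parameters (`q`, `fq`, `fl`, `rows`, `wt`, `hl`, `hl.fl`) and the `highlighting` section of the response are standard Solr.

```python
from urllib.parse import urlencode

# Hypothetical core URLs; adjust to your own deployment.
PRIMARY = "http://localhost:8983/solr/primary"    # indexed text, nothing stored
FULLTEXT = "http://localhost:8983/solr/fulltext"  # stored text, used for highlighting


def search_url(query, rows=10):
    """Build the search request against the primary core; only ids come back."""
    params = {"q": query, "fl": "id", "rows": rows, "wt": "json"}
    return f"{PRIMARY}/select?{urlencode(params)}"


def highlight_url(query, ids, field="text"):
    """Build the highlighting request against the full-text core.

    The same user query is re-run, restricted by a filter query to the ids
    that the primary core returned. Ids containing double quotes are not
    escaped in this sketch.
    """
    id_filter = "id:(" + " OR ".join(f'"{i}"' for i in ids) + ")"
    params = {
        "q": query,
        "fq": id_filter,
        "fl": "id",
        "rows": len(ids),
        "hl": "true",
        "hl.fl": field,
        "wt": "json",
    }
    return f"{FULLTEXT}/select?{urlencode(params)}"


def merge(search_response, highlight_response, field="text"):
    """Attach snippets from the full-text core to hits from the primary core."""
    snippets = highlight_response.get("highlighting", {})
    return [
        {"id": doc["id"], "snippets": snippets.get(doc["id"], {}).get(field, [])}
        for doc in search_response["response"]["docs"]
    ]
```

In use, the caller would fetch `search_url(...)`, collect the ids from `response.docs`, fetch `highlight_url(...)` with those ids, and pass both decoded JSON responses to `merge`. The cost is one extra round trip per search, in exchange for keeping stored text out of the index that has to replicate quickly.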