From: Roman Chyla
Date: Tue, 20 Feb 2018 10:27:50 -0500
Subject: storing large text fields in a database? (instead of inside index)
To: solr-user@lucene.apache.org

Hello,

We have a use case with a very large index (master-slave; for unrelated reasons the search cannot run in SolrCloud mode). One of the fields is a very large text, stored mostly for highlighting.

To cut down the index size (for replication/scaling purposes) I thought I could try saving that field in a database instead of in the index. Lucene has codecs, and one of the codec components handles stored fields, so that seems like a natural path to me.

However, I'd expect somebody has had a similar problem before. I googled and couldn't find any solutions. Using a codec seems like a really good fit for this particular problem; am I missing something?

Is there a better way to cut down on index size (besides SolrCloud/sharding and compression)?

Thank you,

Roman
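For illustration, the idea of keeping the large text outside the index and fetching it only when a hit needs highlighting can be sketched at the application level. This is not a Lucene codec and not Solr's API; it is a minimal, hypothetical sketch assuming each document has a unique id in the index, with an in-memory map standing in for the database and a naive term-wrapping function standing in for Solr's highlighter. All class and method names are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch: the large text lives in an external store keyed by the
 * document's unique id; the index would keep only searchable terms plus the id.
 */
public class ExternalStoredTextDemo {

    /** Hypothetical abstraction over the database holding the large text. */
    public interface TextStore {
        void put(String docId, String text);
        Optional<String> get(String docId);
    }

    /** In-memory stand-in for a real database table (docId -> text). */
    public static final class InMemoryTextStore implements TextStore {
        private final Map<String, String> rows = new HashMap<>();

        @Override
        public void put(String docId, String text) {
            rows.put(docId, text);
        }

        @Override
        public Optional<String> get(String docId) {
            return Optional.ofNullable(rows.get(docId));
        }
    }

    /**
     * Naive stand-in for a highlighter: wraps each case-insensitive,
     * non-overlapping occurrence of the term in <em> tags.
     */
    public static String highlight(String text, String term) {
        if (term.isEmpty()) {
            return text; // avoid an endless loop on zero-length matches
        }
        StringBuilder out = new StringBuilder();
        int start = 0; // beginning of text not yet copied to the output
        int i = 0;
        while (i + term.length() <= text.length()) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                out.append(text, start, i)
                   .append("<em>")
                   .append(text, i, i + term.length())
                   .append("</em>");
                i += term.length();
                start = i;
            } else {
                i++;
            }
        }
        out.append(text, start, text.length());
        return out.toString();
    }

    /** Fetch the externally stored text for a search hit, then highlight it. */
    public static Optional<String> highlightHit(TextStore store, String docId, String term) {
        return store.get(docId).map(text -> highlight(text, term));
    }

    public static void main(String[] args) {
        TextStore store = new InMemoryTextStore();
        store.put("doc-1", "Stored fields can dominate index size.");
        // Prints: Stored fields can dominate <em>index</em> size.
        System.out.println(highlightHit(store, "doc-1", "index").orElse("(no text)"));
    }
}
```

The trade-off is where the lookup happens: a custom stored-fields codec would, in principle, hide it inside Lucene so existing highlighting keeps working unchanged, while this application-level version is simpler but requires the client (or a custom component) to fetch the text per hit.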