From: Adrien Grand
Date: Thu, 18 May 2017 06:35:55 +0000
Subject: Re: Term Dictionary taking up lots of memory, looking for solutions, lucene 5.3.1
To: java-user@lucene.apache.org

Is upgrading to Lucene 6 and using points rather than terms an option?
Points typically have lower memory usage (see GeoPoint, which is based on
terms, vs. LatLonPoint, which is based on points, at
http://people.apache.org/~mikemccand/geobench.html#reader-heap).

On Thu, May 18, 2017 at 02:35, Tom Hirschfeld wrote:

> Hey!
>
> I am working on a Lucene-based service for reverse geocoding. We have a
> large index with lots of unique terms (550 million), and it appears that
> we're running into memory issues on our leaf servers because the term
> dictionary for the entire index is being loaded into heap space. If we
> allocate more than 65g of heap space, our queries return relatively
> quickly (10s-100s of ms), but if we drop below ~65g of heap space on the
> leaf nodes, query performance drops dramatically, with queries quickly
> hitting 20+ seconds (our test harness drops at 20s).
>
> I did some research and found that in past versions of Lucene, one could
> split the loading of the terms dictionary using the 'termInfosIndexDivisor'
> option in the DirectoryReader class. That option was deprecated in Lucene
> 5.0.0 in favor of using codecs to achieve similar functionality. Looking
> at the available experimental codecs, I see the BlockTreeTermsWriter
> <https://lucene.apache.org/core/5_3_1/core/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.html#BlockTreeTermsWriter(org.apache.lucene.index.SegmentWriteState, org.apache.lucene.codecs.PostingsWriterBase, int, int)>
> that seems like it could be used for a similar purpose, breaking down the
> term dictionary so that we don't load the whole thing into heap space.
>
> Has anyone run into this problem before and found an effective solution?
> Does changing the codec seem appropriate for this issue? If so, how do I
> go about loading an alternative codec and configuring it to my needs?
> I'm having trouble finding docs/examples of how this is used in the real
> world, so even if you point me to a repo or docs somewhere I'd appreciate
> it.
>
> Thanks!
>
> Best,
> Tom Hirschfeld
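
As a rough sanity check on the figures quoted above (550 million unique
terms, about 65g of heap), the sketch below works out the average heap
budget per unique term. It is back-of-the-envelope arithmetic only, not a
measurement of the term dictionary: it assumes "65g" means 65 GiB (as with
-Xmx65g) and attributes the whole heap to terms, so the result is an upper
bound. The class and method names are hypothetical.

```java
import java.util.Locale;

public class HeapPerTerm {
    // Figures taken from the quoted message.
    static final long UNIQUE_TERMS = 550_000_000L;
    static final long HEAP_GIB = 65L;

    /** Average number of heap bytes available per unique term. */
    static double bytesPerTerm(long heapBytes, long terms) {
        return (double) heapBytes / terms;
    }

    public static void main(String[] args) {
        // Assumption: "65g" is 65 GiB, and all of it is attributed to terms.
        long heapBytes = HEAP_GIB * 1024L * 1024L * 1024L;
        double perTerm = bytesPerTerm(heapBytes, UNIQUE_TERMS);
        System.out.println(String.format(Locale.ROOT,
                "%.1f bytes of heap per unique term (upper bound)", perTerm));
    }
}
```

Under those assumptions the budget comes to roughly 127 bytes per unique
term. The heap also holds everything else the service needs, so the real
per-term cost of the term dictionary is lower than this.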