Subject: Re: DocValues formats hold large byte[][]s even when using MMapDirectory
From: Steven Schlansker
Date: Wed, 2 Oct 2013 11:37:57 -0700
To: java-user@lucene.apache.org

On Oct 2, 2013, at 11:16 AM, Michael McCandless wrote:

> In Lucene 4.5 (coming out any day now) we've switched by default to a
> "mostly on disk" impl for doc values.
>

Awesome! Looking forward to that then.

> Before that, you can use DiskDocValuesFormat instead.
>
> But you'll need to re-index (or create a new index and use
> IW.addIndexes) to cutover your current index to the DiskDVFormat.
>

I see a few references scattered on the internet but it's not in my Lucene jars.
The one reference I saw to it indicated that every patch release of Lucene will require a full reindex when using this, which is a serious bummer.

So I think I'll hold out for 4.5 and hope that that solves my problem. Thanks for the help!

>
> On Wed, Oct 2, 2013 at 2:11 PM, Steven Schlansker wrote:
>> Hi,
>>
>> I have a search application using Lucene 4.4.0 with various BinaryDocValues and SortedSetDocValues.
>> We use MMapDirectory to help keep the Java heap small / GC pause times short and instead rely on the OS buffer cache to keep things fast, which I gather is generally considered a "best practice" around here.
>> As our index grows, I've noticed that we are getting GC pauses and later OOM errors when reloading a new index due to gigabytes of byte[][]s held by Lucene42DocValuesProducer, specifically the PagedBytes.Reader.blocks from within Lucene42DocValuesProducer.loadBinary
>>
>> I would have expected DocValues fields to use mapped bytes instead of copying into the Java heap much as the "main" index data is. Is this a technical limitation, a "we haven't gotten there yet" feature request, or something different entirely?
>>
>> Thanks for helping my understanding,
>> Steven
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
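
For readers following the thread, the difference the original question turns on (bytes copied into heap-resident byte[][] pages versus bytes read through a memory mapping) can be sketched with the plain JDK. This is only an illustration, not Lucene's code: the class name MappedVsHeapSketch, the BLOCK_SIZE value, and all helper names are made up for this example, and real Lucene readers are considerably more involved.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch only; every name here is hypothetical, not a Lucene API.
class MappedVsHeapSketch {
    // Assumed page size for the demo; chosen arbitrarily.
    static final int BLOCK_SIZE = 4096;

    // Writes a temp file of the given size with a simple repeating byte pattern.
    static Path writeSampleFile(int size) {
        try {
            Path p = Files.createTempFile("docvalues-sketch", ".bin");
            byte[] data = new byte[size];
            for (int i = 0; i < size; i++) {
                data[i] = (byte) (i % 251);
            }
            Files.write(p, data);
            p.toFile().deleteOnExit();
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Heap approach: copy the whole file into fixed-size byte[] pages.
    // Every byte of the file ends up on the Java heap and is visible to the GC.
    static byte[][] loadIntoHeapBlocks(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            int numBlocks = (int) ((size + BLOCK_SIZE - 1) / BLOCK_SIZE);
            byte[][] blocks = new byte[numBlocks][];
            for (int i = 0; i < numBlocks; i++) {
                int len = (int) Math.min(BLOCK_SIZE, size - (long) i * BLOCK_SIZE);
                ByteBuffer buf = ByteBuffer.allocate(len);
                while (buf.hasRemaining()) {
                    if (ch.read(buf) < 0) {
                        throw new EOFException("file shrank while reading");
                    }
                }
                blocks[i] = buf.array();
            }
            return blocks;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mapped approach: the bytes stay outside the Java heap and are paged in
    // by the OS on access. The mapping remains valid after the channel closes.
    static MappedByteBuffer mapReadOnly(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Random access into the paged heap copy.
    static byte readFromBlocks(byte[][] blocks, long pos) {
        return blocks[(int) (pos / BLOCK_SIZE)][(int) (pos % BLOCK_SIZE)];
    }

    public static void main(String[] args) {
        Path file = writeSampleFile(10_000);
        byte[][] blocks = loadIntoHeapBlocks(file);
        MappedByteBuffer mapped = mapReadOnly(file);
        System.out.println("heap pages allocated: " + blocks.length);
        System.out.println("same byte via both paths: "
                + (readFromBlocks(blocks, 5_000) == mapped.get(5_000)));
    }
}
```

Both paths return the same bytes; what differs is where they live. With the paged copy, heap usage grows with the file size, which matches the byte[][] growth described in the original message. With the mapping, the heap holds only a small buffer object.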