From: "Uwe Schindler"
Subject: RE: In memory index (current status in Lucene)
Date: Sat, 6 Jul 2013 21:59:45 +0200
Message-ID: <022901ce7a83$5d16e8d0$1744ba70$@thetaphi.de>

You mean tmpfs - not RAM disk. Tmpfs is cool, as it plays wonderfully
with mmap (mmap just maps the RAM used by the tmpfs into the user's
address space).

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Ramkumar R. Aiyengar [mailto:andyetitmoves@gmail.com]
> Sent: Thursday, July 04, 2013 10:14 PM
> To: java-user@lucene.apache.org
> Subject: Re: In memory index (current status in Lucene)
>
> Have you tried using MMapDirectory over a RAM disk (assuming you are on
> Linux)? You can avoid writing to disk (and thus the other ways to get to it
> persistently as Steven mentions), but still MMap it.
> On 1 Jul 2013 22:41, "Lance Norskog" wrote:
>
> > My current open source project is a Directory that is just like
> > RAMDirectory, but everything is memory-mapped. The idea is it creates
> > a disk file, opens it, and immediately deletes the file. The file
> > still exists until the IndexReader/Writer/Searcher closes it. But, it
> > cannot be found from the file system. This is just like a
> > RAMDirectory, but without memory limitations.
> >
> > It's proving to be harder than it looked.
> >
> > The application is to store encrypted indexes in memory, with the
> > decrypted contents in this non-findable format. I'm in medical
> > document analysis now, and we can't store anything on disk in the clear.
> >
> > Lance
> >
> > On 07/01/2013 07:07 AM, Emmanuel Espina wrote:
> >
> >> Hi Erick! Nice to hear from you again!
> >> From time to time my interest in these "Lucene things" returns
> >> and I do some experiments :p
> >>
> >> Just to add to this conversation, I found an interesting link to
> >> Mike's blog about memory resident indexes (using another virtual
> >> machine)
> >> http://blog.mikemccandless.com/2012/07/lucene-index-in-ram-with-azuls-zing-jvm.html
> >> and also (which is not exactly what I asked but seems related) there
> >> is a Google Summer of Code project to build a memory resident term
> >> resident:
> >> http://www.google-melange.com/gsoc/project/google/gsoc2013/billybob/42001
> >>
> >> Thanks
> >> Emmanuel
> >>
> >>
> >> 2013/7/1 Erick Erickson :
> >>
> >>> Hey Emma! It's been a while....
> >>>
> >>> Building on what Steven said, here's Uwe's blog on MMapDirectory and
> >>> Lucene:
> >>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >>>
> >>> I've always considered RAMDirectory for rather restricted use-cases.
> >>> I.e. if I know without doubt that the index is both relatively
> >>> static and bounded. The other use I've seen is to use it to index
> >>> single documents on-the-fly for some reason (say complex processing
> >>> of a single result) then throw it out afterwards.
> >>>
> >>> How are things going?
> >>>
> >>> Erick
> >>>
> >>>
> >>> On Fri, Jun 28, 2013 at 5:36 PM, Steven Schlansker wrote:
> >>>
> >>>> On Jun 28, 2013, at 2:29 PM, Emmanuel Espina wrote:
> >>>>
> >>>>> I'm building a distributed index (mostly as a research project
> >>>>> for school) and I'm evaluating indexing the entire collection in
> >>>>> memory (like Google, Facebook and others have done years ago).
> >>>>> The obvious reason for this is performance considering that the
> >>>>> replication will give me a reasonably good durability of the data
> >>>>> (despite being in volatile memory).
> >>>>>
> >>>>> What is the current status of Lucene for this kind of index?
> >>>>> RAMDirectory in its documentation has a scary warning that says
> >>>>> that it "is not intended to work with huge indexes", and that sounds
> >>>>> more like it is an implementation for testing rather than
> >>>>> something for production.
> >>>>>
> >>>>> Of course there is no real context for this question, because it
> >>>>> is a research topic. Testing its limits would be the closest to
> >>>>> a context I have :p
> >>>>>
> >>>> You could consider MMapDirectory, which will end up putting the
> >>>> active portions of the index in memory (via the filesystem buffer
> >>>> cache).
> >>>>
> >>>> The benefit is that you don't completely destroy the Java heap
> >>>> (RAMDirectory causes immense GC pressure if you are not careful)
> >>>> and you don't have to commit all of your RAM to index usage all the
> >>>> time.
> >>>>
> >>>> The downside is that if your working set exceeds the amount of RAM
> >>>> available for buffer cache, you will get silent performance
> >>>> degradation as you fall back to disk reads for the missing blocks.
> >>>>
> >>>> Maybe this is OK for your use case, maybe not.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
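
Archive note: the tmpfs-plus-mmap idea discussed in this thread can be
tried with plain JDK calls. The sketch below is not Lucene code and not
anything the posters wrote. The class name TmpfsMmapSketch and both
helper methods are hypothetical, and it assumes a Linux host where
/dev/shm is a tmpfs mount (it falls back to the default temp directory
otherwise). It writes a small file, maps it read-only with
FileChannel.map (the JDK call that MMapDirectory relies on), and reads
the bytes back through the mapping.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from Lucene.
final class TmpfsMmapSketch {

    // Assumption: /dev/shm is a tmpfs mount, as on most Linux systems.
    // If it is missing or not writable, use the JVM's default temp directory.
    static Path pickBaseDir() {
        Path shm = Paths.get("/dev/shm");
        if (Files.isDirectory(shm) && Files.isWritable(shm)) {
            return shm;
        }
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    // Writes a non-empty payload to a new file under baseDir, maps the file
    // read-only, and returns the bytes as seen through the mapping.
    static byte[] roundTrip(Path baseDir, byte[] payload) {
        try {
            Path file = Files.createTempFile(baseDir, "mmap-sketch", ".bin");
            file.toFile().deleteOnExit(); // best-effort cleanup at JVM exit
            Files.write(file, payload);
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // The mapping stays valid after the channel is closed.
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                byte[] out = new byte[buf.remaining()];
                buf.get(out);
                return out;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path base = pickBaseDir();
        byte[] in = "in-memory index bytes".getBytes(StandardCharsets.UTF_8);
        byte[] out = roundTrip(base, in);
        System.out.println("base dir: " + base);
        System.out.println("round trip ok: " + Arrays.equals(in, out));
    }
}
```

With Lucene itself, the equivalent step would presumably be to open an
MMapDirectory on a directory inside the tmpfs mount, so that index
files live in RAM while still being read through memory mapping.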