References: <8837fb770911141415r9537e29q2747557cb3ec5acd@mail.gmail.com>
Date: Sun, 15 Nov 2009 14:39:01 +0300
Message-ID: <59b3eb370911150339o17ecf1bau16f32cead1aecf40@mail.gmail.com>
Subject: Re: A new Lucene Directory available
From: Earwin Burrfoot
To: java-dev@lucene.apache.org

The Terracotta guys "easy-clustered" Lucene a few years ago. I have yet
to see a single person say it worked all right for him.

This new directory ain't gonna be faster than RAMDirectory: syncs on a
map don't matter, since they are taken once per opened file -> once per
reopen, which does not happen thousands of times a second.

Taking a glance at the code (svn trunk), it is actually much slower.
Compare the IndexInput.readByte() implementations: a whole slew of code
and method calls plus a ChunkCacheKey created for each byte read (brutal
on the GC) VS an if, an increment and an array access for RAMDirectory.

I wouldn't be too optimistic about the doesn't-fit-in-memory case VS
FSDirectory either. The OS's paging/file caching skills are hard to
match, plus the OS file cache resides outside the Java heap, which (as
real-life experience dictates) is immensely good for your GC pauses.

Now to the networking part. Infinispan is based on JGroups. Last time I
saw it, it exploded under a moderate load on 20 nodes. I believe the
library is still good when properly configured and under lesser loads,
but not for distributing a Lucene index that is frequently updated and
merged on each node of the cluster.

Please excuse me if I'm overboard in places, and correct me if I am
wrong.
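
To make the readByte() comparison concrete, here is a minimal Java
sketch. It is not the RAMDirectory or Infinispan code: ArrayInput,
KeyPerByteInput and ChunkKey are hypothetical stand-ins, and the real
RAMDirectory input switches between several buffers rather than reading
one array. The sketch only shows the shape of the two hot paths: one
does a bounds check, an increment and an array access, the other
allocates a key object and does a map lookup for every byte.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-chunk cache key; a stand-in, not the Infinispan class. */
final class ChunkKey {
    final String file;
    final int chunk;

    ChunkKey(String file, int chunk) {
        this.file = file;
        this.chunk = chunk;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ChunkKey)) return false;
        ChunkKey k = (ChunkKey) o;
        return chunk == k.chunk && file.equals(k.file);
    }

    @Override
    public int hashCode() {
        return 31 * file.hashCode() + chunk;
    }
}

/** Simplified RAMDirectory-style reader: an if, an increment, an array access. */
final class ArrayInput {
    private final byte[] buffer;
    private int pos;

    ArrayInput(byte[] buffer) {
        this.buffer = buffer;
    }

    byte readByte() {
        if (pos >= buffer.length) throw new IllegalStateException("read past EOF");
        return buffer[pos++];
    }
}

/** Reader that allocates a key and performs a map lookup on every byte read. */
final class KeyPerByteInput {
    private final Map<ChunkKey, byte[]> cache;
    private final String file;
    private final int chunkSize;
    private long pos;
    long keysAllocated; // counts the short-lived objects handed to the GC

    KeyPerByteInput(Map<ChunkKey, byte[]> cache, String file, int chunkSize) {
        this.cache = cache;
        this.file = file;
        this.chunkSize = chunkSize;
    }

    byte readByte() {
        ChunkKey key = new ChunkKey(file, (int) (pos / chunkSize)); // one allocation per byte
        keysAllocated++;
        byte[] chunk = cache.get(key);
        int offset = (int) (pos % chunkSize);
        if (chunk == null || offset >= chunk.length) {
            throw new IllegalStateException("read past EOF");
        }
        pos++;
        return chunk[offset];
    }
}

final class ReadByteSketch {
    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        Map<ChunkKey, byte[]> cache = new HashMap<>();
        cache.put(new ChunkKey("seg", 0), new byte[] {1, 2, 3, 4});
        cache.put(new ChunkKey("seg", 1), new byte[] {5, 6, 7, 8});

        ArrayInput fast = new ArrayInput(data);
        KeyPerByteInput slow = new KeyPerByteInput(cache, "seg", 4);
        for (int i = 0; i < data.length; i++) {
            if (fast.readByte() != slow.readByte()) {
                throw new AssertionError("readers disagree at byte " + i);
            }
        }
        System.out.println("keys allocated: " + slow.keysAllocated);
    }
}
```

Both readers return the same bytes; the difference is the per-byte
allocation and hashing in the second one. Caching the current chunk in
a field and only creating a new key when crossing a chunk boundary
would be the obvious way to remove that cost.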

On Sun, Nov 15, 2009 at 07:33, Sanne Grinovero wrote:
> Hi John,
> I didn't run a long-running, reliable benchmark, so at the moment I
> can't really speak of numbers.
> Suggestions and help on performance testing are welcome: I guess it
> will shine in some situations, not necessarily all, so really choosing
> a correct ratio of concurrent writers/searches, number of nodes in the
> cluster and resources per node will never be fair enough to compare
> this Directory with others.
>
> On paper the premises are good: it's all in memory as long as it fits;
> it will distribute data across nodes, and overflow to disk is supported
> (called passivation). A permanent store can be configured, so you
> could set it to periodically flush incrementally to slower storage
> such as a database, a filesystem or a cloud storage service. This makes
> it possible to avoid losing state even when all nodes are shut down.
> A RAMDirectory is AFAIK not recommended as you could hit memory limits
> and because it's basically a synchronized HashMap; Infinispan
> implements ConcurrentHashMap and doesn't need synchronization.
> Even if the data is replicated across nodes, each node has its own
> local cache, so when caches are warm and all segments fit in memory it
> should be, theoretically, the fastest Directory ever. The more it
> reads from disk, the more it will behave like an FSDirectory
> with some buffers.
>
> As per Lucene's design, writes can happen only at one node at a time:
> one IndexWriter can own the lock, but IndexReaders and Searchers are
> not blocked, so when using this Directory it should behave exactly as
> if you had multiple processes sharing a local NIOFSDirectory:
> basically the situation is that you can't scale on writers, but you
> can scale near-linearly with readers by adding more power from more
> machines.
>
> Besides performance, the reasons to implement this were to be able to
> easily add or remove processing power to a service (clouds), make it
> easier to share indexes across nodes, and last but not least to remove
> single points of failure: all data is distributed and there is no
> notion of a master: services will continue running fine when any node
> is killed.
>
> I hope this piques your interest, sorry if I couldn't provide numbers.
>
> Regards,
> Sanne
>
> On Sat, Nov 14, 2009 at 11:15 PM, John Wang wrote:
>> Hi Sanne:
>>
>>     Very interesting!
>>
>>     What kind of performance should we expect with this, compared to a
>> regular FSDirectory on a local HD?
>> Thanks
>> -John
>>
>> On Sat, Nov 14, 2009 at 11:44 AM, Sanne Grinovero
>> wrote:
>>>
>>> Hello all,
>>> I'm a Lucene user and fan, and I wanted to tell you that we just
>>> released a first technology preview of a distributed in-memory
>>> Directory for Lucene.
>>>
>>> The release announcement:
>>>
>>> http://infinispan.blogspot.com/2009/11/second-release-candidate-for-400.html
>>>
>>> From there you'll find links to the Wiki, to the sources, to the issue
>>> tracker. A minimal demo is included with the sources.
>>>
>>> This was developed together with Google Summer of Code student Lukasz
>>> Moren and much support from the Infinispan and Hibernate Search teams,
>>> as we are storing the index segments on Infinispan and using its
>>> atomic distributed locks to implement a Lucene LockFactory.
>>>
>>> The initial idea was to contribute it directly to Lucene, but as
>>> Infinispan is an LGPL dependency we had to distribute it with
>>> Infinispan (as the other way around would have introduced some legal
>>> issues); still we hope you appreciate the effort and are interested in
>>> giving it a try.
>>> All kinds of feedback are welcome, especially on benchmarking
>>> methodologies, as I have yet to do some serious performance tests.
>>>
>>> Main code, build with Maven2:
>>> svn co
>>> http://anonsvn.jboss.org/repos/infinispan/tags/4.0.0.CR2/lucene-directory/
>>> infinispan-directory
>>>
>>> Demo, see the Readme:
>>> svn co
>>> http://anonsvn.jboss.org/repos/infinispan/tags/4.0.0.CR2/demos/lucene-directory/
>>> lucene-demo
>>>
>>> Best Regards,
>>> Sanne
>>>
>>> --
>>> Sanne Grinovero
>>> Sourcesense - making sense of Open Source: http://www.sourcesense.com
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org

--
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785