From: "Jason Rutherglen"
To: java-user@lucene.apache.org
Date: Thu, 4 Sep 2008 10:44:46 -0400
Subject: Re: Realtime Search for Social Networks Collaboration

Hi Cam,

Thanks! It has not been easy; it has probably taken three years or so to
get this far.

At first I thought the new reopen code would be the solution. I used it,
but then needed to modify it to clone the old deleted docs instead of
referencing them. As I iterated, I realized that just using reopen on a
RAMDirectory would not be quite fast enough because of the merging. Then
I started using InstantiatedIndex, which provides an in-memory version of
the document without the overhead of merging during the transaction.
There are other complexities as well.

The basic code works if you are interested in trying it out.

Take care,
Jason

On Thu, Sep 4, 2008 at 9:08 AM, Cam Bazz wrote:
> Hello Jason,
> I have been trying to do this for a long time on my own. Keep up the
> good work.
>
> What I tried was a document cache using Apache collections. Before an
> index write/delete, I would sync the cache with the index.
>
> I am waiting for Lucene 2.4 to proceed (delete by query).
>
> Best.
>
> On Wed, Sep 3, 2008 at 10:20 PM, Jason Rutherglen <
> jason.rutherglen@gmail.com> wrote:
>
>> Hello all,
>>
>> I don't mean this to sound like a solicitation. I've been working on
>> realtime search and have created some Lucene patches. I am wondering
>> if there are social networks (or anyone else) out there who would be
>> interested in collaborating with Apache on realtime search to get it
>> to the point where it can be used in production. It is a challenging
>> problem that only Google has solved and made to scale. I've been
>> working on the problem for a while, and though a lot has been
>> completed, there is still a lot more to do. Collaboration among the
>> most probable users (social networks) seems like a good thing to try
>> at this point. In other words, it seems like a hard enough problem
>> that it is probably best to work on it together rather than have each
>> company try to build its own. However, I could be wrong.
>>
>> Realtime search benefits social networks by providing a scalable,
>> searchable alternative to large MySQL deployments. MySQL, I have
>> heard, is difficult to scale past a certain point. Google has
>> apparently created things like BigTable (a large-scale database) and
>> an online service called GData (Google has not published any
>> whitepapers on the technology underneath it) to address scaling large
>> database systems. BigTable does not offer search. GData does, and it
>> is used by all of Google's web services instead of something like
>> MySQL (at least, that is how I understand it). Social networks usually
>> grow, so scaling is a continual issue. It is possible to build a
>> realtime search system that scales linearly, something that I have
>> heard becomes difficult with MySQL.
>> There is an article that discusses some of these issues:
>> http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=337
>> I don't think the current GData implementation is perfect, and there
>> is a lot that can be improved. It might be helpful to figure out
>> together what useful things can be added.
>>
>> If this sounds like something of interest to anyone, feel free to
>> send your input.
>>
>> Take care,
>> Jason
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
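Jason's reply mentions changing the reopen path so that it clones the old deleted docs instead of referencing them. The minimal sketch below illustrates why that matters. It does not use Lucene at all: `ReaderSnapshot`, `reopenWithDeletes`, and `numDocs` are hypothetical names chosen for illustration, and a plain `java.util.BitSet` stands in for a segment's deleted-docs structure. The assumption is that several reader snapshots may be alive at once. If a reopened snapshot shared the same bitset object as its predecessor, a delete recorded for the new snapshot would silently change what the older one sees. Cloning gives each snapshot its own stable view.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch, not Lucene's API: a point-in-time reader snapshot
 * whose deleted-docs bitset is cloned on reopen instead of shared, so
 * deletes applied to a newer snapshot are invisible to older ones.
 */
public class SnapshotDeletes {

    static final class ReaderSnapshot {
        // Treated as immutable once the snapshot is built; the caller
        // hands over ownership of the bitset passed to the constructor.
        private final BitSet deletedDocs;
        private final int maxDoc;

        ReaderSnapshot(BitSet deletedDocs, int maxDoc) {
            this.deletedDocs = deletedDocs;
            this.maxDoc = maxDoc;
        }

        boolean isDeleted(int docId) {
            return deletedDocs.get(docId);
        }

        /** Live documents = all doc IDs minus the deleted ones. */
        int numDocs() {
            return maxDoc - deletedDocs.cardinality();
        }

        /**
         * Copy-on-write reopen: clone the bitset rather than referencing
         * it, then apply the new deletes only to the copy.
         */
        ReaderSnapshot reopenWithDeletes(int... newDeletes) {
            BitSet copy = (BitSet) deletedDocs.clone();
            for (int docId : newDeletes) {
                copy.set(docId);
            }
            return new ReaderSnapshot(copy, maxDoc);
        }
    }

    public static void main(String[] args) {
        ReaderSnapshot v1 = new ReaderSnapshot(new BitSet(), 10);
        ReaderSnapshot v2 = v1.reopenWithDeletes(3, 7);
        // The old snapshot is unaffected by deletes made after it was opened.
        System.out.println(v1.numDocs() + " " + v2.numDocs());     // 10 8
        System.out.println(v1.isDeleted(3) + " " + v2.isDeleted(3)); // false true
    }
}
```

The cost of this design is one bitset copy (roughly maxDoc / 8 bytes) per reopen. That is cheap for small in-memory segments but can add up if reopens are very frequent against large segments.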