From: mik07
To: general@lucene.apache.org
Subject: Re: Online Question Answering demo using Lucene
Date: Wed, 14 May 2008 10:02:13 -0700 (PDT)
Message-ID: <17236078.post@talk.nabble.com>
References: <17232494.post@talk.nabble.com>

Thanks! And you are right, it's roughly the same as Powerset.
It's slower because:

* The demo runs on a single machine (not on a cluster).
* We need to query search engines through their APIs, which have a built-in delay of 1 second per query.
* We parse sentences once we retrieve them from the search engines, and parsers are still rather slow. Powerset, on the other hand, parses Wikipedia before indexing and indexes the semantic structures, so no parsing needs to be performed when a user asks a query (besides the parsing of that query, I suppose).
* The Lucene index of the complete English Wikipedia we built is 8.3 GB. On our machine it takes 2 seconds per query to get a result.

You could address these issues with enough money and manpower. But it's just a research project, developed by one person. We don't have the resources. (Please drop me an email if you have some ;-)

Cheers,
Michael