From: Will Johnson
To: java-user@lucene.apache.org
Date: Thu, 1 Apr 2010 09:43:48 -0400
Subject: Re: Lucene Challenge - sum, count, avg, etc.

Hi Michel,

You can do all of this with Lucene, though not with a standard index and the standard query operators. At Attivio we have a custom Lucene index structure plus custom query operators that support relational joins across records in an index. You can write the queries in our standard query language or run actual SQL. All of this is done without pre-computing or flattening records, because that takes away query flexibility at runtime: what happens when you want to know something that isn't pre-computed or pre-flattened?

If you look at the demo at the bottom of this page

http://www.attivio.com/active-intelligence/aie-demo.html

you can see how we index and query against both news articles and baseball statistics from a relational database. For example, you can do something like this with the baseball data:

    select sum(RBI), teamID, yearID
    from master m join batting b on m."playerID" = b.playerID
    where b.yearID > '2004'
    group by yearID, teamID
    order by yearID, teamID

We support min, max, avg and a number of other aggregate functions along with true full-text search.
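To make the semantics of that query concrete, here is a minimal sketch that runs an equivalent statement against an in-memory SQLite database from Python. SQLite is only a stand-in for illustration; it is not the Attivio engine and says nothing about how the Lucene-based index evaluates the join. The table layouts, the sample rows, the helper names `build_toy_db` and `rbi_by_team_and_year`, and the choice to store `yearID` as an integer (the original compares against the string '2004') are all assumptions made for this example.

```python
import sqlite3


def build_toy_db():
    """Create a tiny in-memory database with made-up rows (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table master (playerID text primary key, nameLast text);
        create table batting (playerID text, yearID integer, teamID text, RBI integer);
        """
    )
    conn.executemany(
        "insert into master values (?, ?)",
        [("p1", "Alpha"), ("p2", "Beta")],
    )
    conn.executemany(
        "insert into batting values (?, ?, ?, ?)",
        [
            ("p1", 2004, "BOS", 50),  # filtered out: yearID is not > 2004
            ("p1", 2005, "BOS", 60),
            ("p2", 2005, "BOS", 40),
            ("p2", 2006, "NYA", 70),
            ("p3", 2006, "NYA", 99),  # dropped by the join: no master row for p3
        ],
    )
    return conn


def rbi_by_team_and_year(conn):
    """Sum RBI per (yearID, teamID) for seasons after 2004, as in the email's query."""
    query = """
        select sum(b.RBI), b.teamID, b.yearID
        from master m
        join batting b on m.playerID = b.playerID
        where b.yearID > 2004
        group by b.yearID, b.teamID
        order by b.yearID, b.teamID
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for total, team, year in rbi_by_team_and_year(build_toy_db()):
        print(year, team, total)
```

With the sample rows above, the two 2005 BOS rows collapse into one group with a total of 100, and the 2006 NYA group keeps only the row whose player exists in `master`.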
Another article you might check out is here: http://www.attivio.com/blog/55-industry-insights/507-can-a-search-engine-replace-a-relational-database.html . So far we're getting some pretty good results competing with databases and data warehouses on raw performance (at customer sites), even without the full-text search capabilities mixed in. Once you start adding in 'fuzzy' joins, relevancy, proximity and all the other boolean query logic, we start to pull ahead even further.

If you want to learn more, drop me a line. We'll be demonstrating all this stuff (and more) at Enterprise Search Summit (ESS) in New York this coming May.

- will@attivio.com