From: Erick Erickson
To: java-user@lucene.apache.org
Date: Thu, 5 Jan 2012 16:37:14 -0500
Subject: Re: frequent keyword computation within a search ( and timeinterval )

You will encounter endless grief until you stop thinking of
Solr/Lucene as a replacement for an RDBMS. It is a *text search
engine*. Whenever you start asking "how do I implement a SQL
statement in Solr", you have to stop and reconsider *why* you are
trying to do that. Then recast the question in terms of searching.

Short answer is that no, there isn't an aggregate function. And you
shouldn't even try.

Best
Erick

On Thu, Jan 5, 2012 at 12:53 PM, prasenjit mukherjee wrote:
> Thanks Erick for the response.
>
> Will lucene/solr provide me aggregations ( of field values ) satisfying
> a query criteria ? e.g. SELECT SUM(price) WHERE item=fruits
>
> Or do I need to use hitCollector to achieve that ?
>
> Any sample solr/lucene query to compute aggregates ( like SUM ) will be great.
>
> -Thanks,
> Prasenjit
>
> On Thu, Jan 5, 2012 at 7:10 PM, Erick Erickson wrote:
>> the time interval is just a RangeQuery in the Lucene
>> world. The rest is pretty standard search stuff.
>>
>> You probably want to have a look at the NRT
>> (near real time) stuff in trunk.
>>
>> Your reads/writes are pretty high, so you'll need
>> some experimentation to size your site
>> correctly.
>>
>> Best
>> Erick
>>
>> On Wed, Jan 4, 2012 at 12:17 AM, prasenjit mukherjee
>> wrote:
>>> I have a requirement where reads and writes are quite high ( @ 100-500
>>> per-sec ). A document has the following fields : timestamp,
>>> unique-docid, content-text, keyword. Average content-text length is ~
>>> 20 bytes, there is only 1 keyword for a given docid.
>>>
>>> At runtime, given a query-term ( which could be null ) and a
>>> time-interval, I need to find out top-k frequent keywords which
>>> contains the query-term ( optional if it's null ) in its content-text
>>> field within that time-interval. I can purge the data every day, hence
>>> no need for me to have more than a day's data.
>>>
>>> I have quite a few options here : Starting with MySQL, NoSQLs (
>>> Cassandra, Mongo, Couch, Riak, Redis ) , Search-Engine based (
>>> lucene/solr ) each having its own pros/cons.
>>>
>>> In MySQL we can achieve this via : GROUP-BY/COUNT clause
>>> In NoSQL I can probably write a map/reduce task to query these
>>> numbers. Although I am not very sure about the query response time.
>>> Not sure if we can achieve it via lucene/solr OOB.
>>>
>>> Any suggestions on what would be a good choice for this use case ?
>>>
>>> -Thanks,
>>> prasenjit

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org