lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: frequent keyword computation within a search ( and timeinterval )
Date Thu, 05 Jan 2012 21:37:14 GMT
You will encounter endless grief until you stop
thinking of Solr/Lucene as a replacement for
an RDBMS. It is a *text search engine*.
Whenever you start asking "how do I implement
a SQL statement in Solr", you have to stop
and reconsider *why* you are trying to do that.
Then recast the question in terms of searching.

Short answer is that no, there isn't an aggregate
function. And you shouldn't even try.

Best
Erick

On Thu, Jan 5, 2012 at 12:53 PM, prasenjit mukherjee
<prasen.bea@gmail.com> wrote:
> Thanks, Erick, for the response.
>
> Will Lucene/Solr provide aggregations (of field values) over the documents
> that satisfy a query? E.g. SELECT SUM(price) WHERE item=fruits
>
> Or do I need to use a HitCollector to achieve that?
>
> Any sample Solr/Lucene query that computes aggregates (like SUM) would be great.
>
> -Thanks,
> Prasenjit
>
> On Thu, Jan 5, 2012 at 7:10 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>> The time interval is just a RangeQuery in the Lucene
>> world. The rest is pretty standard search stuff.
>>
>> You probably want to have a look at the NRT
>> (near real time) stuff in trunk.
>>
>> Your reads/writes are pretty high, so you'll need
>> some experimentation to size your site
>> correctly.
>>
>> Best
>> Erick
>>
>> On Wed, Jan 4, 2012 at 12:17 AM, prasenjit mukherjee
>> <prasen.bea@gmail.com> wrote:
>>> I have a requirement where reads and writes are quite high (at 100-500
>>> per second). A document has the following fields: timestamp,
>>> unique-docid, content-text, keyword. Average content-text length is about
>>> 20 bytes, and there is only 1 keyword for a given docid.
>>>
>>> At runtime, given a query-term (which could be null) and a
>>> time-interval, I need to find the top-k most frequent keywords among the
>>> documents in that time-interval whose content-text field contains the
>>> query-term (no term filter if it is null). I can purge the data every
>>> day, hence there is no need for me to keep more than a day's data.
>>>
>>> I have quite a few options here: MySQL, NoSQL stores (Cassandra,
>>> Mongo, Couch, Riak, Redis), and search-engine based (Lucene/Solr),
>>> each having its own pros and cons.
>>>
>>> In MySQL we can achieve this via a GROUP BY/COUNT clause.
>>> In NoSQL I can probably write a map/reduce task to compute these
>>> numbers, although I am not very sure about the query response time.
>>> I am not sure if we can achieve it via Lucene/Solr out of the box.
>>>
>>> Any suggestions on what would be a good choice for this use case?
>>>
>>> -Thanks,
>>> prasenjit
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
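
For reference, the computation asked for in the original question at the bottom of the thread (top-k most frequent keywords among documents that fall in a time interval and, optionally, contain a query term) can be sketched in plain Python. This is only an illustration of the logic, not Lucene/Solr code and not anything posted in the thread. The names `Doc` and `top_k_keywords` are hypothetical, timestamps are assumed to be integers, and matching the query term as a case-insensitive whitespace-separated token is an assumption; a real search engine would apply its own analyzer.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Doc(NamedTuple):
    """Hypothetical record mirroring the fields listed in the question."""
    timestamp: int      # assumed: integer time value, e.g. epoch seconds
    docid: str
    content_text: str
    keyword: str        # exactly one keyword per document


def top_k_keywords(
    docs: Iterable[Doc],
    start: int,
    end: int,
    k: int,
    query_term: Optional[str] = None,
) -> List[Tuple[str, int]]:
    """Return the k most frequent keywords among matching documents.

    A document matches when start <= timestamp <= end and, if query_term
    is given, when the term appears as a token of content_text.
    """
    term = query_term.lower() if query_term is not None else None
    counts: Counter = Counter()
    for doc in docs:
        # Time-interval restriction (the "range query" part).
        if not (start <= doc.timestamp <= end):
            continue
        # Optional term restriction (assumed whitespace tokenization).
        if term is not None and term not in doc.content_text.lower().split():
            continue
        counts[doc.keyword] += 1
    # Sort by descending count, breaking ties alphabetically so the
    # result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

Calling `top_k_keywords(docs, start, end, k)` with no query term counts every document in the interval. In search-engine terms this is a query restricted by a range on the timestamp, followed by per-keyword counts over the matching documents, which is the kind of result Solr's faceting feature is designed to return.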