lucene-java-user mailing list archives

From prasenjit mukherjee <prasen....@gmail.com>
Subject Re: frequent keyword computation within a search ( and timeinterval )
Date Thu, 05 Jan 2012 17:53:34 GMT
Thanks, Erick, for the response.

Will Lucene/Solr provide aggregations (of field values) over the
documents satisfying a query? e.g. SELECT SUM(price) WHERE item=fruits

Or do I need to use a HitCollector to achieve that?

Any sample Solr/Lucene query to compute aggregates (like SUM) would be great.
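
For illustration, a minimal sketch of what such requests might look like,
assuming Solr's StatsComponent (stats=true, stats.field) for SUM and field
faceting (facet.field) for the top-k keywords from the original question.
It only builds request URLs and does not contact a server. The endpoint,
the helper names, and the field name content_text are hypothetical; price,
item, timestamp and keyword are taken from the thread.

```python
from urllib.parse import urlencode

# Assumed endpoint: host, port, and path are placeholders, not from the thread.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def sum_query_url(query, numeric_field, base_url=SOLR_SELECT_URL):
    """Build a URL whose response carries statistics (including the sum)
    of numeric_field over the documents matching query."""
    params = [
        ("q", query),
        ("rows", 0),                     # only the aggregate is wanted, no documents
        ("stats", "true"),               # enable the StatsComponent
        ("stats.field", numeric_field),  # field to aggregate
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


def top_keywords_url(term, start, end, k, base_url=SOLR_SELECT_URL):
    """Build a URL that facets on the keyword field, returning the k most
    frequent keywords among documents inside the [start, end] interval."""
    # An empty or missing term means "match every document".
    query = ("content_text:" + term) if term else "*:*"
    params = [
        ("q", query),
        ("fq", "timestamp:[%s TO %s]" % (start, end)),  # the time interval as a range filter
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", "keyword"),
        ("facet.limit", k),              # keep only the k highest counts
        ("facet.mincount", 1),           # skip keywords with no matching documents
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(sum_query_url("item:fruits", "price"))
    print(top_keywords_url(None, "2012-01-05T00:00:00Z", "2012-01-05T12:00:00Z", 10))
```

With the JSON response writer, the sum would be expected under
stats / stats_fields / price / sum, and the keyword counts under
facet_counts / facet_fields / keyword. Whether this keeps up with
100-500 requests per second is something only a test on real data can show.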

-Thanks,
Prasenjit

On Thu, Jan 5, 2012 at 7:10 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> the time interval is just a RangeQuery in the Lucene
> world. The rest is pretty standard search stuff.
>
> You probably want to have a look at the NRT
> (near real time) stuff in trunk.
>
> Your reads/writes are pretty high, so you'll need
> some experimentation to size your site
> correctly.
>
> Best
> Erick
>
> On Wed, Jan 4, 2012 at 12:17 AM, prasenjit mukherjee
> <prasen.bea@gmail.com> wrote:
>> I have a requirement where reads and writes are quite high ( @ 100-500
>> per-sec ). A document has the following fields : timestamp,
>> unique-docid,  content-text, keyword. Average content-text length is ~
>> 20 bytes, there is only 1 keyword for a given docid.
>>
>> At runtime, given a query-term ( which could be null ) and a
>> time-interval, I need to find the top-k most frequent keywords among
>> documents that contain the query-term ( skipped if it is null ) in
>> their content-text field within that time-interval. I can purge the
>> data every day, hence no need for me to keep more than a day's data.
>>
>> I have quite a few options here : Starting with MySQL, NoSQLs (
>> Cassandra, Mongo, Couch, Riak, Redis ) , Search-Engine based (
>> lucene/solr ) each having its own pros/cons.
>>
>> In MySQL we can achieve this via a GROUP BY / COUNT clause.
>> In NoSQL I can probably write a map/reduce task to query these
>> numbers, although I am not very sure about the query response time.
>> Not sure if we can achieve it via lucene/solr OOB.
>>
>> Any suggestions on what would be a good choice for this use case ?
>>
>> -Thanks,
>> prasenjit
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
