lucene-java-user mailing list archives

From prasenjit mukherjee <prasen....@gmail.com>
Subject frequent keyword computation within a search ( and timeinterval )
Date Wed, 04 Jan 2012 05:17:30 GMT
I have a requirement where both the read and write rates are quite
high (about 100-500 per second). Each document has the following
fields: timestamp, unique-docid, content-text, keyword. The average
content-text length is ~20 bytes, and there is exactly one keyword per
docid.

At runtime, given a query-term (which may be null) and a
time-interval, I need to find the top-k most frequent keywords among
the documents within that time-interval whose content-text field
contains the query-term (if the query-term is null, this filter is
skipped). I can purge the data every day, so I never need to keep more
than a day's data.
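
To make the required query concrete, here is a minimal in-memory
sketch in Python. It is only an illustration of the semantics, not a
proposed implementation: the function name, the dict-based document
layout, the field names (timestamp, content_text, keyword) and the use
of plain substring matching for "contains" are all assumptions.

```python
from collections import Counter

def top_k_keywords(docs, start, end, k, query_term=None):
    """Return the k most frequent keywords as (keyword, count) pairs.

    docs is an iterable of dicts with the hypothetical keys
    "timestamp", "content_text" and "keyword". The interval is
    treated as inclusive on both ends (an assumption).
    """
    counts = Counter()
    for doc in docs:
        # Keep only documents inside the requested time interval.
        if not (start <= doc["timestamp"] <= end):
            continue
        # The term filter is optional: a null query-term matches everything.
        if query_term is not None and query_term not in doc["content_text"]:
            continue
        counts[doc["keyword"]] += 1
    return counts.most_common(k)
```

This is a linear scan over the documents in the interval, so it only
shows what the answer should be. Whether it would be fast enough at
100-500 reads and writes per second is exactly the open question.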

I have quite a few options here: MySQL, NoSQL stores (Cassandra,
Mongo, Couch, Riak, Redis), and search-engine based solutions
(lucene/solr), each with its own pros and cons.

In MySQL we can achieve this via a GROUP BY / COUNT clause.
In NoSQL I can probably write a map/reduce task to compute these
numbers, although I am not sure about the query response time.
I am not sure if we can achieve it via lucene/solr OOB.
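
For the relational option, the GROUP BY / COUNT query could look
roughly like the sketch below. It uses Python's built-in sqlite3
module as a stand-in for MySQL so that it is self-contained. The table
name (docs), the column names, the helper names and the use of instr()
for substring matching are assumptions made for illustration.

```python
import sqlite3

def make_db(rows):
    """Create an in-memory table and load (ts, docid, content_text, keyword) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs ("
        "ts INTEGER, docid TEXT PRIMARY KEY, content_text TEXT, keyword TEXT)"
    )
    # An index on the timestamp helps the time-interval filter.
    conn.execute("CREATE INDEX idx_docs_ts ON docs (ts)")
    conn.executemany("INSERT INTO docs VALUES (?, ?, ?, ?)", rows)
    return conn

def top_k_keywords_sql(conn, start, end, k, query_term=None):
    """Top-k keywords by document count within [start, end]."""
    sql = "SELECT keyword, COUNT(*) AS freq FROM docs WHERE ts BETWEEN ? AND ?"
    params = [start, end]
    if query_term is not None:
        # Optional substring filter on the content text.
        sql += " AND instr(content_text, ?) > 0"
        params.append(query_term)
    # Ties are broken alphabetically so the result order is deterministic.
    sql += " GROUP BY keyword ORDER BY freq DESC, keyword ASC LIMIT ?"
    params.append(k)
    return conn.execute(sql, params).fetchall()
```

The substring filter cannot use an ordinary index, so with a non-null
query-term every row in the interval is still scanned.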

Any suggestions on what would be a good choice for this use case ?

-Thanks,
prasenjit
