lucene-solr-user mailing list archives

From Charlie Hull
Subject Re: Many patterns against many sentences, storing all results
Date Wed, 06 Jan 2016 10:17:54 GMT
On 05/01/2016 16:05, Allison, Timothy B. wrote:
> Might want to look into:

Yes, this sounds like a very good fit for Luwak. We built it originally 
for media monitoring applications where one also needs just a hit/no-hit 
result. It's running in production at much larger scale than this.
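
To illustrate the general idea of stored-query ("reverse search") matching, here is a minimal sketch in plain Python. It is not Luwak's API: the `PhraseMonitor` class, its methods, and the toy tokenizer are all hypothetical names made up for this example, and it only handles exact phrases. Patterns are registered once and indexed by their first token, so each incoming sentence is checked only against the patterns that could possibly match it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately simple analyzer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class PhraseMonitor:
    """Hypothetical stored-query matcher: patterns are registered once,
    then every incoming sentence is checked against them."""

    def __init__(self):
        self.patterns = {}                       # pattern id -> list of tokens
        self.by_first_token = defaultdict(set)   # token -> ids of patterns starting with it

    def register(self, pattern_id, phrase):
        """Add a new exact-phrase pattern, or replace an existing one."""
        self.remove(pattern_id)
        tokens = tokenize(phrase)
        if not tokens:
            raise ValueError("pattern has no tokens")
        self.patterns[pattern_id] = tokens
        self.by_first_token[tokens[0]].add(pattern_id)

    def remove(self, pattern_id):
        """Drop a pattern if it is registered; do nothing otherwise."""
        tokens = self.patterns.pop(pattern_id, None)
        if tokens:
            self.by_first_token[tokens[0]].discard(pattern_id)

    def match(self, sentence):
        """Return the sorted ids of all patterns that occur as an exact phrase."""
        tokens = tokenize(sentence)
        hits = set()
        for i, tok in enumerate(tokens):
            # Only patterns whose first token equals this token are candidates.
            for pid in self.by_first_token.get(tok, ()):
                pat = self.patterns[pid]
                if tokens[i:i + len(pat)] == pat:
                    hits.add(pid)
        return sorted(hits)


monitor = PhraseMonitor()
monitor.register("p1", "crime has fallen")
monitor.register("p2", "unemployment is rising")
print(monitor.match("The minister said crime has fallen by 10%."))
```

The result is a plain list of pattern ids with no scores, which matches the hit/no-hit requirement, and the ids could be written straight to an external database. A real system would need proper text analysis and richer query types than this sketch provides.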



> or
> -----Original Message-----
> From: Will Moy
> Sent: Tuesday, January 05, 2016 11:02 AM
> To:
> Subject: Many patterns against many sentences, storing all results
> Hello
> Please may I have your advice as to whether Solr is a good tool for this job?
> We have (per year) –
> Up to 50,000,000 sentences
> And about 5,000 search patterns (i.e. queries)
> Our task is to identify all matches between any sentence and any search pattern.
> That list of detections must be kept up to date as patterns are added or updated (a handful an hour), and as new sentences are added.
> Some of the sentences will be added in real time, at probably max 100 / second and usually much less. The detections on these should be provided within 3 seconds.
> It's an unusual application in that we want all results in an external DB, and also in that every sentence is either a hit or not. We don't care about scoring results, only about matches for the exact search pattern entered.
> The application is automatically detecting instances of factchecked statements.
> The smaller-scale prototype was done with Postgres full-text search, but that can't do exact phrase matching or other more sophisticated searches, so it's out.
> Thanks very much
> Will
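
For a rough sense of scale, the figures quoted above can be turned into a back-of-envelope estimate. This is only a sketch: it assumes a brute-force approach in which every sentence is checked against every pattern, it reads "a handful an hour" as 5 pattern updates per hour (an assumption, not a figure from the message), and the `workload` function and its key names are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def workload(sentences_per_year, patterns, peak_per_sec, pattern_updates_per_hour):
    """Brute-force estimate: every sentence is checked against every pattern."""
    return {
        # Total sentence/pattern pairs to evaluate over a year.
        "pair_checks_per_year": sentences_per_year * patterns,
        # Pairs to evaluate per second at the stated real-time peak.
        "peak_pair_checks_per_sec": peak_per_sec * patterns,
        # Average arrival rate if sentences were spread evenly over the year.
        "avg_sentences_per_sec": sentences_per_year / SECONDS_PER_YEAR,
        # Each new or updated pattern must be re-run over a full year of sentences.
        "backlog_checks_per_hour": pattern_updates_per_hour * sentences_per_year,
    }


# 50,000,000 sentences, 5,000 patterns, and 100/second are from the message;
# 5 updates per hour is an assumed reading of "a handful an hour".
figures = workload(50_000_000, 5_000, 100, 5)
for name, value in figures.items():
    print(f"{name}: {value:,.2f}")
```

The two directions behave differently. New sentences have to be matched against all stored patterns, while a new or updated pattern has to be run against all stored sentences, which is an ordinary search over an index of sentences.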

Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
