lucene-java-user mailing list archives

From Mikhail Khludnev <m...@apache.org>
Subject Re: Can Lucene be used as Rules Engine?
Date Thu, 23 Jan 2020 07:42:19 GMT
Hello, Kart.
I still don't fully understand the problem, but implementing a rule engine
with Lucene usually requires
https://lucene.apache.org/core/7_3_1/sandbox/org/apache/lucene/search/CoveringQuery.html
which checks the number of matching rule clauses against a count stored in
a dedicated field.
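
To make the counting idea concrete, here is a minimal sketch in plain Java.
It deliberately does not use the Lucene API. It only illustrates the check
that CoveringQuery performs: each rule stores how many of its clauses must
match, and a rule is accepted when the number of matched clauses reaches
that stored count. The names RuleMatchSketch, Condition and Rule are
hypothetical, and treating each OR group as a single clause and each
NOT WITH condition as a veto is an assumption made for this illustration,
not something CoveringQuery does for you.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only: RuleMatchSketch, Condition and Rule are
// hypothetical names, not Lucene classes.
final class RuleMatchSketch {

    /** One condition: alternative words, either required (WITH) or forbidden (NOT WITH). */
    static final class Condition {
        final boolean negated;
        final Set<String> anyOf;

        Condition(boolean negated, String... anyOf) {
            this.negated = negated;
            this.anyOf = new HashSet<>(Arrays.asList(anyOf));
        }

        /** True when the file contains at least one of the alternative words. */
        boolean hits(Set<String> fileWords) {
            for (String word : anyOf) {
                if (fileWords.contains(word)) {
                    return true;
                }
            }
            return false;
        }
    }

    static final class Rule {
        final String name;
        final List<Condition> conditions;
        // Plays the role of the dedicated field: how many WITH conditions must be hit.
        final long requiredCount;

        Rule(String name, Condition... conditions) {
            this.name = name;
            this.conditions = Arrays.asList(conditions);
            long required = 0;
            for (Condition c : conditions) {
                if (!c.negated) {
                    required++;
                }
            }
            this.requiredCount = required;
        }

        /** Counts the satisfied WITH conditions and compares with the stored count. */
        boolean matches(Set<String> fileWords) {
            long satisfied = 0;
            for (Condition c : conditions) {
                boolean hit = c.hits(fileWords);
                if (c.negated && hit) {
                    return false; // a NOT WITH word is present, so the rule is vetoed
                }
                if (!c.negated && hit) {
                    satisfied++;
                }
            }
            return satisfied == requiredCount;
        }
    }

    /** Splits space-separated file contents into a set of words. */
    static Set<String> words(String fileContents) {
        return new HashSet<>(Arrays.asList(fileContents.trim().split("\\s+")));
    }

    static Rule rule1() {
        return new Rule("Rule 1",
                new Condition(false, "0001AAA", "0001AAC"),
                new Condition(false, "PSBP06", "PSBP07"),
                new Condition(false, "MFBP05"));
    }

    static Rule rule2() {
        return new Rule("Rule 2",
                new Condition(false, "0001AAN", "0001AAC"),
                new Condition(false, "PSBP06"),
                new Condition(false, "PSBP08"),
                new Condition(true, "MFBP05"));
    }

    public static void main(String[] args) {
        Set<String> file1 = words("0001AAA 0001AAB 0001AAC 0061000 PSBP06 MFBP05");
        System.out.println("Rule 1 matches file 1: " + rule1().matches(file1));
        System.out.println("Rule 2 matches file 1: " + rule2().matches(file1));
    }
}
```

Run against the words quoted for text file 1, this prints true for Rule 1
and false for Rule 2. In Lucene itself the rules would be the indexed
documents and the words of the incoming file would become the query
clauses. How the OR alternatives and NOT WITH conditions map onto index
fields is left open here.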

On Thu, Jan 23, 2020 at 12:12 AM Karthick Sundaram
<karthick_s@trigent.com.invalid> wrote:

> Gentlemen:
>
>
>
> I am using Lucene as a search engine for the following requirement:
>
>
>
> There are millions of documents (text files).
>
> Each text file has thousands of words (plain strings separated by spaces).
>
> Example content of text file 1 (showing just a few words): 0001AAA 0001AAB
> 0001AAC 0061000 PSBP06 MFBP05 ...
>
> Example content of text file 2 (showing just a few words): 0001AAX 0001AAB
> 0001AAN 0061002 PSBP07 MFBP06 ...
>
>
>
> Then there are millions of rules stored in the database. For ease of
> understanding, here are a couple of rules:
>
>
>
> Rule 1:
>
> CONDITION 1: WITH: 0001AAA OR 0001AAC
>
> CONDITION 2: WITH: PSBP06 OR PSBP07
>
> CONDITION 3: WITH: MFBP05
>
>
>
> Rule 2:
>
> CONDITION 1: WITH: 0001AAN OR 0001AAC
>
> CONDITION 2: WITH: PSBP06
>
> CONDITION 3: WITH: PSBP08
>
> CONDITION 4: NOT WITH: MFBP05
>
>
>
> The requirement is: for a given rule, find the text files that match at
> least one word in each condition of the rule.
>
> I indexed the contents of each text file as a Lucene document with a field
> "FileContents" and another field that just stores the file name.
>
> So, for Rule 1, I constructed the query as (0001AAA OR 0001AAC) AND (PSBP06
> OR PSBP07) AND (MFBP05)
>
> And for Rule 2, the query is (0001AAN OR 0001AAC) AND (PSBP06) AND (PSBP08)
> AND NOT (MFBP05).
>
>
>
> The queries work and find the appropriate text files.
>
>
>
> Now I have another requirement, which is the reverse of the one above.
>
> That is, for a given text file, I need to find the list of rules that
> match it.
>
> Example: For text file 1, "Rule 1" should match, because text file 1
> has 0001AAA, which satisfies condition 1, PSBP06, which satisfies
> condition 2, and MFBP05, which satisfies condition 3.
>
> Rule 1 has 3 conditions, and at least one word in each condition matches
> text file 1. So Rule 1 is good for text file 1.
>
> Rule 2 should not match text file 1 because the file does not contain PSBP08.
>
>
>
> I don't know whether I can index the "Rule" information in Lucene. A rule
> can have one or more conditions, so I can't use a fixed number of fields to
> query on. Even with a fixed number of fields, the query would have to check
> that each field matches at least one word.
>
> Is it possible to handle this requirement using Lucene? Or should I go for
> other options?
>
> I am new to Lucene, so any help would be appreciated.
>
>
>
> Thanks,
>
> Kart
>
>

-- 
Sincerely yours
Mikhail Khludnev
