Subject: Re: rules engine with Hadoop
From: Luangsay Sourygna
To: user@hadoop.apache.org
Date: Sat, 20 Oct 2012 20:03:42 +0200

Thanks for all the information. Many papers/books to read in my free time :)...

Just to get an idea, what is the maximum memory consumed by a rule engine
you have ever seen, and what were its characteristics (how many facts loaded
at the same time, how many rules and joins)?

On Sat, Oct 20, 2012 at 4:38 PM, Peter Lin wrote:
> All RETE implementations use RAM these days.
>
> There are older rule engines that used databases or file systems when
> there wasn't enough RAM. The key to scaling rulebase systems or expert
> systems efficiently is loading only the data you need. An expert system
> is an inference engine + rules + functions + facts. Some products
> shamelessly promote their rule engine as an expert system when they
> don't understand what the term means. Some rule engines are expert
> system shells, which provide a full programming environment without
> needing an IDE and a bunch of other stuff. For example, CLIPS, JESS and
> Haley come to mind.
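A minimal plain-Java sketch of the "load only the data you need" point above.
The Fact and Rule types, the sample data, and the match-and-fire loop are all
invented for illustration (this is not the JESS or CLIPS API): only facts of a
type that some loaded rule actually pattern-matches on are admitted into
working memory.

import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class TinyEngineSketch {
    // The ingredients named above: facts, rules, functions (the Predicate
    // bodies), and an inference loop in main().
    record Fact(String type, String id, double value) {}
    record Rule(String name, String factType, Predicate<Fact> test) {}

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule("high-temp", "sensor", f -> f.value() > 90.0));

        // "Load only the data you need": the fact types the rules actually
        // pattern-match on decide what gets asserted into working memory.
        Set<String> wanted = rules.stream()
                .map(Rule::factType)
                .collect(Collectors.toSet());
        List<Fact> workingMemory = incoming().stream()
                .filter(f -> wanted.contains(f.type()))
                .collect(Collectors.toList());

        // Naive match-and-fire; a real RETE engine shares these tests in a
        // discrimination network instead of re-running them rule by rule.
        for (Rule r : rules)
            for (Fact f : workingMemory)
                if (f.type().equals(r.factType()) && r.test().test(f))
                    System.out.println(r.name() + " fired for " + f.id());
    }

    static List<Fact> incoming() {          // stand-in data source
        return List.of(new Fact("sensor", "s1", 95.0),
                       new Fact("weblog", "w1", 1.0),
                       new Fact("sensor", "s2", 20.0));
    }
}

A real RETE engine goes much further by compiling the conditions into a shared
discrimination network, but the data-loading discipline is the same.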
> I would suggest reading Gary Riley's book
> http://www.amazon.com/Expert-Systems-Principles-Programming-Fourth/dp/0534384471/ref=sr_1_1?s=books&ie=UTF8&qid=1350743551&sr=1-1&keywords=giarratano+and+riley+expert+systems
>
> In terms of nodes, that actually doesn't matter much due to the
> discrimination network produced by the RETE algorithm. What matters more
> is the number of facts and the percentage of facts that match some of the
> patterns declared in the rules.
>
> Most RETE implementations materialize the join results, so that is the
> biggest factor in memory consumption. For example, if you have 1000 rules
> but only 3 have joins, it doesn't make much difference. In contrast, if
> you have 200 rules and each has 4 joins, it will consume more memory for
> the same dataset.
>
> Proper scaling of rulebase systems requires years of experience and
> expertise, so it's not something one should rush. It's best to study the
> domain and methodically develop the rulebase so that it is efficient. I
> would recommend you use JESS. Feel free to email me directly if your
> company wants to hire an experienced rule developer to assist with your
> project.
>
> RETE rule engines are powerful tools, but they do require experience to
> scale properly.
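A back-of-envelope sketch of the join/memory point above, in plain Java with
made-up numbers (10,000 facts per pattern, 0.1% join selectivity, and a single
join for each of the "3 rules with joins" are all assumptions, not measurements):
each extra join multiplies the partial matches the beta memories have to
materialize, so a couple hundred multi-join rules can dwarf a thousand flat rules.

public class JoinMemorySketch {
    // Rough partial-match count for one rule: start with the facts matching
    // the first pattern, multiply once per join by (facts * selectivity),
    // and sum the tokens each beta memory has to hold.
    static long partialMatches(long factsPerPattern, int joins, double selectivity) {
        double tokens = factsPerPattern;
        long stored = 0;
        for (int j = 0; j < joins; j++) {
            tokens *= factsPerPattern * selectivity; // tokens surviving this join
            stored += (long) tokens;                 // materialized in this beta memory
        }
        return stored;
    }

    public static void main(String[] args) {
        long facts = 10_000;    // facts matching each pattern (assumed)
        double sel = 0.001;     // fraction of the cross-product that joins (assumed)

        // "1000 rules but only 3 have joins" vs. "200 rules with 4 joins each"
        long fewJoins  = 3 * partialMatches(facts, 1, sel) + 997 * facts;
        long manyJoins = 200 * partialMatches(facts, 4, sel);

        System.out.println("1000 rules, 3 with a single join: ~" + fewJoins + " stored matches");
        System.out.println(" 200 rules, 4 joins each        : ~" + manyJoins + " stored matches");
    }
}

With these assumed numbers the first rulebase stores on the order of 1e7
partial matches and the second on the order of 2e10, which is why the join
structure, not the raw rule count, drives memory sizing.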