Subject: Re: rules engine with Hadoop
From: Peter Lin <woolfel@gmail.com>
To: user@hadoop.apache.org
Date: Sun, 21 Oct 2012 09:49:54 -0400

From a Java heap perspective, if you don't want huge full-GC pauses,
avoid going over 2GB. There are no simple answers to how many facts can
be loaded into a rule engine. If you want to learn more, email me
directly. The Hadoop mailing list isn't an appropriate place to get into
the weeds of how to build efficient rules, since it has nothing to do
with Hadoop.
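For what it's worth, on the heap point: one way to keep yourself honest
is to pin the heap at the 2GB limit and watch the GC log while you load
facts. A minimal sketch; the jess.jar classpath and the JoinSketch class
(sketched at the bottom of this mail) are placeholders for your own
setup:

    java -Xms2g -Xmx2g -verbose:gc -cp jess.jar:. JoinSketch

-Xms/-Xmx pin the heap so you hit memory pressure early, and -verbose:gc
prints each collection, which makes long full-GC pauses easy to spot.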
On Sat, Oct 20, 2012 at 2:03 PM, Luangsay Sourygna wrote:
> Thanks for all the information. Many papers/books to read in my free
> time :)...
>
> Just to get an idea, what is the maximum memory consumed by a rule
> engine you have ever seen, and what were its characteristics (how many
> facts loaded at the same time, how many rules and joins)?
>
> On Sat, Oct 20, 2012 at 4:38 PM, Peter Lin wrote:
>> All RETE implementations use RAM these days.
>>
>> There are older rule engines that used databases or file systems when
>> there wasn't enough RAM. The key to efficiently scaling rulebase
>> systems or expert systems is loading only the data you need. An expert
>> system is an inference engine + rules + functions + facts. Some
>> products shamelessly promote their rule engine as an expert system
>> when they don't understand what the term means. Some rule engines are
>> expert system shells, which provide a full programming environment
>> without needing an IDE and a bunch of other stuff. For example, CLIPS,
>> JESS, and Haley come to mind.
>>
>> I would suggest reading Gary Riley's book:
>> http://www.amazon.com/Expert-Systems-Principles-Programming-Fourth/dp/0534384471/ref=sr_1_1?s=books&ie=UTF8&qid=1350743551&sr=1-1&keywords=giarratano+and+riley+expert+systems
>>
>> In terms of nodes, that actually doesn't matter much, due to the
>> discrimination network produced by the RETE algorithm. What matters
>> more is the number of facts and the percentage of facts that match
>> some of the patterns declared in the rules.
>>
>> Most RETE implementations materialize the join results, so that is
>> the biggest factor in memory consumption. For example, if you had 1000
>> rules but only 3 of them had joins, it wouldn't make much difference.
>> In contrast, if you had 200 rules and each had 4 joins, the engine
>> would consume more memory for the same dataset.
>>
>> Proper scaling of rulebase systems requires years of experience and
>> expertise, so it's not something one should rush. It's best to study
>> the domain and methodically develop the rulebase so that it is
>> efficient. I would recommend you use JESS. Feel free to email me
>> directly if your company wants to hire an experienced rule developer
>> to assist with your project.
>>
>> RETE rule engines are powerful tools, but they do require experience
>> to scale properly.
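To make the join point above concrete, here is a minimal, hypothetical
sketch against the Jess Java API (jess.Rete with eval/assertString/run;
the template and rule names are invented for illustration). The shared
?id variable is the join: the engine materializes a partial match for
every (customer, order) pair that agrees on ?id, which is exactly where
the memory goes:

    import jess.Rete;

    public class JoinSketch {
        public static void main(String[] args) throws Exception {
            Rete engine = new Rete();

            // Two fact templates; the rule below joins them on the id slot.
            engine.eval("(deftemplate customer (slot id) (slot status))");
            engine.eval("(deftemplate order (slot cust-id) (slot total))");

            // One join: the shared ?id variable links the two patterns,
            // so every (customer, order) pair agreeing on ?id becomes a
            // partial match the engine keeps in memory.
            engine.eval("(defrule big-gold-order " +
                "(customer (id ?id) (status gold)) " +
                "(order (cust-id ?id) (total ?t&:(> ?t 100))) " +
                "=> " +
                "(printout t \"gold customer \" ?id \" ordered \" ?t crlf))");

            engine.assertString("(customer (id 1) (status gold))");
            engine.assertString("(order (cust-id 1) (total 250))");
            engine.run();  // fires big-gold-order once
        }
    }

Scale that in your head: 200 rules shaped like this with 4 joins each
keep far more partial matches alive than 1000 single-pattern rules,
which is why join count, not rule count, dominates memory.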