hadoop-hdfs-user mailing list archives

From Peter Lin <wool...@gmail.com>
Subject Re: rules engine with Hadoop
Date Sat, 20 Oct 2012 14:38:17 GMT
All RETE implementations use RAM these days.

There are older rule engines that used databases or file systems when
there wasn't enough RAM. The key to efficiently scaling rulebase
systems or expert systems is loading only the data you need. An expert
system is an inference engine + rules + functions + facts. Some products
shamelessly promote their rule engine as an expert system when the
vendors don't understand what the term means. Some rule engines are
expert system shells, which provide a full programming environment
without needing an IDE and a bunch of other stuff. For example, CLIPS,
JESS and Haley come to mind.

I would suggest reading Gary Riley's book.

In terms of nodes, the count actually doesn't matter much, due to the
discrimination network produced by the RETE algorithm. What matters
more is the number of facts, and the percentage of the facts that
match some of the patterns declared in the rules.
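To see why fact volume matters more than raw rule count, here is a toy sketch of an alpha network in Python. It is not any particular engine's API (all names are hypothetical); it just illustrates that rules sharing a condition share one test node, so each fact is evaluated once per distinct condition and only matching facts are stored.

```python
# Toy alpha-network sketch: conditions shared by many rules are stored
# once, so per-fact cost scales with distinct conditions and matches,
# not with raw rule count. Hypothetical names, for illustration only.
from collections import defaultdict

class AlphaNetwork:
    def __init__(self):
        self.conditions = {}                  # name -> predicate (shared)
        self.memories = defaultdict(list)     # name -> alpha memory

    def add_condition(self, name, predicate):
        # Added once, even if 1000 rules reference this condition.
        self.conditions.setdefault(name, predicate)

    def assert_fact(self, fact):
        # Each fact is tested once per distinct condition; only
        # matching facts consume memory.
        for name, pred in self.conditions.items():
            if pred(fact):
                self.memories[name].append(fact)

net = AlphaNetwork()
net.add_condition("is_order", lambda f: f.get("type") == "order")
net.add_condition("is_large", lambda f: f.get("amount", 0) > 1000)

net.assert_fact({"type": "order", "amount": 5000})
net.assert_fact({"type": "customer"})
print(len(net.memories["is_order"]))  # only the matching fact is stored
```

Real engines build this network from the rule patterns at compile time; the sketch only shows the sharing and filtering behavior.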

Most RETE implementations materialize the join results, so that is
the biggest factor in memory consumption. For example, if you had 1000
rules but only 3 have joins, then it doesn't make much difference. In
contrast, if you had 200 rules and each has 4 joins, it will consume
more memory for the same dataset.
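A back-of-envelope sketch of that point: each join level combines the previous partial matches with the next pattern's facts, so materialized beta memory can grow multiplicatively with join depth. The numbers and the flat selectivity below are made up purely for illustration.

```python
# Rough estimate of materialized partial matches per rule when join
# results are stored. Hypothetical numbers, not any specific engine.

def partial_matches(facts_per_pattern, joins, selectivity):
    """Each join pairs the current partial matches with the next
    pattern's facts, keeping a `selectivity` fraction of the pairs."""
    total = facts_per_pattern
    for _ in range(joins):
        total = int(total * facts_per_pattern * selectivity)
    return total

facts = 1000
# 1000 rules, only 3 with a single join each:
light = 997 * facts + 3 * partial_matches(facts, 1, 0.01)
# 200 rules, each with 4 joins:
heavy = 200 * partial_matches(facts, 4, 0.01)
print(light, heavy)  # the deep-join rulebase dominates despite fewer rules
```

With these toy figures the 200-rule, 4-join case materializes orders of magnitude more partial matches than the 1000-rule, mostly join-free case, which is the effect described above.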

Proper scaling of rulebase systems requires years of experience and
expertise, so it's not something one should rush. It's best to study
the domain and methodically develop the rulebase so that it is
efficient. I would recommend you use JESS. Feel free to email me
directly if your company wants to hire an experienced rule developer
to assist with your project.

RETE rule engines are powerful tools, but they do require experience
to scale properly.

On Sat, Oct 20, 2012 at 10:24 AM, Luangsay Sourygna <luangsay@gmail.com> wrote:
> In your RETE implementation, did you just relied on RAM to store the
> alpha and beta memories?
> What if there is a huge number of facts/WME/nodes and that you have to
> retain them for quite a long period (I mean: what happens if the
> alpha&beta memories gets higher than the RAM of your server?) ?
> HBase seemed interesting to me because it enables me to "scale out"
> this amount of memory and gives me the MR boost. Maybe there is a more
> interesting database/distributed cache for that?
> A big thank you anyway for your reply: I have googled a bit on your
> name and found many papers that should help me in going to the right
> direction (from this link:
> http://www.thecepblog.com/2010/03/06/rete-engines-must-forwards-and-backwards-chain/).
> Till now, the only paper I had found was:
> http://reports-archive.adm.cs.cmu.edu/anon/1995/CMU-CS-95-113.pdf
> (found on wikipedia) which I started to read.
> On Fri, Oct 19, 2012 at 10:30 PM, Peter Lin <woolfel@gmail.com> wrote:
>> Since I've implemented RETE algorithm, that is a terrible idea and
>> wouldn't be efficient.
>> storing alpha and beta memories in HBase is technically feasible, but
>> it would be so slow as to be useless.
