Subject: Re: rules engine with Hadoop
From: Luangsay Sourygna
To: user@hadoop.apache.org
Date: Sat, 20 Oct 2012 20:03:42 +0200

Thanks for all the information. Many papers/books to read in my free time :)...

Just to get an idea, what is the maximum memory consumed by a rule engine
you have ever seen, and what were its characteristics (how many facts loaded
at the same time, how many rules and joins)?

On Sat, Oct 20, 2012 at 4:38 PM, Peter Lin wrote:
> All RETE implementations use RAM these days.
>
> There are older rule engines that used databases or file systems when
> there wasn't enough RAM. The key to scaling rulebase systems or expert
> systems efficiently is loading only the data you need. An expert system
> is an inference engine + rules + functions + facts. Some products
> shamelessly promote their rule engine as an expert system when they
> don't understand what the term means. Some rule engines are expert
> system shells, which provide a full programming environment without
> needing an IDE and a bunch of other stuff. For example, CLIPS, JESS and
> Haley come to mind.
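A minimal plain-Java sketch of the "load only the data you need" point above.
The Fact and Rule types, the sample data, and the match-and-fire loop are all
invented for illustration (this is not the JESS or CLIPS API): only facts of a
type that some loaded rule actually pattern-matches on are admitted into
working memory.

import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class TinyEngineSketch {
    // The ingredients named above: facts, rules, functions (the Predicate
    // bodies), and an inference loop in main().
    record Fact(String type, String id, double value) {}
    record Rule(String name, String factType, Predicate<Fact> test) {}

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule("high-temp", "sensor", f -> f.value() > 90.0));

        // "Load only the data you need": the fact types the rules actually
        // pattern-match on decide what gets asserted into working memory.
        Set<String> wanted = rules.stream()
                .map(Rule::factType)
                .collect(Collectors.toSet());
        List<Fact> workingMemory = incoming().stream()
                .filter(f -> wanted.contains(f.type()))
                .collect(Collectors.toList());

        // Naive match-and-fire; a real RETE engine shares these tests in a
        // discrimination network instead of re-running them rule by rule.
        for (Rule r : rules)
            for (Fact f : workingMemory)
                if (f.type().equals(r.factType()) && r.test().test(f))
                    System.out.println(r.name() + " fired for " + f.id());
    }

    static List<Fact> incoming() {          // stand-in data source
        return List.of(new Fact("sensor", "s1", 95.0),
                       new Fact("weblog", "w1", 1.0),
                       new Fact("sensor", "s2", 20.0));
    }
}

A real RETE engine goes much further by compiling the conditions into a shared
discrimination network, but the data-loading discipline is the same.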
> I would suggest reading Gary Riley's book
> http://www.amazon.com/Expert-Systems-Principles-Programming-Fourth/dp/0534384471/ref=sr_1_1?s=books&ie=UTF8&qid=1350743551&sr=1-1&keywords=giarratano+and+riley+expert+systems
>
> In terms of nodes, that actually doesn't matter much due to the
> discrimination network produced by the RETE algorithm. What matters more
> is the number of facts and the percentage of facts that match some of the
> patterns declared in the rules.
>
> Most RETE implementations materialize the join results, so that is the
> biggest factor in memory consumption. For example, if you have 1000 rules
> but only 3 have joins, it doesn't make much difference. In contrast, if
> you have 200 rules and each has 4 joins, it will consume more memory for
> the same dataset.
>
> Proper scaling of rulebase systems requires years of experience and
> expertise, so it's not something one should rush. It's best to study the
> domain and methodically develop the rulebase so that it is efficient. I
> would recommend you use JESS. Feel free to email me directly if your
> company wants to hire an experienced rule developer to assist with your
> project.
>
> RETE rule engines are powerful tools, but they do require experience to
> scale properly.
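A back-of-envelope sketch of the join/memory point above, in plain Java with
made-up numbers (10,000 facts per pattern, 0.1% join selectivity, and a single
join for each of the "3 rules with joins" are all assumptions, not measurements):
each extra join multiplies the partial matches the beta memories have to
materialize, so a couple hundred multi-join rules can dwarf a thousand flat rules.

public class JoinMemorySketch {
    // Rough partial-match count for one rule: start with the facts matching
    // the first pattern, multiply once per join by (facts * selectivity),
    // and sum the tokens each beta memory has to hold.
    static long partialMatches(long factsPerPattern, int joins, double selectivity) {
        double tokens = factsPerPattern;
        long stored = 0;
        for (int j = 0; j < joins; j++) {
            tokens *= factsPerPattern * selectivity; // tokens surviving this join
            stored += (long) tokens;                 // materialized in this beta memory
        }
        return stored;
    }

    public static void main(String[] args) {
        long facts = 10_000;    // facts matching each pattern (assumed)
        double sel = 0.001;     // fraction of the cross-product that joins (assumed)

        // "1000 rules but only 3 have joins" vs. "200 rules with 4 joins each"
        long fewJoins  = 3 * partialMatches(facts, 1, sel) + 997 * facts;
        long manyJoins = 200 * partialMatches(facts, 4, sel);

        System.out.println("1000 rules, 3 with a single join: ~" + fewJoins + " stored matches");
        System.out.println(" 200 rules, 4 joins each        : ~" + manyJoins + " stored matches");
    }
}

With these assumed numbers the first rulebase stores on the order of 1e7
partial matches and the second on the order of 2e10, which is why the join
structure, not the raw rule count, drives memory sizing.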