Subject: Re: rules engine with Hadoop
From: Peter Lin <woolfel@gmail.com>
To: user@hadoop.apache.org
Date: Sun, 21 Oct 2012 09:49:54 -0400

From a Java heap perspective, if you don't want huge full-GC pauses,
avoid going over 2GB. There are no simple answers to how many facts can
be loaded into a rule engine. If you want to learn more, email me
directly. The Hadoop mailing list isn't an appropriate place to get into
the weeds of how to build efficient rules, since it has nothing to do
with Hadoop.
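For what it's worth, on the heap point: one way to keep yourself honest
is to pin the heap at the 2GB limit and watch the GC log while you load
facts. A minimal sketch; the jess.jar classpath and the JoinSketch class
(sketched at the bottom of this mail) are placeholders for your own
setup:

    java -Xms2g -Xmx2g -verbose:gc -cp jess.jar:. JoinSketch

-Xms/-Xmx pin the heap so you hit memory pressure early, and -verbose:gc
prints each collection, which makes long full-GC pauses easy to spot.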
On Sat, Oct 20, 2012 at 2:03 PM, Luangsay Sourygna wrote:
> Thanks for all the information. Many papers/books to read in my free
> time :)...
>
> Just to get an idea, what is the maximum memory consumed by a rule
> engine you have ever seen, and what were its characteristics (how many
> facts loaded at the same time, how many rules and joins)?
>
> On Sat, Oct 20, 2012 at 4:38 PM, Peter Lin wrote:
>> All RETE implementations use RAM these days.
>>
>> There are older rule engines that used databases or file systems when
>> there wasn't enough RAM. The key to efficiently scaling rulebase
>> systems or expert systems is loading only the data you need. An expert
>> system is an inference engine + rules + functions + facts. Some
>> products shamelessly promote their rule engine as an expert system
>> when they don't understand what the term means. Some rule engines are
>> expert system shells, which provide a full programming environment
>> without needing an IDE and a bunch of other stuff. For example, CLIPS,
>> JESS, and Haley come to mind.
>>
>> I would suggest reading Gary Riley's book:
>> http://www.amazon.com/Expert-Systems-Principles-Programming-Fourth/dp/0534384471/ref=sr_1_1?s=books&ie=UTF8&qid=1350743551&sr=1-1&keywords=giarratano+and+riley+expert+systems
>>
>> In terms of nodes, that actually doesn't matter much, due to the
>> discrimination network produced by the RETE algorithm. What matters
>> more is the number of facts and the percentage of facts that match
>> some of the patterns declared in the rules.
>>
>> Most RETE implementations materialize the join results, so that is
>> the biggest factor in memory consumption. For example, if you had 1000
>> rules but only 3 of them had joins, it wouldn't make much difference.
>> In contrast, if you had 200 rules and each had 4 joins, the engine
>> would consume more memory for the same dataset.
>>
>> Proper scaling of rulebase systems requires years of experience and
>> expertise, so it's not something one should rush. It's best to study
>> the domain and methodically develop the rulebase so that it is
>> efficient. I would recommend you use JESS. Feel free to email me
>> directly if your company wants to hire an experienced rule developer
>> to assist with your project.
>>
>> RETE rule engines are powerful tools, but they do require experience
>> to scale properly.
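To make the join point above concrete, here is a minimal, hypothetical
sketch against the Jess Java API (jess.Rete with eval/assertString/run;
the template and rule names are invented for illustration). The shared
?id variable is the join: the engine materializes a partial match for
every (customer, order) pair that agrees on ?id, which is exactly where
the memory goes:

    import jess.Rete;

    public class JoinSketch {
        public static void main(String[] args) throws Exception {
            Rete engine = new Rete();

            // Two fact templates; the rule below joins them on the id slot.
            engine.eval("(deftemplate customer (slot id) (slot status))");
            engine.eval("(deftemplate order (slot cust-id) (slot total))");

            // One join: the shared ?id variable links the two patterns,
            // so every (customer, order) pair agreeing on ?id becomes a
            // partial match the engine keeps in memory.
            engine.eval("(defrule big-gold-order " +
                "(customer (id ?id) (status gold)) " +
                "(order (cust-id ?id) (total ?t&:(> ?t 100))) " +
                "=> " +
                "(printout t \"gold customer \" ?id \" ordered \" ?t crlf))");

            engine.assertString("(customer (id 1) (status gold))");
            engine.assertString("(order (cust-id 1) (total 250))");
            engine.run();  // fires big-gold-order once
        }
    }

Scale that in your head: 200 rules shaped like this with 4 joins each
keep far more partial matches alive than 1000 single-pattern rules,
which is why join count, not rule count, dominates memory.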