Date: Fri, 30 Sep 2011 16:30:45 +0000 (UTC)
From: "Paolo Castagna (Commented) (JIRA)"
To: jena-dev@incubator.apache.org
Reply-To: jena-dev@incubator.apache.org
Message-ID: <38708585.11877.1317400245591.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1067296543.11518.1317393705800.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (JENA-126) Change temporary table threshold policy from count to memory size

[
https://issues.apache.org/jira/browse/JENA-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118174#comment-13118174 ]

Paolo Castagna commented on JENA-126:
-------------------------------------

I had a similar problem here, where I wanted to provide a sort of --spill-size-auto option:
https://svn.apache.org/repos/asf/incubator/jena/Scratch/PC/tdbloader2/trunk/src/main/java/cmd/tdbloader2.java

It's not that easy. But...

Couldn't we have a ThresholdPolicy implementation that monitors total and free memory and triggers spilling when free memory is too low (after at least N bindings have been seen)?

> Change temporary table threshold policy from count to memory size
> -----------------------------------------------------------------
>
>                 Key: JENA-126
>                 URL: https://issues.apache.org/jira/browse/JENA-126
>             Project: Jena
>          Issue Type: Improvement
>          Components: ARQ
>            Reporter: Stephen Allen
>
> The "workCount" setting for temporary table sizes is not a good configuration option. Binding sizes could potentially vary from as little as 32 bytes (8-byte ref to the binding + 8-byte ref to a variable + 8-byte nodeID + 8-byte object overhead) to bindings with multi-megabyte strings. Asking the user to know which one it is likely to be, and then how that count translates into memory usage (the real resource we are attempting to control), is already way too much IMO.
> OK, so what the user wants is a way to specify the amount of memory that can be used by each query operator for temporary tables [1][2][3]. Hmm, wait, no, what he maybe wants is a way to specify the total memory used for temporary tables per query? No, maybe he wants to specify it for the whole query engine.
> But that last paragraph is not accurate. What he *really* wants is a system that answers all of his queries for whatever data he has as fast as possible. He doesn't want to have to configure any parameters.
> Unfortunately, this is a really hard dynamic optimization problem, so we foist it off on the user, hoping he'll be able to come up with some value.
> We need to decide on what we want to use as a config parameter. I believe it should be a "workMem" or "tmpTableSize" setting that specifies the maximum memory usage of a temporary table before it is converted into an on-disk table.
> [1] This is what most DB systems provide; specifically, PostgreSQL and MySQL both have per-operator temporary table sizes. PostgreSQL calls the setting "work_mem" and MySQL calls it "tmp_table_size".
> [2] http://www.postgresql.org/docs/8.3/static/runtime-config-resource.html
> [3] http://dev.mysql.com/doc/refman/5.0/en/internal-temporary-tables.html
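
To make the memory-monitoring policy idea from the comment above a bit more concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions: the SpillPolicy interface is a hypothetical stand-in (ARQ's actual ThresholdPolicy interface may differ), and the names FreeMemoryPolicy, minItems, minFreeBytes and estimatedFreeHeap are made up for this example. The free-memory reading is passed in as a supplier so that the policy can be exercised with a stubbed value.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for a threshold policy contract; ARQ's actual
// ThresholdPolicy interface may differ.
interface SpillPolicy<T> {
    void increment(T item);        // called once per item added to the table
    boolean isThresholdExceeded(); // true when the table should spill to disk
    void reset();                  // called after the table has been spilled
}

// Signals a spill only when at least minItems have been seen AND the
// estimated free heap has dropped below minFreeBytes.
final class FreeMemoryPolicy<T> implements SpillPolicy<T> {
    private final long minItems;
    private final long minFreeBytes;
    private final LongSupplier freeBytes;
    private long seen;

    FreeMemoryPolicy(long minItems, long minFreeBytes, LongSupplier freeBytes) {
        this.minItems = minItems;
        this.minFreeBytes = minFreeBytes;
        this.freeBytes = freeBytes;
    }

    // Rough estimate of the heap still available to the JVM: maximum heap
    // minus what is currently in use. Garbage that has not been collected
    // yet counts as "in use", so this tends to underestimate.
    static long estimatedFreeHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
    }

    @Override
    public void increment(T item) {
        seen++;
    }

    @Override
    public boolean isThresholdExceeded() {
        return seen >= minItems && freeBytes.getAsLong() < minFreeBytes;
    }

    @Override
    public void reset() {
        seen = 0;
    }
}

final class FreeMemoryPolicyDemo {
    public static void main(String[] args) {
        // Stubbed reading (50 bytes free) keeps the demo deterministic;
        // real use would pass FreeMemoryPolicy::estimatedFreeHeap instead.
        SpillPolicy<String> policy = new FreeMemoryPolicy<>(2, 100L, () -> 50L);
        policy.increment("binding-1");
        System.out.println(policy.isThresholdExceeded()); // false: only 1 of 2 items seen
        policy.increment("binding-2");
        System.out.println(policy.isThresholdExceeded()); // true: 2 items seen and 50 < 100
    }
}
```

The "at least N bindings" guard keeps the policy from spilling tiny tables just because some other part of the JVM is using the heap. The sketch does not address the harder question raised in the issue, namely whether the limit should apply per operator, per query, or to the whole query engine.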