incubator-jena-dev mailing list archives

From "Stephen Allen (Commented) (JIRA)" <>
Subject [jira] [Commented] (JENA-126) Change temporary table threshold policy from count to memory size
Date Mon, 03 Oct 2011 16:01:34 GMT


Stephen Allen commented on JENA-126:

Paulo, that's a good idea.  I've been stuck thinking about the problem in terms of a full
SPARQL server with lots of concurrent requests.  I think your idea could work well when you
only have a single databag, as in tdbloader.  I would be interested to see how it scales
as the number of bags increases.
> Change temporary table threshold policy from count to memory size
> -----------------------------------------------------------------
>                 Key: JENA-126
>                 URL:
>             Project: Jena
>          Issue Type: Improvement
>          Components: ARQ
>            Reporter: Stephen Allen
> The "workCount" setting for temporary table sizes is not a good configuration option.
> Binding sizes could vary from as little as 32 bytes (8-byte ref to the binding
> + 8-byte ref to a variable + 8-byte nodeID + 8 bytes of object overhead) to bindings with
> multi-megabyte strings.  Asking the user to know which case is likely, and then how
> that count translates into memory usage (the real resource we are attempting to control),
> is already way too much IMO.
> OK, so what the user wants is a way to specify the amount of memory that can be used
> by each query operator for temporary tables [1][2][3].  Hmm, wait, no, maybe what he wants
> is a way to specify the total memory used for temporary tables per query?  No, maybe he
> wants to specify it for the whole query engine.
> But that last paragraph is not accurate.  What he *really* wants is a system that answers
> all of his queries for whatever data he has as fast as possible.  He doesn't want to have
> to configure any parameters.  Unfortunately, this is a really hard dynamic optimization problem,
> so we foist it off on the user, hoping he'll be able to come up with some value.
> We need to decide on what we want to use as a config parameter.  I believe it should
> be a "workMem" or "tmpTableSize" setting that specifies the max memory usage of a temporary
> table before it is converted into an on-disk table.
> [1] This is what most DB systems provide; specifically, PostgreSQL and MySQL both have
> per-operator temporary table sizes.  PostgreSQL calls the setting "work_mem" and MySQL calls
> it "tmp_table_size".
> [2]
> [3]
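
To make the proposal in the issue concrete, here is a minimal Java sketch of a temporary table that spills based on an estimated byte budget instead of a row count.  It is an illustration only, not ARQ code: the class WorkMemSketch, its methods, and the estimate of 2 bytes per character are all assumptions.  The 32-byte fixed cost is the figure from the issue description, and the "workMem" name is the setting suggested there.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: spill a temporary table once an estimated byte budget is exceeded. */
final class WorkMemSketch {
    // Fixed per-binding cost taken from the issue text: 4 fields x 8 bytes each.
    static final long FIXED_BYTES_PER_BINDING = 32;

    private final long workMemBytes;            // assumed "workMem" setting, in bytes
    private long usedBytes = 0;                 // running estimate of memory in use
    private boolean spilled = false;
    private final List<String> inMemory = new ArrayList<>();

    WorkMemSketch(long workMemBytes) {
        this.workMemBytes = workMemBytes;
    }

    /** Rough estimate (an assumption): fixed overhead plus 2 bytes per character. */
    static long estimateBytes(String value) {
        return FIXED_BYTES_PER_BINDING + 2L * value.length();
    }

    /** Adds a value, switching to "on-disk" mode once the byte budget is exceeded. */
    void add(String value) {
        usedBytes += estimateBytes(value);
        if (!spilled && usedBytes > workMemBytes) {
            spilled = true;      // a real implementation would write the rows to disk here
            inMemory.clear();
        }
        if (!spilled) {
            inMemory.add(value);
        }
    }

    boolean hasSpilled() { return spilled; }

    long usedBytes() { return usedBytes; }
}
```

With a budget of 100 bytes, two empty values (32 bytes each) stay in memory, and a third value of four characters (40 bytes) pushes the estimate past the budget and triggers the spill.  A count-based threshold would treat all three rows the same regardless of their size, which is the problem the issue describes.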
