Michael Lackhoff wrote:
> On 24.10.2008 15:03 Michael Peters wrote:
>
>> This is only true if those structures were created during run time and go out of scope at run time.
>> If they are generated at compile time or attached to global variables or package level variables,
>> they will not be re-used by Perl.
>
> Wait a minute, I would like to do exactly that: use a config module in
> startup.pl that loads some massive config hashes in the hope that the
> memory they use will be shared:
>
> package MyConfig;
> our $aHugeConfigHash = load_data_from_config_file();
>
> then in my mod_perl module:
> my $conf = $MyConfig::aHugeConfigHash;
>
> (well sort of, it is actually wrapped in an accessor but that gets its
> data from the package variable)
>
> Are you saying, I cannot share the memory this way?

Yes, he is saying that.
You cannot share memory between Apache "children" (regardless of whether we are talking about perl, mod_perl, or anything else). Each child is a separate process, with its own separate copy of mod_perl, the perl interpreter, global variables, everything.

What happens when you use a mod_perl startup script is this: Apache loads mod_perl and perl, and compiles and executes the script (and everything it "use"s) *before* it forks into multiple children. So when Apache has finished its initialisation and forks, each child has its own copy of whatever was compiled, run and initialised there, without needing to recompile and re-execute it. The same happens whenever Apache later creates a new child (by forking again): the new child also gets that same initial copy of the modules and structures you compiled/created at startup time.

To a certain extent, this can save memory under modern operating systems, because memory that is identical across several processes can be kept in physical memory only once and shared between them, *as long as nothing in it is modified*. (That's the "copy-on-write" thing.)
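To make the fork behaviour concrete, here is a small standalone Perl script (plain Perl, not mod_perl) that roughly imitates what Apache does with a startup script: it builds the data once in the parent, then forks. MyConfig and %CONFIG are hypothetical stand-ins for the real config module and load_data_from_config_file(). Each child finds the hash already populated, and a write made by a child never reaches the parent.

```perl
#!/usr/bin/perl
# Plain-Perl imitation of Apache + startup script: build once, then fork.
# MyConfig and %CONFIG are hypothetical stand-ins for the real config module
# and load_data_from_config_file().
use strict;
use warnings;

package MyConfig;
our %CONFIG = map { ("key$_" => "value$_") } 1 .. 100_000;   # built before fork

package main;

my @pids;
for my $n (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;
    if ($pid == 0) {
        # Child: %CONFIG is already populated, inherited through fork().
        my $ok = scalar(keys %MyConfig::CONFIG) == 100_000;
        # This write is private to this child: on a copy-on-write OS the
        # touched memory pages are copied for it, and no other process
        # ever sees the change.
        $MyConfig::CONFIG{key1} = "changed by child $n";
        exit($ok ? 0 : 1);
    }
    push @pids, $pid;
}

my $all_ok = 1;
for my $pid (@pids) {
    waitpid($pid, 0);
    $all_ok = 0 if $? != 0;
}
print "children saw preloaded data: ", ($all_ok ? "yes" : "no"), "\n";
print "parent still has: $MyConfig::CONFIG{key1}\n";
```

Running it should print "yes" and "value1": the children did not rebuild the hash, and none of their modifications leaked back into the parent.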
But as soon as one of the processes modifies something in that memory, the OS copies the affected memory page (a small block, typically 4 KB) and gives the copy to that process, which keeps this "personal copy" from then on. So any change one child makes to the table is invisible to the other processes (Apache children), which are still using the unmodified, shared original.

Preloading can still be a huge time saving, though. Imagine that loading this table takes 2 minutes, and that you have 30 Apache children. If you load it in your startup script, it is done once and takes 2 minutes. If you don't, it is done in each Apache child, for a total of 60 minutes, plus another 2 minutes each time a child dies and a new one is started.

In your case, this means: if you build your huge hash once at startup and never modify it afterwards, then yes, you can probably count on it being present in memory only once (though even that depends on how perl handles it internally; for example, using a string value in a numeric context can make perl cache the converted number inside that scalar, which is itself a write). But as soon as one of the Apache children modifies the hash, it is 100% certain that this process has its own private copy of the modified pages from then on.

Now, one characteristic of running things under mod_perl is that mod_perl and the perl interpreter are "persistent" within an Apache child. In other words, the same perl interpreter executes many modules or scripts one after the other and never terminates, and it does "remember" some things between consecutive runs. That is usually a nuisance, because it can cause nasty bugs: a variable that you declare with "my $var" and expect to be "undef" may not be, if a previous run of the same script or module (in the same Apache child) left something in it. But used carefully, it can also be very useful, because the interpreter can "remember" your hash between one call and the next, sparing you from reloading the table from scratch.
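A common way to use this "memory" deliberately is to keep the table in a package variable and build it only on first use. A minimal sketch in plain Perl, where MyTable, build_table() and table() are hypothetical names and the loop stands in for successive requests served by one Apache child:

```perl
#!/usr/bin/perl
# Per-process cache in a package variable. MyTable, build_table() and
# table() are hypothetical names used only for this illustration.
use strict;
use warnings;

package MyTable;

our $LOADS = 0;   # how many times the expensive build actually ran
our $TABLE;       # survives between requests within ONE process only

# Stand-in for the slow "load the table from scratch" step.
sub build_table {
    $LOADS++;
    my %t = map { ($_ => $_ * $_) } 1 .. 10;
    return \%t;
}

# Build on first use, then hand back the cached copy on later calls.
sub table {
    $TABLE = build_table() unless defined $TABLE;
    return $TABLE;
}

package main;

# Simulate five requests served one after another by the same child.
for my $request (1 .. 5) {
    my $t = MyTable::table();
    print "request $request: 7 squared is $t->{7}\n";
}
print "table built $MyTable::LOADS time(s)\n";
```

The table is built once per process, not once per request. Each Apache child would still build its own copy the first time it needs it; nothing here is shared across children.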
Just be careful with this, and always remember that when you find something already in the table, it is there because of a previous run of something in this particular Apache child, not in Apache in general. You are still not sharing this table with the other Apache children and their own mod_perl and perl instances.

> And if so, is there an alternative?

There are several, depending on what you really do with this data, how often it is modified, etc. One alternative goes roughly like this:

- the startup script loads the table from the original data and puts a reference to it in a global variable (our $hashtable);
- the startup script then writes the loaded table to a file as a Storable object, and initialises another global variable, $stamp, with the current time;
- each time your application module/script starts, it compares its global $stamp with the Storable file's timestamp. If they differ (and only then), it reloads the table from the Storable file and sets $stamp to that file's timestamp. Hopefully that is a lot faster than rebuilding the table from scratch.

If the table is used mostly read-only and modifications are infrequent, that may be your best bet. Of course, if one process modifies the table and the changes have to become visible to the others, it needs to rewrite the Storable file, using an appropriate inter-process locking mechanism.

The point is that if a table is in a global variable, it is kept in the memory *of that Apache child* across separate invocations of the application modules *executed by that same child*. So if the table does not change often, your script may run hundreds of times before it needs to reload it.

Another alternative is to have this huge data structure loaded/created by a completely independent "server" process, and have all your application modules/scripts access that process over TCP/IP to read or modify the table. I believe there is a module like that somewhere on CPAN, called "daemon"-something.
IPC-based modules also exist, but they work only under Unix/Linux.
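The Storable-file alternative described above can be sketched as follows, assuming Storable's lock_store/lock_retrieve (which wrap store/retrieve in flock() locks) are an acceptable inter-process locking mechanism for your setup. The package SharedTable, its init/publish/get subs and the file name are hypothetical. One deliberate deviation: instead of the current time, $STAMP records the file's own modification time right after writing, so a process never reloads a table it has just written itself.

```perl
#!/usr/bin/perl
# Sketch of the "Storable file + timestamp" scheme. SharedTable, its subs
# and the file name are hypothetical; lock_store/lock_retrieve come from
# the core Storable module and use flock() internally.
use strict;
use warnings;

package SharedTable;
use File::Spec;
use Storable qw(lock_store lock_retrieve);

# Hypothetical location; in real use every child must be able to read/write it.
our $FILE  = File::Spec->catfile(File::Spec->tmpdir, "shared_table_$$.stor");
our $TABLE;        # this process's private copy of the table
our $STAMP = -1;   # file mtime seen when $TABLE was last (re)loaded

# Run once from the startup script: build, publish, remember the stamp.
sub init {
    my ($builder) = @_;
    $TABLE = $builder->();
    publish();
}

# Rewrite the file under an exclusive lock, then record its mtime.
sub publish {
    lock_store($TABLE, $FILE) or die "cannot store $FILE\n";
    $STAMP = (stat $FILE)[9];
}

# Run at the start of each request: reload only if the file has changed.
sub get {
    my $mtime = (stat $FILE)[9];
    if (defined $mtime && $mtime != $STAMP) {
        $TABLE = lock_retrieve($FILE);   # read under a shared lock
        $STAMP = $mtime;
    }
    return $TABLE;
}

package main;

SharedTable::init(sub { +{ answer => 42 } });
my $first = SharedTable::get()->{answer};    # no reload: stamp matches

# Simulate another child publishing a new version of the table; push the
# mtime forward explicitly because mtime has one-second resolution.
Storable::lock_store(+{ answer => 43 }, $SharedTable::FILE);
utime(time(), $SharedTable::STAMP + 10, $SharedTable::FILE);
my $second = SharedTable::get()->{answer};   # stamp differs: reloads

print "before: $first, after: $second\n";
unlink $SharedTable::FILE;
```

Because file timestamps usually have one-second resolution, two updates within the same second could go unnoticed by a process that already loaded the first one; if updates can be that frequent, a finer-grained change marker would be needed.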