From: paul-rogers
To: dev@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [GitHub] drill pull request #574: DRILL-4726: Dynamic UDFs support
Content-Type: text/plain
Message-Id: <20160926215337.0CEA9DFD4C@git1-us-west.apache.org>
Date: Mon, 26 Sep 2016 21:53:37 +0000 (UTC)

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/574#discussion_r80547641

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionImplementationRegistry.java ---
    @@ -301,29 +323,120 @@ private ScanResult scan(ClassLoader classLoader, Path path, URL[] urls) throws I
               return RunTimeScan.dynamicPackageScan(drillConfig, Sets.newHashSet(urls));
             }
           }
    -      throw new FunctionValidationException(String.format("Marker file %s is missing in %s.",
    +      throw new JarValidationException(String.format("Marker file %s is missing in %s",
               CommonConstants.DRILL_JAR_MARKER_FILE_RESOURCE_PATHNAME, path.getName()));
         }

    -  private static String getUdfDir() {
    -    return Preconditions.checkNotNull(System.getenv("DRILL_UDF_DIR"), "DRILL_UDF_DIR variable is not set");
    +  /**
    +   * Return list of jars that are missing in local function registry
    +   * but present in remote function registry.
    +   *
    +   * @param remoteFunctionRegistry remote function registry
    +   * @param localFunctionRegistry local function registry
    +   * @return list of missing jars
    +   */
    +  private List getMissingJars(RemoteFunctionRegistry remoteFunctionRegistry,
    +                              LocalFunctionRegistry localFunctionRegistry) {
    +    List remoteJars = remoteFunctionRegistry.getRegistry().getJarList();
    +    List localJars = localFunctionRegistry.getAllJarNames();
    +    List missingJars = Lists.newArrayList();
    +    for (Jar jar : remoteJars) {
    +      if (!localJars.contains(jar.getName())) {
    +        missingJars.add(jar.getName());
    +      }
    +    }
    +    return missingJars;
    +  }
    +
    +  /**
    +   * Creates local udf directory, if it doesn't exist.
    +   * Checks if local is a directory and if current application has write rights on it.
    +   * Attempts to clean up local idf directory in case jars were left after previous drillbit run.
    +   *
    +   * @return path to local udf directory
    +   */
    +  private Path getLocalUdfDir() {
    +    String confDir = getConfDir();
    --- End diff --

    Unfortunately, this won't work in the case of Drill-on-YARN. The $DRILL_HOME and $DRILL_CONF_DIR directories are read-only in that case. The new site directory (pointed to by DRILL_CONF_DIR) will contain a "jars" directory that contains statically-defined UDFs. In Drill-on-YARN, YARN copies all of the site directory to the local machine, but makes it read-only so that YARN can reuse that same "localized" copy for multiple runs.
    (That feature is handy for map/reduce, but is not that useful for Drill. Still, that's how YARN works...)

    One solution: provide a config option that specifies the local UDF location. The Apache Drill default can be the config dir (assuming there is a way to reference the config dir from within drill-override.conf; need to check that). For DoY, we will change the location to be a temp directory provided by YARN.

    Using the YARN temp directory ensures that the local udf dir starts out empty on each run. But what about the "stock" Drill case? The $DRILL_CONF_DIR/udf directory probably will contain jars from a previous run. Is this desired? Does the code handle this case? Do we clean out UDFs that were dropped while the Drillbit was offline? Do we handle a partially-downloaded jar that was left incomplete when the previous run crashed?

    Or, would it be better to clear the udf directory at the start of each Drill run? If we do that, can we always write udfs to a temp directory?

    Perhaps review the temp directories available. Since DoY defines the temp directory at runtime, we need to set the temp directory in drill-config.sh (which you did in a previous version). As it turns out, Drill already has temp directories set in the config system (for spill-to-disk), so we need to reconcile these two. Perhaps this:

    Define DRILL_TEMP_DIR in drill-config.sh. If it is set in the environment (the DoY case) or drill-env.sh (the non-DoY case), use it. Else, default to /tmp.

    Under DoY, we can run multiple drillbits on the same host (by changing ports, etc.), so we need a unique path. Define the actual Drillbit temp directory to be

        drillbit-temp-dir = $DRILL_TEMP_DIR/${drill-root}-${cluster-id}

    We need both the root and cluster ID because neither is unique by itself, unfortunately.

    Finally, udfs can reside in

        ${drillbit-temp-dir}/udf

    This is just one possibility to illustrate the issue. Feel free to create a better solution.
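
To make the proposed layout concrete, here is a rough Java sketch of how the path could be resolved and cleared at startup. It is only an illustration under assumptions: the class and method names (`UdfDirSketch`, `baseTempDir`, `localUdfDir`, `prepare`) are hypothetical and do not exist in Drill, the drill root and cluster ID are passed in as plain strings rather than read from the Drill config, and the environment is passed as a map so the logic can be exercised without touching the real process environment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical helper illustrating the proposed temp-dir layout; not Drill code. */
class UdfDirSketch {

  /** Returns DRILL_TEMP_DIR if set (DoY or drill-env.sh), else falls back to /tmp. */
  public static Path baseTempDir(Map<String, String> env) {
    String dir = env.get("DRILL_TEMP_DIR");
    return Paths.get(dir == null || dir.isEmpty() ? "/tmp" : dir);
  }

  /** Builds $DRILL_TEMP_DIR/${drill-root}-${cluster-id}/udf. */
  public static Path localUdfDir(Map<String, String> env, String drillRoot, String clusterId) {
    // Assumes drillRoot and clusterId contain no path separators.
    return baseTempDir(env).resolve(drillRoot + "-" + clusterId).resolve("udf");
  }

  /** Removes anything left by a previous run, then recreates an empty directory. */
  public static Path prepare(Path udfDir) throws IOException {
    if (Files.exists(udfDir)) {
      try (Stream<Path> walk = Files.walk(udfDir)) {
        // Reverse order visits children before their parent directories.
        // Deletion failures are ignored here; real code should report them.
        walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
      }
    }
    return Files.createDirectories(udfDir);
  }
}
```

A caller would do something like `UdfDirSketch.prepare(UdfDirSketch.localUdfDir(System.getenv(), root, clusterId))` once at Drillbit startup. Clearing the directory unconditionally sidesteps the questions about dropped UDFs and partially-downloaded jars, at the cost of re-downloading every jar after a restart.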