Mailing-List: dev@asterixdb.incubator.apache.org
Date: Fri, 4 Mar 2016 14:04:32 -0800
Subject: Re: Do we have a method to append local files to existed dataset?
From: Young-Seok Kim
To: dev@asterixdb.incubator.apache.org

That makes sense.

Cheers,
Young-Seok

On Fri, Mar 4, 2016 at 1:48 PM, Yingyi Bu wrote:

> Young-Seok,
>
> That works when the number of local files is relatively small.
> However, with 1000 localfs files, all 1000 files will be loaded in
> parallel at the same time, which will exhaust system resources.
> Loading from HDFS doesn't have this problem because the 1000 (or more)
> file splits are queued for the parallel loaders.
>
> Best,
> Yingyi
>
> On Fri, Mar 4, 2016 at 1:42 PM, Young-Seok Kim wrote:
>
> > You can also load multiple adm files into the same dataset with a
> > single AQL statement, as follows:
> >
> > load dataset Tweets
> > using "org.apache.asterix.external.dataset.adapter.NCFileSystemAdapter"
> > (("path"=
> > "130.149.249.60:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi27-pid0.adm,
> > 130.149.249.53:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi26-pid1.adm,
> > 130.149.249.54:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi25-pid2.adm,
> > 130.149.249.55:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi24-pid3.adm,
> > 130.149.249.56:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi23-pid4.adm,
> > 130.149.249.57:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi22-pid5.adm,
> > 130.149.249.58:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi21-pid6.adm,
> > 130.149.249.59:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi20-pid7.adm"),
> > ("format"="adm"));
> >
> > The above AQL statement loads 8 adm files into a single dataset
> > named Tweets.
> >
> > Cheers,
> > Young-Seok
> >
> > On Fri, Mar 4, 2016 at 12:19 PM, Xikui Wang wrote:
> >
> > > Hi Yingyi,
> > >
> > > Thanks for your reply. I think the external dataset with a scan
> > > query is a good solution.
> > > I will try that. Thank you.
> > >
> > > Best,
> > > Xikui
> > >
> > > On Fri, Mar 4, 2016 at 11:53 AM, Yingyi Bu wrote:
> > >
> > > > Xikui,
> > > >
> > > > If the number of localfs files is too large, a solution could be
> > > > to put your files on HDFS and then load them. Loading from HDFS
> > > > always has a fixed degree of parallelism regardless of the number
> > > > of files.
> > > >
> > > > >> I am wondering whether there is a way to append an adm file to
> > > > >> an existing dataset?
> > > > You can create an external dataset and then write an insert
> > > > statement where the body is a scan query. AsterixDB doesn't load
> > > > any data into its own storage for an external dataset but just
> > > > keeps file paths.
> > > > Here is a manual for external datasets:
> > > > https://ci.apache.org/projects/asterixdb/aql/externaldata.html
> > > >
> > > > Best,
> > > > Yingyi
> > > >
> > > > On Fri, Mar 4, 2016 at 11:47 AM, Xikui Wang wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I want to import data from multiple adm files into the same
> > > > > dataset. Merging them together and then loading from localfs
> > > > > can be a viable solution, but this may become a problem when
> > > > > the number of files becomes too large. I am wondering whether
> > > > > there is a way to append an adm file to an existing dataset?
> > > > >
> > > > > Thank you.
> > > > >
> > > > > Best,
> > > > > Xikui
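The two ideas in this thread (appending through an external dataset plus an insert whose body is a scan query, and keeping the number of localfs files read at once small) could be combined roughly as in the Python sketch below, which only generates AQL text and does not talk to AsterixDB. Several things here are assumptions rather than anything stated in the thread: the helper names (path_value, batches, append_statements), the item type name TweetType, the staging dataset naming scheme, and the expectation that an external dataset using localfs accepts the same comma-separated "path" value as the load statement quoted above. Whether batching actually bounds resource use on a given cluster would need to be checked.

```python
from typing import Iterable, List, Tuple

HostFile = Tuple[str, str]  # (node address, absolute file path on that node)


def path_value(files: Iterable[HostFile]) -> str:
    """Join (host, path) pairs into one comma-separated "path" value,
    in the host:///abs/path form used by the load statement in the thread."""
    return ",".join(f"{host}://{path}" for host, path in files)


def batches(items: List[HostFile], size: int) -> List[List[HostFile]]:
    """Split the file list into consecutive chunks of at most `size` files."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def append_statements(target: str, item_type: str,
                      files: List[HostFile], batch_size: int) -> List[str]:
    """Emit, per batch: create a staging external dataset, insert its
    contents into `target` via a scan query, then drop the staging dataset.
    `item_type` and the staging names are hypothetical placeholders."""
    stmts: List[str] = []
    for n, batch in enumerate(batches(files, batch_size)):
        staging = f"{target}Staging{n}"
        stmts.append(
            f'create external dataset {staging}({item_type}) using localfs\n'
            f'(("path"="{path_value(batch)}"),("format"="adm"));'
        )
        stmts.append(
            f"insert into dataset {target} "
            f"(for $r in dataset {staging} return $r);"
        )
        stmts.append(f"drop dataset {staging};")
    return stmts


if __name__ == "__main__":
    # Hypothetical file list; real paths would come from the cluster layout.
    demo = [("130.149.249.60", "/data/a.adm"),
            ("130.149.249.53", "/data/b.adm"),
            ("130.149.249.54", "/data/c.adm")]
    print("\n\n".join(append_statements("Tweets", "TweetType", demo, 2)))
```

Running the statements one batch at a time means each insert scans only the files named in its own staging dataset, which is the property Yingyi's concern about 1000 simultaneous localfs files calls for; the batch size is a tuning knob, not a recommended value.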