asterixdb-dev mailing list archives

From Young-Seok Kim <kiss...@gmail.com>
Subject Re: Do we have a method to append local files to existed dataset?
Date Fri, 04 Mar 2016 22:04:32 GMT
That makes sense.

Cheers,
Young-Seok

On Fri, Mar 4, 2016 at 1:48 PM, Yingyi Bu <buyingyi@gmail.com> wrote:

> Young-Seok,
>
> That works when the number of local files is relatively small.
> However, when there are 1000 localfs files, all 1000 files will be loaded
> in parallel, which will exhaust system resources.
> Loading from HDFS doesn't have this problem because the 1000 (or more) file
> splits will be queued into the parallel loaders.
>
> Best,
> Yingyi
>
>
> On Fri, Mar 4, 2016 at 1:42 PM, Young-Seok Kim <kisskys@gmail.com> wrote:
>
> > You can also load multiple adm files into the same dataset with a single
> > AQL statement, as follows:
> >
> > load dataset Tweets
> > using "org.apache.asterix.external.dataset.adapter.NCFileSystemAdapter"
> > (("path"=
> > "130.149.249.60:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi27-pid0.adm,
> > 130.149.249.53:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi26-pid1.adm,
> > 130.149.249.54:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi25-pid2.adm,
> > 130.149.249.55:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi24-pid3.adm,
> > 130.149.249.56:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi23-pid4.adm,
> > 130.149.249.57:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi22-pid5.adm,
> > 130.149.249.58:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi21-pid6.adm,
> > 130.149.249.59:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi20-pid7.adm"),
> > ("format"="adm"));
> >
> >
> > The above AQL loads 8 adm files into a single dataset named Tweets.
> >
> >
> > Cheers,
> >
> > Young-Seok
> >
> > On Fri, Mar 4, 2016 at 12:19 PM, Xikui Wang <xikuiw@uci.edu> wrote:
> >
> > > Hi Yingyi,
> > >
> > > Thanks for your reply. I think the external dataset with scan query is a
> > > good solution.
> > > I will try that. Thank you.
> > >
> > > Best,
> > > Xikui
> > >
> > > On Fri, Mar 4, 2016 at 11:53 AM, Yingyi Bu <buyingyi@gmail.com> wrote:
> > >
> > > > Xikui,
> > > >
> > > > If the number of localfs files is too large, one solution is to put
> > > > your files on HDFS and then load them. Loading from HDFS always has a
> > > > fixed degree of parallelism regardless of the number of files.
> > > >
> > > > >> Is there a way to append an adm file to an existing dataset?
> > > > You can create an external dataset and then write an insert statement
> > > > where the body is a scan query. AsterixDB doesn't load any data into its
> > > > own storage for an external dataset but just keeps file paths.
> > > > Here is a manual for external datasets:
> > > > https://ci.apache.org/projects/asterixdb/aql/externaldata.html
> > > >
> > > > Best,
> > > > Yingyi
> > > >
> > > >
> > > > On Fri, Mar 4, 2016 at 11:47 AM, Xikui Wang <xikuiw@uci.edu> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I want to import data from multiple adm files into the same dataset.
> > > > > Merging them together and then loading from localfs can be a viable
> > > > > solution, but this may become a problem when the number of files
> > > > > becomes too large. Is there a way to append an adm file to an existing
> > > > > dataset?
> > > > >
> > > > > Thank you.
> > > > >
> > > > > Best,
> > > > > Xikui
> > > > >
> > > >
> > >
> >
>
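
Yingyi's suggestion in the thread above (create an external dataset, then run an insert statement whose body is a scan query) can be sketched as a small Python helper that generates the AQL text. This is a minimal sketch and was not posted on the list: the names `TweetType` and `TweetsStaging`, the `localfs` adapter alias, and the final `drop dataset` cleanup are assumptions made for illustration, and the statement syntax should be checked against the AQL manual for your AsterixDB version. The `host://path` layout of the "path" value is taken from Young-Seok's load statement.

```python
def path_value(files):
    """Join (host, absolute_path) pairs into the comma-separated "path" value.

    Follows the host://path layout used in the load statement in this thread,
    e.g. ("130.149.249.60", "/data/x.adm") -> "130.149.249.60:///data/x.adm".
    """
    return ",".join(f"{host}://{path}" for host, path in files)


def append_statements(target, item_type, files, staging="TweetsStaging"):
    """Return AQL text that appends the given adm files to an existing dataset.

    Assumptions (not from the thread): the staging dataset name, the item type
    name, the "localfs" adapter alias, and dropping the staging dataset at the
    end. The target dataverse is assumed to be selected already.
    """
    return "\n".join([
        # External dataset: only the file paths are recorded, no data is loaded.
        f"create external dataset {staging}({item_type})",
        f'using localfs (("path"="{path_value(files)}"),("format"="adm"));',
        # Insert whose body is a scan query over the external dataset.
        f"insert into dataset {target} (for $r in dataset {staging} return $r);",
        # Hypothetical cleanup step so the helper can be reused for the next batch.
        f"drop dataset {staging};",
    ])


if __name__ == "__main__":
    new_files = [
        ("130.149.249.60", "/data/new/part0.adm"),  # hypothetical paths
        ("130.149.249.53", "/data/new/part1.adm"),
    ]
    print(append_statements("Tweets", "TweetType", new_files))
```

Generating the statements from a list of files keeps each append small and repeatable. Whether a scan over an external localfs dataset avoids the resource exhaustion Yingyi describes for a 1000-file load is not stated in the thread, so moving the files to HDFS remains the safer choice for very large file counts.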
