asterixdb-dev mailing list archives

From Sandeep Joshi <sanjos...@gmail.com>
Subject Re: external data set support
Date Wed, 17 Feb 2016 07:59:51 GMT
Comments inline below.

On Sun, Feb 14, 2016 at 1:14 PM, abdullah alamoudi <bamousaa@gmail.com>
wrote:

> Hi Sandeep,
> Here are the answers as per my understanding of the questions:
>
> 1) Schema catalog : One would have to implement IMetadataProvider,
> IDataSource, IDataSourceIndex and other related classes.  Is there any
> functionality missing from the current schema implementation for external
> data sets ?
> Schema information for external data already exists and we use the
> AqlMetadataProvider for both external and internal datasets.
>
> One of the papers says that one should add comparators and hash functions
> for any new data types introduced by the external data set.  Which
> interface does one have to implement for that ?
> I am not sure which paper you're referring to, but for adding new data types
> (whether for use with internal or external datasets; there is really no
> distinction), here is what needs to be done:
> 1. For complex types, one can simply define a type using the create type
> statement.
> 2. For completely new types, one needs to implement at least {IAType,
> IBinaryComparatorFactory, and IBinaryComparator}. I am not sure if that is
> enough but that is a starting point.
>
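As an illustration of item 2 above, here is a minimal sketch of what a comparator for a new binary-encoded type might look like. The interfaces `BinaryComparatorSketch` and `BinaryComparatorFactorySketch` are hypothetical stand-ins defined here so the example is self-contained; they are not the actual IBinaryComparator and IBinaryComparatorFactory, whose signatures should be checked in the codebase. The "point" type and its 8-byte encoding are also assumptions made for the example.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a binary comparator: compares two values
// directly in their serialized form, without materializing objects.
interface BinaryComparatorSketch {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

// Hypothetical stand-in for a comparator factory: creates comparator
// instances on demand.
interface BinaryComparatorFactorySketch {
    BinaryComparatorSketch createBinaryComparator();
}

// Assumed example type: a 2D point serialized as two big-endian
// 32-bit ints (x, then y), ordered by x and then by y.
final class PointComparatorFactory implements BinaryComparatorFactorySketch {
    @Override
    public BinaryComparatorSketch createBinaryComparator() {
        return (b1, s1, l1, b2, s2, l2) -> {
            // wrap() positions each buffer at the start offset of its value
            ByteBuffer left = ByteBuffer.wrap(b1, s1, l1);
            ByteBuffer right = ByteBuffer.wrap(b2, s2, l2);
            // First pair of reads consumes x from both sides
            int byX = Integer.compare(left.getInt(), right.getInt());
            // Second pair of reads consumes y; only needed when x ties
            return byX != 0 ? byX : Integer.compare(left.getInt(), right.getInt());
        };
    }
}

final class PointComparatorDemo {
    // Serializes a point into the assumed 8-byte layout
    static byte[] encode(int x, int y) {
        return ByteBuffer.allocate(8).putInt(x).putInt(y).array();
    }

    public static void main(String[] args) {
        BinaryComparatorSketch cmp = new PointComparatorFactory().createBinaryComparator();
        byte[] a = encode(1, 5);
        byte[] b = encode(1, 7);
        System.out.println(cmp.compare(a, 0, a.length, b, 0, b.length));
    }
}
```

Comparing on the serialized bytes avoids deserializing every value during sorts and joins, which is presumably why the comparator is defined over byte ranges rather than objects.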
> 2) Query optimization : There is no cost-based optimizer yet within
> Algebricks, therefore there is no API to support retrieval and use of table
> statistics from an external data source.
>
> Is something planned in this regard ?
> A cost-based optimizer for internal datasets is being worked on (@Ildar might
> add here). As for external data, unfortunately right now, we don't even
> employ some easy rule-based optimizations. For example, we could use the RC
> file structure to push projection into the data source operator, but we don't
> do that yet. Another optimization that could be done is lazy deserialization
> of records, but again we don't do that. There are plans to do all of these,
> but we have a manpower shortage. You are welcome to give them a shot and we
> can assist.
>
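To make the lazy deserialization idea mentioned above concrete, here is a small sketch. It is not AsterixDB code: `LazyRecord`, the pipe-delimited format, and the method names are all hypothetical, chosen only to show the technique of deferring field parsing until a field is actually requested.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical record wrapper: keeps the raw delimited text and splits it
// into fields only on first field access.
final class LazyRecord {
    private final String raw;
    private String[] fields; // stays null until a field is requested

    LazyRecord(String raw) {
        this.raw = raw;
    }

    // True once the raw text has been split into fields
    boolean isParsed() {
        return fields != null;
    }

    // Deserializes on demand, then serves the requested field
    String field(int index) {
        if (fields == null) {
            fields = raw.split("\\|", -1); // -1 keeps trailing empty fields
        }
        return fields[index];
    }
}

final class LazyRecordDemo {
    public static void main(String[] args) {
        List<LazyRecord> records = Arrays.asList(
                new LazyRecord("1|alice|NY"),
                new LazyRecord("2|bob|CA"));
        // Only the first record is inspected, so only it pays the parsing cost
        System.out.println(records.get(0).field(1));
        System.out.println(records.get(1).isParsed());
    }
}
```

Records that flow through a plan without any field being read (for example, when only a count is needed) are never split at all in this scheme.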

I will get back on that...


>
>
> 3) Data fetch and update : The VLDB'14 paper states that external data sets
> are read-only, static and without indices, but the current codebase has
> support for IExternalIndex and IIndexibleExternalDataSource, so presumably
> I can fetch records from an external data source (base table scan as well
> as index).
> Yes, we can access external data through indexes. Probably, at the time the
> VLDB'14 paper was published, we didn't have this feature yet. You can check
> http://dl.acm.org/citation.cfm?id=2806428 which is about external data
> access and indexing.
>
>
Could you please add this paper to the Publications page ?

https://asterixdb.ics.uci.edu/publications.html

I was going by the information on that page when I asked my questions.



> Can I write to an external data source ?
> Right now, this is not supported because we can't provide the same
> transactional guarantees we can with internal datasets. This point probably
> needs to be discussed with Mike before doing anything about it. I believe
> we offer something else that can be used, which is writing query
> results into files, but I am not sure.
>
>
> 4) Hyracks runtime : For data retrieval, is it sufficient to implement the
> interfaces within asterix.external.api or does one also have to add some
> Hyracks operators which are constructed via contributeRuntimeOperator ?
>
> For data retrieval, one only needs to implement IExternalDataSourceFactory
> along with IRecordReader<? extends T> or IInputStreamProvider (depending on
> whether the source produces a stream or a set of records).
>
> For data parsing, one only needs to implement IDataParserFactory along
> with IRecordDataParser<T> or IStreamDataParser (depending on whether the
> parsed data source produces a stream or a set of records).
>
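The reader/parser split described above can be sketched roughly as follows. The interfaces `RecordReaderSketch` and `RecordParserSketch` are hypothetical and defined here only for illustration; they do not reproduce the real IRecordReader or IRecordDataParser signatures, and the "key=value" record format is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a record-oriented reader: hands out raw records
interface RecordReaderSketch<T> {
    boolean hasNext();
    T next();
}

// Hypothetical stand-in for a record parser: turns a raw record into fields
interface RecordParserSketch<T> {
    Map<String, String> parse(T rawRecord);
}

// Reader side: yields one raw line per record from an in-memory source
final class LineRecordReader implements RecordReaderSketch<String> {
    private final Iterator<String> lines;

    LineRecordReader(List<String> source) {
        this.lines = source.iterator();
    }

    @Override
    public boolean hasNext() {
        return lines.hasNext();
    }

    @Override
    public String next() {
        return lines.next();
    }
}

// Parser side: interprets "key=value,key=value" text as named fields
final class KeyValueParser implements RecordParserSketch<String> {
    @Override
    public Map<String, String> parse(String rawRecord) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : rawRecord.split(",")) {
            String[] kv = pair.split("=", 2);
            fields.put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : "");
        }
        return fields;
    }
}

final class ExternalReadDemo {
    // Drives any reader/parser pair that agrees on the raw record type T
    static <T> List<Map<String, String>> readAll(RecordReaderSketch<T> reader,
                                                 RecordParserSketch<T> parser) {
        List<Map<String, String>> out = new ArrayList<>();
        while (reader.hasNext()) {
            out.add(parser.parse(reader.next()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("id=1,name=a", "id=2,name=b");
        System.out.println(readAll(new LineRecordReader(source), new KeyValueParser()));
    }
}
```

Keeping retrieval and parsing behind separate interfaces means a new source can reuse an existing parser, and a new format can reuse an existing reader, as long as both agree on the raw record type.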
> Let me know if I can provide more information.
> Cheers,
> Abdullah.
>
> P.S.
> Thanks for doing your work before asking. This is a great sign :)
>
> Amoudi, Abdullah.
>
> On Sun, Feb 14, 2016 at 10:17 AM, Sandeep Joshi <sanjos100@gmail.com>
> wrote:
>
> > Can someone describe the level of support for External data sets and the
> > future roadmap ?
> >
> > Let me divide the question into four broad issues:
> >
> > 1) Schema catalog : One would have to implement IMetadataProvider,
> > IDataSource, IDataSourceIndex and other related classes.  Is there any
> > functionality missing from the current schema implementation for external
> > data sets ?
> >
> > One of the papers says that one should add comparators and hash functions
> > for any new data types introduced by the external data set.  Which
> > interface does one have to implement for that ?
> >
> > 2) Query optimization : There is no cost-based optimizer yet within
> > Algebricks, therefore there is no API to support retrieval and use of table
> > statistics from an external data source.
> >
> > Is something planned in this regard ?
> >
> > 3) Data fetch and update : The VLDB'14 paper states that external data sets
> > are read-only, static and without indices, but the current codebase has
> > support for IExternalIndex and IIndexibleExternalDataSource, so presumably
> > I can fetch records from an external data source (base table scan as well
> > as index).
> >
> > Can I write to an external data source ?
> >
> > 4) Hyracks runtime : For data retrieval, is it sufficient to implement the
> > interfaces within asterix.external.api or does one also have to add some
> > Hyracks operators which are constructed via contributeRuntimeOperator ?
> >
> > -Sandeep
> >
>
