impala-user mailing list archives

From Shant Hovsepian <>
Subject Re: Removing "external data source"
Date Wed, 07 Feb 2018 17:08:26 GMT
My two cents.

I have used it for testing and prototyping little things, for example a
Twitter firehose data source or even a generic JDBC wrapper. It makes for cool
demos, but it is not something one would use for data-intensive workloads. It
definitely has issues: defining and extracting a schema is tedious, and it
does not parallelize, though that is generally a hard problem. I do think it
would be cool to document it better and see if the community would come up
with fun data sources. It's one feature that SparkSQL and Drill kind of do
well that I wish Impala supported better. If it is not too much overhead to
maintain, it might be worth keeping.

On Wed, Feb 7, 2018 at 8:48 AM Daniel Hecht <> wrote:

> As it is implemented today, it doesn't have much value. It never really
> passed the prototype stage in terms of functionality.  For instance, it's
> not parallelized -- it runs on a single node only.
> On Tue, Feb 6, 2018 at 8:47 PM, Jim Apple <> wrote:
>> Is there an argument for documenting it and keeping it? Did it not meet
>> the need it was added for in the first place, or has that need decreased in
>> importance?
>> On Tue, Feb 6, 2018 at 7:29 PM Philip Zeyliger <>
>> wrote:
>>> Hi folks,
>>> I want to bring your attention to,
>>> "IMPALA-6204: Remove external DataSource". This is functionality that was
>>> never publicly documented and, to my knowledge, is not in use by anyone.
>>> We'd like to remove it to reduce complexity.
>>> Please let me know if you've got concerns!
>>> Thanks,
>>> -- Philip
