spark-dev mailing list archives

From "james.green9@baesystems.com" <james.gre...@baesystems.com>
Subject new datasource
Date Thu, 19 Nov 2015 15:14:28 GMT


We have written a new Spark DataSource that uses both Parquet and ElasticSearch. It is based
on the existing Parquet DataSource. When I look at the filters being pushed down to buildScan,
I don't see anything representing filters based on UDFs, or on fields generated by an
explode. I had thought that making it a CatalystScan would give me everything I needed.



This is fine from the Parquet point of view, but we are using ElasticSearch to index and filter
the data we are searching, so I need to be able to capture the UDF conditions, or have
access to the plan AST, so that I can construct a query for ElasticSearch.
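
To illustrate the kind of translation involved, here is a minimal sketch in Python. The
classes are hypothetical stand-ins loosely modelled on the simple filters a data source
receives; they are not Spark's API, and UdfPredicate is an invented placeholder for any
condition that cannot be expressed as a query. The sketch assumes the filters handed to the
scan are implicitly ANDed and that the engine re-applies the full predicate to the returned
rows, so skipping a top-level filter only widens the ElasticSearch result.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


# Hypothetical stand-ins for pushed-down filters (not Spark classes).
@dataclass(frozen=True)
class EqualTo:
    attribute: str
    value: Any


@dataclass(frozen=True)
class GreaterThan:
    attribute: str
    value: Any


@dataclass(frozen=True)
class And:
    left: Any
    right: Any


@dataclass(frozen=True)
class Or:
    left: Any
    right: Any


@dataclass(frozen=True)
class Not:
    child: Any


@dataclass(frozen=True)
class UdfPredicate:
    """Placeholder for a condition with no query equivalent, e.g. a UDF call."""
    name: str


def to_es_query(f: Any) -> Optional[dict]:
    """Translate one filter exactly, or return None if any part of it cannot be."""
    if isinstance(f, EqualTo):
        return {"term": {f.attribute: f.value}}
    if isinstance(f, GreaterThan):
        return {"range": {f.attribute: {"gt": f.value}}}
    if isinstance(f, (And, Or)):
        left, right = to_es_query(f.left), to_es_query(f.right)
        if left is None or right is None:
            return None  # a nested filter is all-or-nothing
        if isinstance(f, And):
            return {"bool": {"must": [left, right]}}
        return {"bool": {"should": [left, right], "minimum_should_match": 1}}
    if isinstance(f, Not):
        child = to_es_query(f.child)
        return None if child is None else {"bool": {"must_not": [child]}}
    return None  # unknown or untranslatable, such as UdfPredicate


def build_query(filters: List[Any]) -> dict:
    """Combine the translatable top-level filters; the rest are simply skipped."""
    translated = [q for q in map(to_es_query, filters) if q is not None]
    if not translated:
        return {"match_all": {}}
    return {"bool": {"must": translated}}


filters = [
    EqualTo("country", "GB"),
    UdfPredicate("is_suspicious"),
    Or(GreaterThan("age", 30), Not(EqualTo("status", "closed"))),
]
query = build_query(filters)
```

Only whole top-level filters are skipped here. Dropping part of a filter nested under Not
or Or could narrow the result rather than widen it, which is why to_es_query returns
None as soon as any nested part is untranslatable.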



I am thinking I might just need to patch Spark to do this, but I'd prefer not to if
there is a way of getting round this without hacking the core code. Any ideas?



Thanks



James


