falcon-dev mailing list archives

From anuj kumar <anuj.gandh...@gmail.com>
Subject Re: Need help in understanding Lineage With Falcon
Date Wed, 15 Jul 2015 07:39:14 GMT
Thanks Srikanth for the quick response.
I understand that querying the Graph DB may not be a good solution after

But what I understood is that this Graph DB already stores the calculated
lineage information. Is this assumption correct?
If yes, is there also some data store that captures the metadata
So, for example, let's assume I have a Hive script that reads a file in HDFS,
concatenates the FirstName and LastName fields in the file, and stores the
result in a Hive table as the FullName field.

Now, in order to generate lineage on the FullName field, it needs to know
not only the HDFS file name and the field names, but also the Hive table
name and the column name.
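
To make the question concrete, here is a minimal sketch of the column-level
record such a lineage would need. This is not Falcon's internal
representation; the class names, the HDFS path, and the Hive table name are
all hypothetical and only illustrate the FirstName + LastName -> FullName
example above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ColumnRef:
    """A field in a container (an HDFS file or a Hive table)."""
    container: str  # hypothetical identifiers, e.g. "hdfs:/..." or "hive:db.table"
    field: str


@dataclass(frozen=True)
class ColumnLineage:
    """One derived column, the source columns it reads, and how it is derived."""
    target: ColumnRef
    sources: Tuple[ColumnRef, ...]
    transformation: str


# Hypothetical names for the example in this message.
SOURCE_FILE = "hdfs:/data/customers.csv"
TARGET_TABLE = "hive:default.customers"

full_name = ColumnLineage(
    target=ColumnRef(TARGET_TABLE, "FullName"),
    sources=(
        ColumnRef(SOURCE_FILE, "FirstName"),
        ColumnRef(SOURCE_FILE, "LastName"),
    ),
    transformation="concat(FirstName, LastName)",
)


def upstream_fields(lineage: ColumnLineage) -> List[str]:
    """List the source fields as 'container#field' strings."""
    return [f"{ref.container}#{ref.field}" for ref in lineage.sources]
```

Calling upstream_fields(full_name) lists the two HDFS fields that feed
FullName. The open question is whether anything captures records like this
automatically from the Hive script.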

How does Falcon capture this metadata from the Hive Script? Does it parse
the Hive Script to understand the metadata? Also, where is this metadata
stored? Is it in HCatalog?

Maybe I have completely misunderstood it. Please correct me if I am wrong.

Anuj Kumar

On Wed, Jul 15, 2015 at 7:25 AM, Srikanth Sundarrajan <sriksun@hotmail.com>
wrote:

> Hi Anuj,
>     Falcon stores lineage information in a graph store backed by the
> Blueprints API (by default it is stored in Titan DB). So if one understands
> the schema, one can query the graph, but we would prefer that no one
> access these graphs directly, as they are an internal representation of
> Falcon and are subject to change without prior notice across releases.
> Graph-related REST APIs in Falcon are modeled on the Rexster APIs (
> https://github.com/tinkerpop/rexster/wiki/Basic-REST-API). This should
> allow any standard graph query to run over them.
> More specifically, direct APIs are available in the form of entity lineage (
> http://falcon.apache.org/0.6.1/restapi/EntityDependencies.html &
> http://falcon.apache.org/0.6.1/restapi/EntityLineage.html) and instance
> lineage (in trunk, pending release) for direct consumption without having
> to write custom queries.
> Regards
> Srikanth Sundarrajan
> > From: anuj.o.kumar@accenture.com
> > To: dev@falcon.apache.org
> > Subject: Need help in understanding Lineage With Falcon
> > Date: Tue, 14 Jul 2015 22:51:46 +0000
> >
> > Hi,
> > I am working with a client that uses Informatica Metadata Manager to
> > visualise lineage information. Informatica Metadata Manager is currently
> > used at the data warehouse layer and has proven effective.
> > Unfortunately, Informatica Metadata Manager does not have any connectors
> > to Hadoop for collecting metadata, which makes it a less desirable tool
> > for the entire end-to-end chain. This is where Apache Falcon comes to the
> > rescue.
> >
> > Looking at Falcon, I see that it exposes a set of REST APIs that can be
> > used to capture metadata about process, feed and cluster entities
> > (assuming that the workflow is scheduled using Apache Falcon). So we are
> > exploring how we can generate metadata at the Hadoop layer that can then
> > be fed to Informatica Metadata Manager, which will combine it with its own
> > metadata from the DWH and business reports to provide complete lineage
> > information.
> >
> > I have three specific questions regarding the above problem:
> >
> >
> >   1.  Where is the metadata repository for Apache Falcon located? Is it
> >       the config store on Hadoop, or HCatalog?
> >   2.  Is there a way to connect to this repository (e.g. via JDBC)?
> >   3.  What set of REST APIs can be called from outside the Falcon
> >       environment to capture metadata about the processes scheduled
> >       using Falcon? I looked at these REST APIs
> >       <http://falcon.apache.org/0.6.1/restapi/>, which was a start for
> >       me, but I got lost in the details.
> >
> > Your quick answer would be really appreciated.
> >
> > Thanks,
> > Anuj Kumar
> > Technology Architect - Emerging Technology Innovation group
> > mobile: +31 6 30458915
> > ITO Toren - Gustav Mahlerplein 90 - 1082MA Amsterdam

*Anuj Kumar*
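
For reference, the entity-level call that Srikanth points to in the quoted
reply could be invoked from outside Falcon roughly as sketched below. The URL
path follows my reading of the linked 0.6.1 EntityDependencies page and
should be verified against the release in use. The server address and the
entity name "sample-process" are assumptions, and authentication is omitted
even though a real deployment will most likely require it.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: a Falcon server reachable at this address; adjust as needed.
FALCON_URL = "http://localhost:15000"


def entity_dependencies_url(base_url, entity_type, entity_name):
    """Build the URL for the entity dependencies call.

    The path layout is taken from the 0.6.1 REST docs as I read them;
    check it against the documentation for your Falcon version.
    """
    return "{}/api/entities/dependencies/{}/{}".format(
        base_url.rstrip("/"),
        quote(entity_type, safe=""),
        quote(entity_name, safe=""),
    )


def fetch(url, timeout=10):
    """GET the URL and return the response body as text (no auth handling)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical process name, used only to show the resulting URL.
url = entity_dependencies_url(FALCON_URL, "process", "sample-process")
```

Passing the resulting URL to fetch() against a running server would return
the dependency listing for that process; the body is returned as raw text so
that no response format is assumed here.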
