oodt-dev mailing list archives

From Aditya Dhulipala <adhul...@usc.edu>
Subject Re: GSoC 2015
Date Wed, 25 Feb 2015 08:34:40 GMT
Hi Lewis,

I was going through the patch you posted earlier, OODT-658:
https://issues.apache.org/jira/browse/OODT-658

I think this is a substantial part of the project we're currently talking
about (XML-RPC overhaul). My understanding is that this patch was
implemented to make Apache Gora communicate with OODT, so that's why you've
implemented the schema definitions for all the data structures used by
OODT. Gora generates some statically typed code from this schema and the
next step is to implement OODT logic to store the data in Gora (as opposed
to MySQL or Solr)

So from the viewpoint of the project we're talking about, i.e. replacing
XML-RPC with Avro, the schema definition part is pretty much done (or almost
done? Do we need to define it within OODT as well?).
The next step would be to define the RPC logic for the client-server
communication within OODT itself, i.e. within filemgr, workflowmgr etc.
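
To get a feel for what the binary encoding buys us on the wire, here is a
minimal sketch. It is not the Avro library and not OODT code: it hand-rolls
the primitives the Avro specification describes (zig-zag varint longs,
length-prefixed UTF-8 strings, record fields concatenated in schema order)
and compares the size with an XML-RPC-style struct. The class name, the three
fields (name, type, size) and the sample values are assumptions made up for
illustration; the real Product structure has more fields.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; nothing here exists in OODT or in the Avro library.
final class AvroStyleSizeSketch {

    // Zig-zag maps signed longs to unsigned ones so that small magnitudes
    // stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Write a long as zig-zag + base-128 varint: 7 bits per byte, low-order
    // group first, high bit set on every byte except the last.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long v = zigZag(n);
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // A string is its UTF-8 byte length (as a long) followed by the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // A record is just its fields written one after another in schema order;
    // no field names or tags are written.
    static byte[] encodeBinary(String name, String type, long size) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeString(out, type);
        writeLong(out, size);
        return out.toByteArray();
    }

    // The same three fields as an XML-RPC-style struct. Values are assumed
    // not to need XML escaping, to keep the sketch short.
    static byte[] encodeXml(String name, String type, long size) {
        String xml = "<struct>"
            + "<member><name>name</name><value><string>" + name
            + "</string></value></member>"
            + "<member><name>type</name><value><string>" + type
            + "</string></value></member>"
            + "<member><name>size</name><value><int>" + size
            + "</int></value></member>"
            + "</struct>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bin = encodeBinary("data.txt", "GenericFile", 1024L);
        byte[] xml = encodeXml("data.txt", "GenericFile", 1024L);
        System.out.println("binary bytes: " + bin.length);
        System.out.println("xml bytes:    " + xml.length);
    }
}
```

With real Avro we would use the classes generated from the schema rather
than writing bytes by hand; the point is only that field names and tags never
appear in the binary payload, because both sides already know the schema.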

Am I correct in understanding this?

Thanks!

--
Aditya

On Mon, Feb 23, 2015 at 9:44 AM, Aditya Dhulipala <adhulipa@usc.edu> wrote:

> Hi Lewis,
>
> Yes. Understood.
>
> I did go through it once before while researching Avro. I'll make sure
> to read it again.
>
> It is very well written. Very detailed.
>
> Thanks!
>
> --
> Aditya
>
> On Mon, Feb 23, 2015 at 6:50 AM -0800, "Lewis John Mcgibbney" <
> lewis.mcgibbney@gmail.com> wrote:
>
>> Good Morning,
>> Sounds good to me.
>> Please make sure to read through Martin's commentary over the years; it is
>> very comprehensive.
>> I'll look forward to seeing your proposal soon.
>> Thank you
>> Lewis
>>
>> On Monday, February 23, 2015, Aditya Dhulipala <adhulipa@usc.edu> wrote:
>>
>>> Hi Lewis,
>>>
>>> No problem for the delay. Thanks for your reply!
>>>
>>> About choosing Avro over protobuf/Thrift -- OK, I understand. Using a
>>> well-founded Apache project makes sense over something deployed by
>>> another org. Also, the one thing Avro has, the schema definition being
>>> part of the message, seems more advantageous than any of the other
>>> protocols.
>>>
>>> I understand your arguments against XML. By moving to Avro, we're not only
>>> eliminating difficulties in XML parsing etc., we're also getting schema
>>> definitions as part of the client-server exchange instead of having to
>>> generate XSD (supposing the existing OODT impl had that feature as well).
>>> So this is doubly advantageous, i.e. we move to JSON (which is lighter to
>>> parse) and also get XSD-type schema definitions.
>>> Please correct me if I'm wrong.
>>>
>>> Yes. This project sounds more and more exciting the more I learn about it!
>>> Plus, the impact, as you say, it would have is also motivating to take it
>>> up :)
>>>
>>> I'll begin to work on the proposal. Or at least a first version. I will
>>> have a draft ready by Wednesday.
>>>
>>> I'll continue to look into the code. Probably look into some more
>>> Avro-specific stuff as well.
>>>
>>> About picking up OODT issues -- understood. I'll do that as well.
>>>
>>> Thanks for all the help!
>>>
>>> best
>>> --
>>> aditya
>>>
>>> On Sun, Feb 22, 2015 at 8:17 PM, Lewis John Mcgibbney <
>>> lewis.mcgibbney@gmail.com> wrote:
>>>
>>> > Hi Aditya,
>>> > Apologies for delay on this one :(
>>> > Thank you for your patience. Please see my inline responses.
>>> >
>>> > On Tue, Feb 17, 2015 at 12:31 AM, Aditya Dhulipala <adhulipa@usc.edu>
>>> > wrote:
>>> >
>>> > > Hi Lewis,
>>> > >
>>> > > I've been reading up on the doc you provided earlier.
>>> > >
>>> >
>>> > Great
>>> >
>>> >
>>> > >
>>> > > I've made some progress. I've looked into the filemgr component and
>>> > > run a few commands to ingest files etc. I understand how it works now.
>>> > >
>>> >
>>> > Great
>>> >
>>> >
>>> > >
>>> > > About the potential workflow -- (This is just my initial
>>> > > understanding. I could be wrong about this, please correct me.)
>>> > > I think I have to rewrite the entire component to conform to the Avro
>>> > > style specification. So this means I need to define the schema for all
>>> > > the files inside filemgr/structs -- Product.java, ProductPage.java etc.
>>> > >
>>> >
>>> > Yes, this is correct. The main data structures are documented in Avro
>>> > specification format as per the patch I attached to OODT-658
>>> > https://issues.apache.org/jira/browse/OODT-658
>>> > Please check them out.
>>> > There is an issue here, as the data structures in filemgr are dependent
>>> > upon additional data structures, namely Metadata, which is contained
>>> > within the OODT metadata package.
>>> >
>>> >
>>> > >
>>> > > I should define the schema for each of these similar to that
>>> > > specified for "User" on this link -
>>> > >
>>> > > http://avro.apache.org/docs/current/gettingstartedjava.html#Defining+a+schema
>>> > >
>>> >
>>> > Absolutely correct. Please see OODT-658
>>> >
>>> >
>>> > >
>>> > > Currently I think this piece of code (Product.java) constructs an XML
>>> > > file for each product so that the rpcClient can send it over the
>>> > > XML-RPC interface to the filemgr server.
>>> >
>>> >
>>> > Yes
>>> >
>>> >
>>> > > This project aims to redefine this process
>>> > > to send the data as a binary encoding (for smaller size, and thus
>>> > > lower latency) by using the Avro protocol.
>>> > >
>>> >
>>> > Yes, this is correct. It reduces wire transfer and provides a more
>>> > flexible model for reading data which has been written by a particular
>>> > writer. Avro supports schema evolution as well, meaning that data does
>>> > not need to be static in nature if we consider it from the Avro point of
>>> > view. This is highly advantageous from a data archival and
>>> > interoperability view.
>>> >
>>> >
>>> > >
>>> > > And then I should invoke the Avro code generation tools from within
>>> > > org.apache...system.XmlRpcFileManagerClient (probably have to rewrite
>>> > > this module to fit the Avro client specification as well)
>>> > >
>>> >
>>> > ... probably yes. I would imagine that by the time this project is
>>> > finished, there will be absolutely no references to XML anywhere. It
>>> > will be entirely replaced by Avro schemas (JSON).
>>> >
>>> >
>>> > >
>>> > > I should also make the XmlRpcFileManager (server) fit the
>>> > > Avro-specific implementation of the server interface.
>>> > >
>>> >
>>> > Yes that is correct.
>>> >
>>> >
>>> > >
>>> > > I think this has to be repeated for all the components within oodt
>>> > > (workflow manager etc)
>>> > >
>>> >
>>> > Absolutely. All key services, e.g. FileMgr, Workflow and Resource.
>>> >
>>> >
>>> > >
>>> > > I also have some questions:-
>>> > >
>>> > > 1. Is there any specific reason for picking Avro over Thrift or
>>> > > Protocol Buffers?
>>> > >
>>> >
>>> > Please read up on some of Martin Kleppmann's blogs and commentary over
>>> > the years on this topic
>>> >
>>> > http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
>>> > He did a bunch of work on Avro whilst @LinkedIn and it will really help
>>> > you to read through some of his work.
>>> >
>>> >
>>> > > 2. I also came across this answer on Quora on Avro vs. XML-RPC
>>> > >
>>> > > http://www.quora.com/What-merits-does-Avro-RPC-have-over-XML-RPC/answer/Ted-Dunning-1?__snids__=959769040&__nsrc__=1&__filter__=all
>>> > >
>>> > > The author talks about another binary format, Simple Binary Encoding,
>>> > > and recommends using Protocol Buffers for their wide use and
>>> > > documentation. Can you share your thoughts about this?
>>> > >
>>> >
>>> > I can, yes.
>>> >  - Protocol Buffers is described as Google's interchange format. Does
>>> > this not sound a bit limiting? What happens if you want to change some of
>>> > the code to fit into OODT? Are you going to fork the project and maintain
>>> > your own Protocol Buffers implementation?
>>> >  - @Apache there is a saying: EAT YOUR OWN DOG FOOD. I would much rather
>>> > we implement a well-founded Apache project, e.g. Avro, over Protocol
>>> > Buffers any day of the week.
>>> > Avro is also widely used. It also has a pretty excellent specification
>>> > document which, as you've already seen, has enabled you to understand
>>> > schema design.
>>> > ...
>>> >
>>> >
>>> > >
>>> > > I'd also like to run some more examples of the filemgr client/server.
>>> > > That way I can run some commands like these
>>> > >
>>> > > https://cwiki.apache.org/confluence/display/OODT/Exploring+the+OODT+File+Manager+XML-RPC+Interface
>>> > > and understand the overhead caused by XML-RPC or get a sense of what
>>> > > the latency of using XML-RPC is.
>>> >
>>> >
>>> > My main justification for moving towards a replacement for XML-RPC in
>>> > OODT is multi-faceted:
>>> >  - the library is dated,
>>> >  - the plethora of XML in OODT is cumbersome,
>>> >  - none of the XML is accompanied by XSD,
>>> >  - Avro has advanced significantly over the years and I am more familiar
>>> > with it than I am with other data serialization frameworks out there. It
>>> > defines the Protocol layer, which is a natural replacement for XML-RPC,
>>> >  - the Google Summer of Code project we are describing here is carving
>>> > the way for a complete Avro-RPC powered REST API for each OODT service.
>>> > This is a HUGE game changer for invoking remote OODT services.
>>> >
>>> >
>>> > > Can you also share examples of filemgr servers
>>> > > running in the real-world that I could query or use?
>>> > >
>>> >
>>> > Most of the servers I am aware of that are running are on VPNs and
>>> > internal, secure networks, so the short answer is no.
>>> > This is something which we would get established once you were brought
>>> > on as the GSoC student for this project, I would think.
>>> >
>>> >
>>> > >
>>> > > Any other comments/suggestions are welcome! :)
>>> > >
>>> > >
>>> > I would state that it would be really nice for you to put some of this
>>> > correspondence down to a proposal of sorts. You will require a working
>>> > proposal when you apply to Google.
>>> > Also, please feel free, if you have time, to pick up some issues on the
>>> > OODT Jira tracker. This will go a LONG way to us backing you as the
>>> > preferred GSoC applicant.
>>> > Thank you
>>> > Lewis
>>> >
>>>
>>
>>
>> --
>> *Lewis*
>>
>>
