oodt-dev mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: GSoC 2015
Date Thu, 26 Feb 2015 18:32:23 GMT
Hi Adi,
Please see DRAT for a flagship application that showcases OODT:
https://github.com/chrismattmann/drat

On Wed, Feb 25, 2015 at 9:54 PM, Aditya Dhulipala <adhulipa@usc.edu> wrote:

> Hi professor,
>
> Thanks for your support!
>
> I saw that a very large portion of the code was written by you. I'm
> guessing I will be interacting with you a lot on this (assuming my
> application for GSoC goes through).
>
> I've posted a link to a first version of the project proposal in the above
> email. I'd like to get some feedback on it so that I can polish it.
>
> Currently, I've written about the general overview of the project and a
> broad description of the tasks. I'm still trying to get comfortable with
> the codebase and to come up with a schedule of work, milestones,
> deliverables, etc. Can you have a look at the proposal and let me know what
> you think about it?
>
> Also, can you give me some pointers on how to use OODT with a dataset? I
> think I may have the employment dataset from 572 last semester. Can you
> give me some ideas on how to use it with OODT, just to get a sense of how
> OODT works?
>
> Here's one I'm thinking of:
> I think OODT is useful for storing and indexing structured data, through the
> use of metadata files (.met) and indexing based on them to answer queries.
> For unstructured data, Lucene/Solr is a great tool. But the employment dataset
> is neither completely unstructured nor consistently structured. It's
> semi-structured, in the sense that we can guess which fields are in the
> records and move forward from there, right?
> So far the only use case I can think of is using OODT to crawl the dataset
> and push it into Solr, then query OODT through the command line (like in the
> OODT wiki examples), i.e. using Solr syntax or SQL syntax.
>
> Is this a valid use case for OODT? I think people would rather just query
> Solr directly, right? Is there any reason for OODT to act as an in-between?
>
> Any more ideas, comments, suggestions?
>
> Thanks!
>
> --
> Aditya
>
>
> adi
>
> On Wed, Feb 25, 2015 at 7:19 PM, Chris Mattmann <chris.mattmann@gmail.com>
> wrote:
>
> > This sounds fabulous. I will be keen to help.
> >
> > ------------------------
> > Chris Mattmann
> > chris.mattmann@gmail.com
> >
> >
> >
> >
> > -----Original Message-----
> > From: Lewis John Mcgibbney <lewis.mcgibbney@gmail.com>
> > Reply-To: <dev@oodt.apache.org>
> > Date: Wednesday, February 25, 2015 at 1:54 PM
> > To: Aditya Dhulipala <adhulipa@usc.edu>
> > Cc: "dev@oodt.apache.org" <dev@oodt.apache.org>
> > Subject: Re: GSoC 2015
> >
> > >I think that you should aim to implement it on all components, and we
> > >should be looking to merge the code into OODT (on a branch) incrementally.
> > >It is OK if you do not get every component ported to Avro RPC; what is
> > >important is that you put forward an optimistic but realistic GSoC
> > >proposal. That is what we are looking for.
> > >Thank you
> > >Lewis
> > >
> > >
> > >On Wed, Feb 25, 2015 at 1:48 PM, Aditya Dhulipala <adhulipa@usc.edu>
> > >wrote:
> > >
> > >> Hi Lewis,
> > >>
> > >> Thanks for your reply!
> > >>
> > >> Your responses have helped immensely when I'm stuck on something!
> > >>
> > >> In the proposal I was preparing, I had listed all the components
> > >> that would require schema definitions. Then, when I checked the OODT-658
> > >> patch, I realized that a lot of this had already been done for the Gora
> > >> project. Your email has clarified that I can use that as a starting
> > >> point for the Avro project. This is extremely useful.
> > >>
> > >> And thanks for the rest of the info as well (about ensuring backwards
> > >> compatibility, testing, and regression testing). Now I have a much better
> > >> idea of how to formulate the proposal (and the project to-dos).
> > >>
> > >> I'll have it ready ASAP. I will post it to the group by the end of
> > >> today so that I can get more feedback on it.
> > >>
> > >> I think I should at least be able to define Avro RPC implementations for
> > >> one of the OODT components within the GSoC period, right? That means:
> > >> - Define the schema
> > >> - Implement the services
> > >> - Write unit tests
> > >> - Regression test against XML-RPC
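The last step above, regression testing against XML-RPC, could be organized as a parity check: send the same requests through both transports and require identical results. A minimal sketch under stated assumptions (Python for brevity; OODT itself is Java): the two transports are modeled as plain callables, and the `get_product_type` operation and its toy data are hypothetical placeholders, not OODT or Avro APIs.

```python
from typing import Any, Callable, Dict, List, Tuple

# A transport is modeled as: call(method_name, args) -> result.
# In a real test these would wrap the existing XML-RPC client and the new
# Avro RPC client for one component (hypothetical wrappers, not real APIs).
Call = Callable[[str, Tuple[Any, ...]], Any]


def check_parity(xmlrpc_call: Call, avro_call: Call,
                 cases: List[Tuple[str, Tuple[Any, ...]]]) -> List[str]:
    """Run every (method, args) case through both transports.

    Returns human-readable mismatch descriptions; an empty list means the
    Avro implementation matched XML-RPC on every case.
    """
    mismatches = []
    for method, args in cases:
        expected = xmlrpc_call(method, args)
        actual = avro_call(method, args)
        if expected != actual:
            mismatches.append(f"{method}{args}: xmlrpc={expected!r} avro={actual!r}")
    return mismatches


if __name__ == "__main__":
    # Toy in-memory "service" so the sketch runs on its own (hypothetical data).
    product_types: Dict[str, str] = {"GenericFile": "toy-id-1"}

    def fake_xmlrpc(method: str, args: Tuple[Any, ...]) -> Any:
        if method == "get_product_type":
            return product_types.get(args[0])
        raise ValueError(f"unknown method: {method}")

    def fake_avro(method: str, args: Tuple[Any, ...]) -> Any:
        # Same behaviour on purpose: a faithful port differs only in transport.
        return fake_xmlrpc(method, args)

    cases = [("get_product_type", ("GenericFile",)),
             ("get_product_type", ("Missing",))]
    print(check_parity(fake_xmlrpc, fake_avro, cases))
```

Returning a list of mismatches rather than failing on the first one makes it easier to see how far a partially ported component still diverges from XML-RPC.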
> > >>
> > >> Hopefully I can implement it for more than one component, but I'm
> > >> still not able to estimate the workload. I'll continue reading up on this.
> > >>
> > >> I'll continue to work on the proposal and keep you updated.
> > >>
> > >> Thanks for all the help! I think if I start early, I can spend the
> > >> summer coding from the beginning.
> > >>
> > >> Thanks!
> > >>
> > >> --
> > >> Aditya
> > >>
> > >>
> > >> adi
> > >>
> > >> On Wed, Feb 25, 2015 at 9:34 AM, Lewis John Mcgibbney <
> > >> lewis.mcgibbney@gmail.com> wrote:
> > >>
> > >>> Hi Adi,
> > >>>
> > >>> On Wed, Feb 25, 2015 at 12:34 AM, Aditya Dhulipala <adhulipa@usc.edu>
> > >>> wrote:
> > >>>
> > >>>> Hi Lewis,
> > >>>>
> > >>>> I was going through the patch you posted earlier, OODT-658:
> > >>>> https://issues.apache.org/jira/browse/OODT-658
> > >>>>
> > >>>> I think this is a substantial part of the project we're currently
> > >>>> talking about (XML-RPC overhaul).
> > >>>>
> > >>>
> > >>> Substantial may be a wee bit optimistic ;) But yes, a significant
> > >>> portion of the thinking about the OODT data structures' logic has been
> > >>> done. We DO need to implement Metadata in exactly the right way without
> > >>> losing existing functionality, so please begin to think about that.
> > >>>
> > >>>
> > >>>
> > >>> https://github.com/apache/oodt/blob/trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/Metadata.java
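One way to read "without losing existing functionality" is that a new wire representation must round-trip OODT's Metadata exactly: OODT's Metadata can hold several values per key, so every key and every value, in order and including duplicates, has to survive. A minimal sketch of that check (Python for brevity; OODT itself is Java), assuming a hypothetical record layout of `{"key": ..., "values": [...]}` entries; the layout and the helpers `to_wire`/`from_wire` are illustrative only and are not the schema from OODT-658.

```python
from typing import Dict, List

# In-memory model of multi-valued metadata: each key maps to an ordered list
# of string values (a key may legitimately carry several values).
MetadataMap = Dict[str, List[str]]


def to_wire(metadata: MetadataMap) -> List[dict]:
    """Flatten metadata into a hypothetical Avro-style list of records.

    Each entry is {"key": str, "values": [str, ...]}; value order is kept.
    """
    return [{"key": key, "values": list(values)} for key, values in metadata.items()]


def from_wire(records: List[dict]) -> MetadataMap:
    """Rebuild the metadata map, appending values if a key appears twice."""
    result: MetadataMap = {}
    for record in records:
        result.setdefault(record["key"], []).extend(record["values"])
    return result


if __name__ == "__main__":
    # Hypothetical example; the point is the multi-valued key with a duplicate.
    original = {"Filename": ["a.dat"], "Keyword": ["k1", "k2", "k1"]}
    print(from_wire(to_wire(original)) == original)
```

A check like this is cheap to run against every metadata-carrying call once the Avro types exist, and it catches the common mistake of collapsing a value list to a single string.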
> > >>>
> > >>>
> > >>>> My understanding is that this patch was implemented to make Apache Gora
> > >>>> communicate with OODT, so that's why you've implemented the schema
> > >>>> definitions for all the data structures used by OODT.
> > >>>>
> > >>>
> > >>> Correct
> > >>>
> > >>>
> > >>>> Gora generates some statically typed code from this schema
> > >>>>
> > >>>
> > >>> Using the GoraCompiler (http://gora.apache.org/current/compiler.html),
> > >>> invoked via CompilerCLI.
> > >>>
> > >>>
> > >>>
> > >>>> and the next step is to implement OODT logic to store the data in Gora
> > >>>> (as opposed to MySQL or Solr).
> > >>>>
> > >>>
> > >>> YES. This will tidy up A LOT of the current configuration. It will also
> > >>> give us a unified, well-documented way to configure the mappings and
> > >>> datastore-specific settings. All of the Gora datastores are documented
> > >>> here:
> > >>> http://gora.apache.org/current/index.html
> > >>> I've been hacking away on documentation for Gora for about a year, so it
> > >>> is now relatively OK. I hope you find it useful.
> > >>>
> > >>>
> > >>>>
> > >>>> So from the viewpoint of the project we're talking about, i.e. replacing
> > >>>> XML-RPC with Avro, the schema definition part is pretty much done (or
> > >>>> almost done? Do we need to define it within OODT as well?).
> > >>>>
> > >>>
> > >>> Note that NONE of the Avro RPC logic is implemented, so it is nowhere
> > >>> near done ;) The core project definition is still to be addressed, and I
> > >>> am nearly 100% sure that we will have some tricky issues to address
> > >>> regarding:
> > >>> 1) maintaining as close to backwards compatibility as possible,
> > >>> 2) documenting the entire Avro RPC communications within OODT,
> > >>> 3) hooking up all services,
> > >>> 4) testing the new implementation,
> > >>> 5) regression testing it against the existing XML-RPC layer, and
> > >>> 6) setting a roadmap for deprecation and eventual removal of the
> > >>>    XML-RPC material.
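Items 1 and 6 suggest keeping both transports behind a single switch during the transition. A minimal sketch of that idea (Python for brevity; OODT itself is Java), assuming a hypothetical configuration property `oodt.rpc.transport` (not an existing OODT setting) and hypothetical client classes: XML-RPC stays the default for backwards compatibility but emits a deprecation warning, while Avro is opt-in until the roadmap flips the default.

```python
import warnings
from typing import Dict, Protocol


class FileManagerClient(Protocol):
    """Hypothetical common interface that both transports must implement."""

    def transport(self) -> str: ...


class XmlRpcFileManagerClient:
    """Hypothetical stand-in for the existing XML-RPC client."""

    def transport(self) -> str:
        return "xmlrpc"


class AvroRpcFileManagerClient:
    """Hypothetical stand-in for a new Avro RPC client."""

    def transport(self) -> str:
        return "avro"


# Hypothetical property name; OODT does not define this setting.
TRANSPORT_PROPERTY = "oodt.rpc.transport"


def make_client(config: Dict[str, str]) -> FileManagerClient:
    """Pick a transport from config, defaulting to XML-RPC for compatibility."""
    choice = config.get(TRANSPORT_PROPERTY, "xmlrpc").lower()
    if choice == "avro":
        return AvroRpcFileManagerClient()
    if choice == "xmlrpc":
        # Still supported, but flagged so users know to migrate.
        warnings.warn(
            f"XML-RPC transport is deprecated; set {TRANSPORT_PROPERTY}=avro to migrate",
            DeprecationWarning,
            stacklevel=2,
        )
        return XmlRpcFileManagerClient()
    raise ValueError(f"unknown transport: {choice!r}")


if __name__ == "__main__":
    print(make_client({TRANSPORT_PROPERTY: "avro"}).transport())
```

Keeping the choice in one factory per component means filemgr, workflowmgr and the others can be ported one at a time, which fits the incremental-merge approach suggested earlier in the thread.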
> > >>>
> > >>>
> > >>>> The next step would be to define RPC logic for the client-server
> > >>>> communication within OODT itself, i.e. within filemgr, workflowmgr, etc.
> > >>>>
> > >>>
> > >>> Correct, this should make up the majority of your proposal OK.
> > >>>
> > >>>
> > >>>>
> > >>>> Am I correct in understanding this?
> > >>>>
> > >>>>
> > >>> Yes, and thank you for joining the dots. It is nice to see a student
> > >>> interpreting and investigating the problem this much before the project
> > >>> starts. I am really looking forward to this now.
> > >>> Thanks
> > >>> Lewis
> > >>>
> > >>
> > >>
> > >
> > >
> > >--
> > >*Lewis*
> >
> >
> >
>



-- 
*Lewis*
