incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: 0.2.0-newtypesystem branch
Date Fri, 19 Jul 2013 12:16:56 GMT
The type system is probably the largest change to Blur since we moved to
Lucene 4.  So I know it's a bit of a moving target.  Any contribution is
significant; even just talking about the tasks in JIRA is very helpful.  So
if any tasks in JIRA catch your eye, I am more than happy to help you work
on them.  Thanks again!

Aaron


On Thu, Jul 18, 2013 at 7:33 PM, rahul challapalli <
challapallirahul@gmail.com> wrote:

> Hi Aaron,
>
> Thanks for the clarification. As for me, I really haven't made any progress.
> The direction I worked in was to store the analyzer definition in
> zookeeper in a different way, make some minor changes to the thrift
> interface (type, stricttypes), and generate the code. I am not able to find
> time to contribute anything significant. If there is anything specific you
> want me to take up, do let me know.
>
> - Rahul
>
>
> On Wed, Jul 17, 2013 at 5:00 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>
> > Rahul,
> >
> > After giving it some thought I think we should store all the metadata
> > about tables in hdfs. Let me explain why.  We have run into issues where my
> > project wants to remove a table from blur but not delete the indexes (maybe
> > because it's a test system with multiple versions of the same data).
> > However, the problem is you can't just import the table back into blur
> > because the column definitions are stored in zookeeper and they have
> > already been destroyed.
> >
> > That's why I have implemented an hdfs field manager that stores the
> > metadata in hdfs. I don't really have a feel yet for how well this will
> > work, but the base field manager is implemented to be storage agnostic, so
> > any implementing subclass has to provide a way to store and load column
> > definitions.
> >
> > Further, the model we are implementing is to write a column definition once
> > per field and never modify it, which I think will fit well within hdfs's
> > capabilities. Because hdfs enforces atomic file creation, no 2 nodes can
> > create the same column definition, at least with the way I have implemented
> > it.
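
The write-once idea described above can be sketched roughly as follows. This is only an illustration, not the field manager from the branch: the class name WriteOnceColumnStore and its methods are hypothetical, and it uses the local filesystem (java.nio with CREATE_NEW) as a stand-in for hdfs. On hdfs the analogous step would presumably be a create call with overwrite disabled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical write-once store: one file per column definition. */
final class WriteOnceColumnStore {
  private final Path dir;

  WriteOnceColumnStore(Path dir) {
    this.dir = dir;
  }

  /** Convenience factory backed by a fresh temporary directory. */
  static WriteOnceColumnStore inTempDir() {
    try {
      return new WriteOnceColumnStore(Files.createTempDirectory("coldefs"));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns true if this call created the definition, false if it already existed. */
  boolean createIfAbsent(String fieldName, String type) {
    Path file = dir.resolve(fieldName);
    try {
      // CREATE_NEW fails if the file already exists, so only one writer can win.
      Files.write(file, type.getBytes(StandardCharsets.UTF_8),
          StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
      return true;
    } catch (FileAlreadyExistsException e) {
      // Someone else defined this field first; the existing definition is kept.
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Loads the stored type name, or null when the field has no definition. */
  String load(String fieldName) {
    Path file = dir.resolve(fieldName);
    if (!Files.exists(file)) {
      return null;
    }
    try {
      return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A second createIfAbsent call for the same field returns false and leaves the first definition untouched, which is the "write once per field and never modify" behavior. Note that in this sketch the file creation is atomic but the content write is not, so a real implementation would need to consider readers that see a partially written file.
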
> >
> > Take a look at what's there and let me know what you think. Thanks!
> >
> > Aaron
> >
> > Sent from my iPad
> >
> > On Jul 17, 2013, at 11:53 AM, rahul challapalli <
> > challapallirahul@gmail.com> wrote:
> >
> > > Hi Aaron,
> > >
> > > Can you elaborate on your thoughts about how to store the Analyzer
> > > Definition in zookeeper?
> > >
> > > The example below is from my earlier notes. Let me know what you think.
> > >
> > > /blur/default/tables/words/default-column-definition : value
> > >
> > > /blur/default/tables/words/column-families/fam1/default-column-definition : value
> > >
> > > /blur/default/tables/words/column-families/fam1/col1 : value
> > >
> > > /blur/default/tables/words/column-families/fam1/col2 : value
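
The node layout listed above could be generated with a small helper along these lines. This is a minimal sketch only; the class ColumnDefinitionPaths and its method names are hypothetical, and the "/blur/<cluster>/tables/<table>" prefix is simply read off the example paths rather than taken from Blur's code.

```java
/** Hypothetical helper that builds the node paths from the layout in the notes. */
final class ColumnDefinitionPaths {
  private ColumnDefinitionPaths() {
  }

  /** Root node for one table, e.g. /blur/default/tables/words. */
  static String table(String cluster, String table) {
    return "/blur/" + cluster + "/tables/" + table;
  }

  /** Table-wide default column definition. */
  static String tableDefault(String cluster, String table) {
    return table(cluster, table) + "/default-column-definition";
  }

  /** Default column definition for a single column family. */
  static String familyDefault(String cluster, String table, String family) {
    return table(cluster, table) + "/column-families/" + family + "/default-column-definition";
  }

  /** Definition for one named column within a family. */
  static String column(String cluster, String table, String family, String column) {
    return table(cluster, table) + "/column-families/" + family + "/" + column;
  }
}
```

One consequence of this layout worth noting: a column literally named default-column-definition would collide with the family default node, so a real scheme would need to reserve or escape that name.
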
> > >
> > >
> > > - Rahul
> > >
> > >
> > > On Tue, Jul 16, 2013 at 6:06 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> > >
> > >> On Tue, Jul 16, 2013 at 1:24 AM, rahul challapalli <
> > >> challapallirahul@gmail.com> wrote:
> > >>
> > >>> Hi Aaron,
> > >>>
> > >>> I started looking into the functionality you already added. A few
> > >>> observations :
> > >>>
> > >>> In the Blur.thrift file, AnalyzerDefinition is removed from the
> > >>> TableDescriptor. Was this intentional? If so can you give us an
> > >>> example of how to use them?
> > >>>
> > >>
> > >> Removing the AnalyzerDefinition was intentional.  The motivation there
> > >> is to allow the schema (Families, Columns, and Types) to be set/added
> > >> independently of the creation of the table.  I have not created any new
> > >> thrift rpc calls to add new column definitions but ultimately it will
> > >> call addColumnDefinition on the FieldManager class.
> > >>
> > >> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=blur-query/src/main/java/org/apache/blur/analysis/FieldManager.java;h=2271726e55bb9356ca6f2b6edf7a5fdec46b36c4;hb=ae516a442767b31d2c7e29b07a78aa08ec246dcf
> > >>
> > >>> I modified Blur.thrift (Column and TableDescriptor) and generated the
> > >>> code. I don't know how to handle scenarios where minor changes are made
> > >>> and need to be pushed into the branch. Won't it become one big commit if
> > >>> we try to associate everything with a specific JIRA ticket?
> > >>>
> > >>
> > >> I think that you should attach a patch to the jira ticket.  I can
> > >> review and merge, then we can work from the same baseline.  Then we can
> > >> repeat that process as many times as needed.
> > >>
> > >>
> > >>>
> > >>> I added a bunch of code to the MutationHelper class to validate
> > >>> in-bound columns. Can you check whether my understanding is aligned
> > >>> with the requirement?
> > >>
> > >>
> > >>> public static Column validateColumn(String family, Column col,
> > >>>     boolean strict, FieldManager fieldManager) {
> > >>>   // In strict mode every in-bound column must declare its type.
> > >>>   if (strict && col.type == null) {
> > >>>     throw new RuntimeException("The type of the column is a required field for this table."
> > >>>         + " To turn off this behavior set strictTypes=false on the TableDescriptor");
> > >>>   }
> > >>>
> > >>>   FieldTypeDefinition fieldTypeDefinition =
> > >>>       fieldManager.getFieldTypeDefinition(family + "." + col.name);
> > >>>   if (fieldTypeDefinition == null) {
> > >>>     // TODO dynamic column : add new column definition
> > >>>     return col;
> > >>>   }
> > >>>   // A declared type must match the definition already stored for this field.
> > >>>   if (!fieldTypeDefinition.getName().equalsIgnoreCase(col.type)) {
> > >>>     throw new RuntimeException("The type defined in the column does not match the existing type definition");
> > >>>   }
> > >>>   return col;
> > >>> }
> > >>>
> > >>
> > >> Yes this looks good, but just an FYI: I like to always throw
> > >> BlurExceptions instead of RuntimeExceptions.  The main reason for this
> > >> (across the board) is that Thrift will wrap all exceptions that are not
> > >> BlurExceptions or TExceptions in a TException.  When this happens the
> > >> client thinks that something went wrong with the connection and will
> > >> retry the call several times.
> > >>
> > >> Thanks!
> > >>
> > >> Aaron
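
The retry behavior described above can be illustrated with a simplified simulation. Nothing here uses Thrift or Blur: AppException is a hypothetical stand-in for a declared exception such as BlurException, and the retry loop is only an assumption about how a retrying client might behave, not the actual Blur client code.

```java
/** Simplified simulation of a client that retries on anything it does not recognize. */
final class RetryDemo {
  private RetryDemo() {
  }

  /** Stand-in for a declared application error; NOT the real BlurException class. */
  static class AppException extends Exception {
    AppException(String message) {
      super(message);
    }
  }

  /** One remote call that may fail. */
  interface Call {
    void run() throws Exception;
  }

  /** Runs the call like a retrying client would and returns how many attempts were made. */
  static int attempts(Call call, int maxAttempts) {
    int made = 0;
    while (made < maxAttempts) {
      made++;
      try {
        call.run();
        return made; // success, no retry needed
      } catch (AppException e) {
        return made; // declared error: surface it to the caller, do not retry
      } catch (Exception e) {
        // Anything else looks like connection trouble, so loop and try again.
      }
    }
    return made;
  }
}
```

With maxAttempts set to 3, a call that throws AppException is attempted once, while the same validation failure thrown as a RuntimeException is attempted three times even though retrying cannot fix a type mismatch.
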
> > >>
> > >>>
> > >>>
> > >>> - Rahul
> > >>>
> > >>>
> > >>> On Tue, Jul 2, 2013 at 4:27 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> > >>>
> > >>>> I have created a new branch where I have been working on rewriting
> > >>>> the type/analyzer system for what seems like the 3rd or 4th time.  So
> > >>>> hopefully it will turn out better this time.
> > >>>>
> > >>>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=shortlog;h=refs/heads/0.2.0-newtypesystem
> > >>>>
> > >>>> If you have a chance I would love some feedback on what's been built
> > >>>> thus far.
> > >>>>
> > >>>>
> > >>>> The o.a.b.analysis package in the blur-query project:
> > >>>>
> > >>>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=tree;f=blur-query/src/main/java/org/apache/blur/analysis;h=3db57e994d4e60cc81d94641482c69305767fab5;hb=4ebe74ef2e489d8a360220e0d2752c682042ab22
> > >>>>
> > >>>> And the o.a.b.analysis.type package in the blur-query project:
> > >>>>
> > >>>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=tree;f=blur-query/src/main/java/org/apache/blur/analysis/type;h=44ca6e1114210ffd8d202a29a347f7b77e37142f;hb=4ebe74ef2e489d8a360220e0d2752c682042ab22
> > >>>>
> > >>>> The main classes to start looking at are BaseFieldManager and
> > >>>> FieldTypeDefinition.  They will lead you to several implementations.
> > >>>> My hope is that this API will allow us to support the given types in
> > >>>> Lucene as well as allowing others to create new FieldTypeDefinition(s)
> > >>>> and extend Blur.
> > >>>>
> > >>>> Let me know what you think.  Thanks!
> > >>>>
> > >>>> Aaron
> > >>>>
> > >>>
> > >>
> >
>
