drill-dev mailing list archives

From rahul challapalli <challapallira...@gmail.com>
Subject Re: Lucene Format Plugin
Date Mon, 03 Aug 2015 19:21:36 GMT
Thanks Jason.

I want to look at the solr plugin and see where we can collaborate, or whether
we have already duplicated part of the effort.

I still need to push a few commits. I will share the code once I get these
changes pushed.

- Rahul



On Mon, Aug 3, 2015 at 11:31 AM, Jason Altekruse <altekrusejason@gmail.com>
wrote:

> Hey Rahul,
>
> This is really cool! Thanks for all of the time you put into writing this.
> I think we have a lot of opportunities to reach new communities with
> efforts like this.
>
> I noticed last week another contributor opened a JIRA for a solr plugin,
> there might be a good opportunity for the two of you to join efforts, as I
> believe he likely started working on a lucene reader as part of his solr
> work.
>
> Would you like to post a link to your work on Github or another public host
> of your code?
>
> https://issues.apache.org/jira/browse/DRILL-3585
>
> On Mon, Aug 3, 2015 at 2:29 AM, Stefán Baxter <stefan@activitystream.com>
> wrote:
>
> > Hi,
> >
> > I'm pretty new around here, but I just wanted to tell you how much your
> > work can benefit us. This is great!
> >
> > Look forward to trying it out.
> >
> > Regards,
> >  -Stefán
> >
> > On Mon, Aug 3, 2015 at 8:38 AM, rahul challapalli <
> > challapallirahul@gmail.com> wrote:
> >
> > > Hello Drillers,
> > >
> > > I have been working on a lucene format plugin. In its current state, the
> > > sample query below successfully searches a lucene index and returns the
> > > results.
> > >
> > > select path from dfs_test.`/search-index` where
> > > contents = 'maxItemsPerBlock' and contents = 'BlockTreeTermsIndex'
> > >
> > >
> > >
> > > *High Level Overview of Current Implementation:*
> > >
> > > *Parallelization:* A lucene segment is the lowest level of
> > > parallelization.
> > > *Filter Pushdown:* Currently the format plugin is designed to push the
> > > complete filter into the scan.
> > > *Filter Evaluation:* Each condition in the filter is treated as a lucene
> > > TermQuery
> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/TermQuery.html>
> > > and multiple conditions are joined using a BooleanQuery
> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/BooleanQuery.html>.
> > > If we *do not* use a TermQuery, then we have to know the exact type of
> > > Analyzer
> > > <https://lucene.apache.org/core/5_2_1/core/org/apache/lucene/analysis/Analyzer.html>
> > > to use with each field in the query.
> > >     Ex: 'contents' field might have been analyzed using a StandardAnalyzer
> > > <https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html>
> > > and the 'path' field might not have been analyzed at all.
> > > If desired, support for raw lucene queries with a reserved word should be
> > > easy to add.
> > >     Ex: select * from dfs.`search-index` where searchQuery =
> > > "+contents:maxItemsPerBlock +path:/home/file.txt";
> > > *Converting SqlFilter to Lucene Query:* Currently only "=" and "!="
> > > operators are handled while converting a sql filter into a lucene query.
> > > For indexed fields this might be sufficient to handle a good number of
> > > cases. For non-indexed fields, operators like ">", "<", "like", etc. need
> > > to be handled.
> > > *FileSystems:* Currently the format plugin only works on a local
> > > filesystem.
> > >
> > >
> > > Though far from complete, I want to work with the community to get some
> > > feedback and avoid any chance of duplication of work. Kindly let me know
> > > your thoughts.
> > >
> > > - Rahul
> > >
> >
>
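
The filter conversion described in the quoted overview (only "=" and "!="
handled, with the resulting conditions joined as boolean clauses) can be
sketched in plain Java. This is a minimal illustration and not the plugin's
code: the class FilterToQuerySketch, its Condition type, and toQueryString are
hypothetical names. To stay runnable without a Lucene dependency, the sketch
emits Lucene classic query-parser syntax as a string ("+" for a required
clause, "-" for a prohibited one) instead of building TermQuery and
BooleanQuery objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from the plugin itself. */
final class FilterToQuerySketch {

    /** One SQL comparison, e.g. contents = 'maxItemsPerBlock'. */
    static final class Condition {
        final String field;
        final String op;    // only "=" and "!=" are handled, as in the overview
        final String value;

        Condition(String field, String op, String value) {
            this.field = field;
            this.op = op;
            this.value = value;
        }
    }

    /**
     * Joins the conditions into a query string: "=" becomes a required
     * clause (+field:value) and "!=" a prohibited clause (-field:value).
     * Assumption: values contain no query-parser special characters,
     * so no escaping is performed here.
     */
    static String toQueryString(List<Condition> conditions) {
        List<String> clauses = new ArrayList<>();
        for (Condition c : conditions) {
            String prefix;
            switch (c.op) {
                case "=":
                    prefix = "+";
                    break;
                case "!=":
                    prefix = "-";
                    break;
                default:
                    // ">", "<", "like", etc. are not handled yet
                    throw new UnsupportedOperationException(
                            "Operator not handled: " + c.op);
            }
            clauses.add(prefix + c.field + ":" + c.value);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // Mirrors the sample query from the original message.
        List<Condition> filter = new ArrayList<>();
        filter.add(new Condition("contents", "=", "maxItemsPerBlock"));
        filter.add(new Condition("contents", "=", "BlockTreeTermsIndex"));
        System.out.println(toQueryString(filter));
    }
}
```

A filter made up only of "!=" conditions would need extra care, since a
Lucene boolean query containing only prohibited clauses matches no documents.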
