drill-dev mailing list archives

From Stefán Baxter <ste...@activitystream.com>
Subject Re: Lucene Format Plugin
Date Mon, 03 Aug 2015 09:29:43 GMT
Hi,

I'm pretty new around here but I just wanted to tell you how much your work
can benefit us. This is great!

Looking forward to trying it out.

Regards,
 -Stefán

On Mon, Aug 3, 2015 at 8:38 AM, rahul challapalli <
challapallirahul@gmail.com> wrote:

> Hello Drillers,
>
> I have been working on a lucene format plugin. In its current state, the
> below sample query successfully searches a lucene index and returns the
> results.
>
> select path from dfs_test.`/search-index` where contents='maxItemsPerBlock'
> and contents = 'BlockTreeTermsIndex'
>
>
>
> *High Level Overview of Current Implementation:*
>
> *Parallelization:* A lucene segment is the lowest level of
> parallelization.
> *Filter Pushdown:* Currently the format plugin is designed to push the
> complete filter into the scan.
> *Filter Evaluation:* Each condition in the filter is treated as a lucene
> TermQuery
> <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/TermQuery.html>
> and multiple conditions are joined using a BooleanQuery
> <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/BooleanQuery.html>.
> If we *do not* use a TermQuery, then we have to know the exact type of
> Analyzer
> <https://lucene.apache.org/core/5_2_1/core/org/apache/lucene/analysis/Analyzer.html>
> to use with each field in the query.
>     Ex: 'contents' field might have been analyzed using a StandardAnalyzer
> <https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html>
> and the 'path' field might not have been analyzed at all.
> If desired, support for raw lucene queries with a reserved word should be
> easy to add.
>     Ex: select * from dfs.`search-index` where searchQuery =
>     "+contents:maxItemsPerBlock +path:/home/file.txt";
> *Converting SqlFilter to Lucene Query:* Currently only the "=" and "!="
> operators are handled while converting a sql filter into a lucene query.
> For indexed fields this might be sufficient to handle a good number of
> cases. For non-indexed fields, operators such as ">", "<" and "like" still
> need to be handled.
> *FileSystems:* Currently the format plugin only works on a local
> filesystem.
>
>
> Though the plugin is far from complete, I want to work with the community
> to get some feedback and avoid any chance of duplication of work. Kindly
> let me know your thoughts.
>
> - Rahul
>
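
The filter conversion described in the quoted message can be illustrated with a minimal sketch. It is not the plugin's code. It only shows how AND-ed "=" and "!=" conditions might map onto Lucene's classic query syntax, where "+" marks a required clause and "-" a prohibited one, as in the raw-query example above. The class name FilterToLuceneQuery, the Condition type and the method toQueryString are hypothetical names made up for this illustration. The sketch also assumes that values contain no characters that are special in the query syntax, so it does no escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Drill or Lucene.
final class FilterToLuceneQuery {

    /** One "field op value" condition taken from a SQL WHERE clause (hypothetical type). */
    static final class Condition {
        final String field;
        final String op;    // only "=" and "!=" are handled, as in the message above
        final String value;

        Condition(String field, String op, String value) {
            this.field = field;
            this.op = op;
            this.value = value;
        }
    }

    /**
     * Renders AND-ed conditions in Lucene's classic query syntax:
     * "=" becomes a required clause (+field:value) and
     * "!=" becomes a prohibited clause (-field:value).
     * Assumption: values need no escaping.
     */
    static String toQueryString(List<Condition> conditions) {
        List<String> clauses = new ArrayList<>();
        for (Condition c : conditions) {
            String prefix;
            if ("=".equals(c.op)) {
                prefix = "+";
            } else if ("!=".equals(c.op)) {
                prefix = "-";
            } else {
                // Operators such as ">", "<" and "like" are out of scope here.
                throw new UnsupportedOperationException("Operator not handled: " + c.op);
            }
            clauses.add(prefix + c.field + ":" + c.value);
        }
        return String.join(" ", clauses);
    }
}
```

For the sample query in the message, two "=" conditions on the contents field would render as "+contents:maxItemsPerBlock +contents:BlockTreeTermsIndex". A real implementation would more likely build TermQuery and BooleanQuery objects directly, as the message describes, rather than go through a query string.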
