From: rahul challapalli
Date: Mon, 3 Aug 2015 12:21:36 -0700
Subject: Re: Lucene Format Plugin
To: dev@drill.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a11c34c782d8686051c6d153b

--001a11c34c782d8686051c6d153b
Content-Type: text/plain; charset=UTF-8

Thanks Jason. I want to look at the solr plugin and see where we can
collaborate or if we already duplicated part of the effort. I still need to
push a few commits. I will share the code once I get these changes pushed.

- Rahul

On Mon, Aug 3, 2015 at 11:31 AM, Jason Altekruse wrote:

> Hey Rahul,
>
> This is really cool! Thanks for all of the time you put into writing this,
> I think we have a lot of available opportunities to reach new communities
> with efforts like this.
>
> I noticed last week another contributor opened a JIRA for a solr plugin,
> there might be a good opportunity for the two of you to join efforts, as I
> believe he likely started working on a lucene reader as part of his solr
> work.
>
> Would you like to post a link to your work on Github or another public host
> of your code?
>
> https://issues.apache.org/jira/browse/DRILL-3585
>
> On Mon, Aug 3, 2015 at 2:29 AM, Stefán Baxter wrote:
>
> > Hi,
> >
> > I'm pretty new around here but I just wanted to tell you how much your work
> > can benefit us. This is great!
> >
> > Look forward to trying it out.
> >
> > Regards,
> > -Stefán
> >
> > On Mon, Aug 3, 2015 at 8:38 AM, rahul challapalli <
> > challapallirahul@gmail.com> wrote:
> >
> > > Hello Drillers,
> > >
> > > I have been working on a lucene format plugin. In its current state, the
> > > below sample query successfully searches a lucene index and returns the
> > > results.
> > >
> > > select path from dfs_test.`/search-index` where contents='maxItemsPerBlock'
> > > and contents = 'BlockTreeTermsIndex'
> > >
> > > *High Level Overview of Current Implementation:*
> > >
> > > *Parallelization:* A lucene segment is the lowest level of
> > > parallelization.
> > > *Filter Pushdown:* Currently the format plugin is designed to push the
> > > complete filter into the scan.
> > > *Filter Evaluation:* Each condition in the filter is treated as a lucene
> > > TermQuery
> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/TermQuery.html>
> > > and multiple conditions are joined using a BooleanQuery
> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/BooleanQuery.html>.
> > > If we *do not* use a TermQuery, then we have to know the exact type of
> > > Analyzer
> > > <https://lucene.apache.org/core/5_2_1/core/org/apache/lucene/analysis/Analyzer.html>
> > > to use with each field in the query.
> > > Ex: 'contents' field might have been analyzed using a StandardAnalyzer
> > > <https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html>
> > > and the 'path' field might not have been analyzed at all.
> > > If desired, support for raw lucene queries with a reserved word should be
> > > easy to add.
> > > Ex: select * from dfs.`search-index` where searchQuery =
> > > "+contents:maxItemsPerBlock +path:/home/file.txt";
> > > *Converting SqlFilter to Lucene Query:* Currently only "=" and "!="
> > > operators are handled while converting a sql filter into a lucene query.
> > > For indexed fields this might be sufficient to handle a good number of
> > > cases. For non-indexed fields, operators such as ">", "<", and "like"
> > > need to be handled.
> > > *FileSystems:* Currently the format plugin only works on a local
> > > filesystem.
> > >
> > > Though far from complete, I want to work with the community to get some
> > > feedback and avoid any chance of duplication of work. Kindly let me know
> > > your thoughts.
> > >
> > > - Rahul
> > >

--001a11c34c782d8686051c6d153b--
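
The filter conversion described in the quoted overview (each "=" or "!="
condition becomes a term query, and the conditions are joined in a boolean
query) can be sketched roughly as below. This is only an illustration under
stated assumptions, not the plugin's code: TermClause, to_clauses and render
are hypothetical names, the MUST / MUST_NOT labels merely mirror Lucene's
BooleanClause.Occur values, and mapping "!=" to MUST_NOT is an assumption
the thread does not confirm. Values are kept verbatim, as a TermQuery would
do, so no Analyzer is involved.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stand-ins for Lucene's BooleanClause.Occur values.
# These are plain strings, not Lucene bindings.
MUST = "MUST"
MUST_NOT = "MUST_NOT"


@dataclass(frozen=True)
class TermClause:
    """One exact-match condition, analogous to a TermQuery inside a BooleanQuery."""
    occur: str   # MUST or MUST_NOT
    field: str
    value: str   # kept verbatim; no analyzer is applied


def to_clauses(conditions: List[Tuple[str, str, str]]) -> List[TermClause]:
    """Map AND-ed (field, operator, value) triples to term clauses."""
    clauses = []
    for field, op, value in conditions:
        if op == "=":
            clauses.append(TermClause(MUST, field, value))
        elif op == "!=":
            # Assumption: an inequality becomes a prohibited clause.
            clauses.append(TermClause(MUST_NOT, field, value))
        else:
            # Operators such as >, < and like are outside this sketch.
            raise ValueError(f"operator not supported in this sketch: {op}")
    return clauses


def render(clauses: List[TermClause]) -> str:
    """Show the clauses in +field:value / -field:value notation (display only)."""
    prefix = {MUST: "+", MUST_NOT: "-"}
    return " ".join(f"{prefix[c.occur]}{c.field}:{c.value}" for c in clauses)


# The two conditions from the sample query in the thread.
conditions = [
    ("contents", "=", "maxItemsPerBlock"),
    ("contents", "=", "BlockTreeTermsIndex"),
]
print(render(to_clauses(conditions)))
# prints: +contents:maxItemsPerBlock +contents:BlockTreeTermsIndex
```

The rendered string is only a readable view of the clause list. Feeding such
a string to a query parser would bring an Analyzer back into play, which is
the situation the overview says the TermQuery approach avoids.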