Date: Sat, 22 Aug 2015 07:00:12 +0000
Subject: Re: Lucene Format Plugin
From: Stefán Baxter
To: dev
Cc: challapallirahul@gmail.com
Reply-To: dev@drill.apache.org

Hi Rahul,

Can you elaborate a bit on the status of the Lucene plugin and what needs to be done before using it? Also, let me know if there are specific things that need improving.

We want to try using it in our project, and perhaps we can contribute something meaningful.

Regards,
-Stefan

On Mon, Aug 10, 2015 at 5:01 AM, Sudip Mukherjee wrote:

> Hi Rahul,
>
> Thanks for sharing your code. I was trying to get a plugin for the solr
> engine. But I thought of using solr's rest api to do the queries, get
> schema metadata info, etc.
> The goal for me is to expose a solr engine to tools like Tableau or MS
> Excel so that users can do stuff there.
>
> I am still very new to this and there is a learning curve. It would be
> great if you can comment/review whatever I've done so far.
>
> https://github.com/sudipmukherjee/drill/tree/master/contrib/storage-solr
>
> Thanks,
> Sudip
>
> -----Original Message-----
> From: rahul challapalli [mailto:challapallirahul@gmail.com]
> Sent: 10 August 2015 AM 05:21
> To: dev@drill.apache.org
> Subject: Re: Lucene Format Plugin
>
> Below is the link to my branch which contains the changes related to the
> format plugin.
>
> https://github.com/rchallapalli/drill/tree/lucene/contrib/format-lucene
>
> Any thoughts on how to handle contributions like this which still have
> some work to be done?
>
> - Rahul
>
>
> On Mon, Aug 3, 2015 at 12:21 PM, rahul challapalli <
> challapallirahul@gmail.com> wrote:
>
> > Thanks Jason.
> >
> > I want to look at the solr plugin and see where we can collaborate or
> > if we already duplicated part of the effort.
> >
> > I still need to push a few commits. I will share the code once I get
> > these changes pushed.
> >
> > - Rahul
> >
> > On Mon, Aug 3, 2015 at 11:31 AM, Jason Altekruse wrote:
> >
> >> Hey Rahul,
> >>
> >> This is really cool! Thanks for all of the time you put into writing
> >> this. I think we have a lot of available opportunities to reach new
> >> communities with efforts like this.
> >>
> >> I noticed last week another contributor opened a JIRA for a solr
> >> plugin, there might be a good opportunity for the two of you to join
> >> efforts, as I believe he likely started working on a lucene reader as
> >> part of his solr work.
> >>
> >> Would you like to post a link to your work on Github or another
> >> public host of your code?
> >>
> >> https://issues.apache.org/jira/browse/DRILL-3585
> >>
> >> On Mon, Aug 3, 2015 at 2:29 AM, Stefán Baxter wrote:
> >>
> >> > Hi,
> >> >
> >> > I'm pretty new around here but I just wanted to tell you how much
> >> > your work can benefit us. This is great!
> >> >
> >> > Look forward to trying it out.
> >> >
> >> > Regards,
> >> > -Stefán
> >> >
> >> > On Mon, Aug 3, 2015 at 8:38 AM, rahul challapalli <
> >> > challapallirahul@gmail.com> wrote:
> >> >
> >> > > Hello Drillers,
> >> > >
> >> > > I have been working on a lucene format plugin. In its current
> >> > > state, the below sample query successfully searches a lucene index
> >> > > and returns the results.
> >> > >
> >> > > select path from dfs_test.`/search-index` where
> >> > > contents='maxItemsPerBlock' and contents = 'BlockTreeTermsIndex'
> >> > >
> >> > > *High Level Overview of Current Implementation:*
> >> > >
> >> > > *Parallelization:* A lucene segment is the lowest level of
> >> > > parallelization.
> >> > > *Filter Pushdown:* Currently the format plugin is designed to
> >> > > push the complete filter into the scan.
> >> > > *Filter Evaluation:* Each condition in the filter is treated as a
> >> > > lucene TermQuery
> >> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/TermQuery.html>
> >> > > and multiple conditions are joined using a BooleanQuery
> >> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/BooleanQuery.html>.
> >> > > If we *do not* use a TermQuery, then we have to know the exact
> >> > > type of Analyzer
> >> > > <https://lucene.apache.org/core/5_2_1/core/org/apache/lucene/analysis/Analyzer.html>
> >> > > to use with each field in the query.
> >> > > Ex: 'contents' field might have been analyzed using a
> >> > > StandardAnalyzer
> >> > > <https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html>
> >> > > and the 'path' field might not have been analyzed at all.
> >> > > If desired, support for raw lucene queries with a reserved word
> >> > > should be easy to add.
> >> > > Ex: select * from dfs.`search-index` where searchQuery =
> >> > > "+contents:maxItemsPerBlock +path:/home/file.txt";
> >> > > *Converting SqlFilter to Lucene Query:* Currently only "=" and "!="
> >> > > operators are handled while converting a sql filter into a lucene
> >> > > query. For indexed fields this might be sufficient to handle a good
> >> > > number of cases. For non-indexed fields, operators like ">", "<",
> >> > > "like", etc. need to be handled.
> >> > > *FileSystems:* Currently the format plugin only works on a local
> >> > > filesystem.
> >> > >
> >> > > Though far from complete, I want to work with the community to
> >> > > get some feedback and avoid any chance of duplication of work.
> >> > > Kindly let me know your thoughts.
> >> > >
> >> > > - Rahul
>
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material for
> the sole use of the intended recipient. Any unauthorized review, use or
> distribution by others is strictly prohibited. If you have received the
> message by mistake, please advise the sender by reply email and delete
> the message. Thank you."
> **********************************************************************
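
For readers following the thread, the filter conversion described in Rahul's original message (each "=" or "!=" condition becomes one term clause, and the clauses are combined into a single boolean query) can be sketched roughly as below. This is only an illustration in plain Python, not the plugin's code and not the Lucene API: the names `Condition` and `to_lucene_query` are hypothetical, and the output uses Lucene's `+field:term` / `-field:term` query-string notation purely to show the boolean structure. It performs no escaping or analysis of values.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Condition:
    """One SQL filter condition, e.g. contents = 'maxItemsPerBlock'. Hypothetical type."""
    field: str
    op: str      # only "=" and "!=" are supported, mirroring the thread
    value: str


def to_lucene_query(conditions: List[Condition]) -> str:
    """Join conditions into a boolean query string (hypothetical helper).

    "=" maps to a required clause (+field:value) and "!=" maps to a
    prohibited clause (-field:value). Values are used verbatim: no
    escaping and no analyzer is applied, which matches the idea of
    treating each condition as an exact term.
    """
    clauses = []
    for cond in conditions:
        if cond.op == "=":
            prefix = "+"
        elif cond.op == "!=":
            prefix = "-"
        else:
            # Operators such as ">", "<" or "like" are not handled here.
            raise ValueError(f"unsupported operator: {cond.op!r}")
        clauses.append(f"{prefix}{cond.field}:{cond.value}")
    return " ".join(clauses)


if __name__ == "__main__":
    sample = [
        Condition("contents", "=", "maxItemsPerBlock"),
        Condition("contents", "=", "BlockTreeTermsIndex"),
    ]
    print(to_lucene_query(sample))
```

Run against the sample query from the thread, the two `contents` conditions become two required clauses. A real implementation would presumably build TermQuery and BooleanQuery objects directly rather than a string, since a query string passed through a query parser would be analyzed, which is the situation the original message is trying to avoid.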