From: Christopher
To: user@accumulo.apache.org
Date: Tue, 22 Jan 2013 15:40:27 -0500
Subject: Re: Doc-Partitioned Index with Wildcards

You could store n-grams of terms to support some limited wildcard searching.
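A minimal sketch of the n-gram idea, for illustration only. It assumes character trigrams padded with a "$" boundary marker, and it uses an in-memory dict as a stand-in for whatever index table would actually hold the n-grams. The names ngrams, build_ngram_index, and expand_prefix are hypothetical and not part of any Accumulo API.

```python
from collections import defaultdict


def ngrams(term, n=3):
    """Return the set of character n-grams of term, padded with '$' markers."""
    padded = "$" + term + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_ngram_index(terms, n=3):
    """Map each n-gram to the set of terms that contain it."""
    index = defaultdict(set)
    for term in terms:
        for gram in ngrams(term, n):
            index[gram].add(term)
    return index


def expand_prefix(index, prefix, n=3):
    """Resolve a trailing-wildcard pattern such as 'go*' to matching terms."""
    padded = "$" + prefix
    if len(padded) < n:
        # This is one sense in which the wildcard support is limited:
        # a pattern shorter than the n-gram size cannot be looked up.
        raise ValueError("prefix too short for this n-gram size")
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Sharing every n-gram does not guarantee a match, so verify each candidate.
    return sorted(t for t in candidates if t.startswith(prefix))


terms = ["the", "green", "goblin", "goddess", "gremlin", "ogre"]
index = build_ngram_index(terms)
print(expand_prefix(index, "go"))
```

The expanded terms can then be fed into an ordinary term query. With trigrams, a pattern like "g*" is rejected, so the n-gram size bounds the shortest wildcard that can be answered.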
--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Tue, Jan 22, 2013 at 12:13 PM, Slater, David M. <David.Slater@jhuapl.edu> wrote:
> I’m trying to set up a document-partitioned index that can handle ranges
> of terms or wildcards in queries.
>
> So, instead of querying “the” AND “green” AND “goblin”, it could handle
> “the” AND “green” AND “go*” (which would also return “goddess”, for
> instance). Or a search that used “the” AND “d”-“f” AND “goblin”, handling
> all values between “d” and “f”.
>
> Using a typical document-partitioned index, I’m guessing that you might
> first resolve the wildcard into a list of terms and then run the query in
> the normal fashion. However, this seems rather inefficient. Is there a
> separate data structure that would be recommended for this sort of
> additional functionality?
>
> Thanks,
> David
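For reference, the "resolve the wildcard into a list of terms, then query normally" approach from the quoted question can be sketched as follows. This is an illustration under stated assumptions, not a recommended design: the sorted Python list stands in for the index's sorted terms, postings is a hypothetical dict from term to document IDs, and the range "d"-"f" is read as low <= term < high (whether "f" itself should be included is not specified in the question).

```python
from bisect import bisect_left


def terms_in_range(sorted_terms, low, high):
    """Return the terms t with low <= t < high from a sorted list."""
    return sorted_terms[bisect_left(sorted_terms, low):bisect_left(sorted_terms, high)]


def terms_with_prefix(sorted_terms, prefix):
    """Return the terms starting with prefix, treated as a range scan."""
    # Assumes no term contains U+10FFFF, the largest code point.
    return terms_in_range(sorted_terms, prefix, prefix + "\U0010ffff")


def query(postings, clauses):
    """AND together clauses; each clause is a list of alternative terms (OR)."""
    result = None
    for alternatives in clauses:
        docs = set().union(*(postings.get(t, set()) for t in alternatives))
        result = docs if result is None else result & docs
    return result or set()


# Hypothetical postings: term -> IDs of the documents containing it.
postings = {
    "the": {1, 2, 3},
    "green": {1, 2},
    "goblin": {1},
    "goddess": {2},
    "dragon": {3},
    "elf": {1, 3},
}
sorted_terms = sorted(postings)

# "the" AND "green" AND "go*"
print(query(postings, [["the"], ["green"], terms_with_prefix(sorted_terms, "go")]))
# "the" AND "d"-"f" AND "goblin"
print(query(postings, [["the"], terms_in_range(sorted_terms, "d", "f"), ["goblin"]]))
```

The cost concern raised in the question shows up in the union step: a broad wildcard or range expands to many terms, and every one of their posting lists has to be read before the intersection.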