From: Ioan Eugen Stan
To: general@lucene.apache.org
Subject: Re: finding "2012" in "1/01/2012 0:00"
Date: Tue, 24 Jan 2012 22:34:46 +0200
In-Reply-To: <1327364344301-3683433.post@n3.nabble.com>

2012/1/24 nuessler:
> Hi all,
> I'm new to Lucene and would like to know the query to find "2012" in
> "1/01/2012 0:00"?
>
> I'm a user of Equella, a learning object repository, and it uses Lucene as
> its search engine. I need to find objects from a particular year. The year
> field is a text field (could have been a date field but long story).
>
> At the moment we build a query like this due to wildcard at start of query
> limitation.
>
> Query: WHERE Year LIKE "1*2012* OR 2*2012* OR 3*2012* ......... OR
> 31*2012*"
>
> I want to search for *2012* but cannot because this feature is disabled as
> per default.
>
> How do other Lucene users find anything with e.g. "hat" in the text without
> knowing the first character?
>
> Thanks for any help out there.
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/finding-2012-in-1-01-2012-0-00-tp3683433p3683433.html
> Sent from the Lucene - General mailing list archive at Nabble.com.

I can't give you an exact solution, but you have to use a parser that
understands the date and indexes it properly.

For the second part: if you wish to find "hat" in words like "ahate", then
the only way to do this is to use character n-grams of the word and also
index the reverse of the word, again with n-grams.

For "ahate" you get:

full: ahate
1-gram: a, h, t, e
2-gram: ah, ha, at, te
3-gram: aha, hat, ate
4-gram: ahat, hate

And reverse:

full: etaha
1-gram: the same as above
2-gram: et, ta, ah, ha
......

When a user searches with a double wildcard, you search for the word and
also its reverse.
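
To make the n-gram idea a bit more concrete, here is a rough sketch in plain
Python. It is not Lucene code: the names char_ngrams, build_index and
infix_search are made up for illustration, and the "index" is just a
dictionary. It only covers the forward n-grams (the reversed-word grams are
left out to keep it short) and assumes the search term is no longer than the
largest gram size.

```python
from collections import defaultdict


def char_ngrams(term, max_n=4):
    """Return the set of character n-grams of term for n = 1..max_n."""
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(term) - n + 1):
            grams.add(term[i:i + n])
    return grams


def build_index(values, max_n=4):
    """Map each n-gram to the set of field values that contain it."""
    index = defaultdict(set)
    for value in values:
        # Split on whitespace so grams never span two tokens.
        for token in value.split():
            for gram in char_ngrams(token, max_n):
                index[gram].add(value)
    return index


def infix_search(index, needle):
    """Exact lookup of needle as a gram; needle must be <= max_n chars."""
    return index.get(needle, set())


# Hypothetical field values, including the date string from the question.
values = ["1/01/2012 0:00", "15/06/2011 0:00", "ahate"]
index = build_index(values)

print(sorted(char_ngrams("ahate")))
print(infix_search(index, "2012"))
print(infix_search(index, "hat"))
```

With grams up to length 4, a search for "2012" becomes a plain exact-match
lookup instead of a leading-wildcard query, at the cost of storing every gram
of every token.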

Warning: this will increase your index size considerably! You can avoid
this if you are interested only in dates and enable n-grams only for the
date part, if you can figure it out at parsing time.

For more details about this, please read the chapters in
http://nlp.stanford.edu/IR-book/.

Cheers,
-- 
Ioan Eugen Stan
http://ieugen.blogspot.com/