Mailing-List: contact general-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: general@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <1327364344301-3683433.post@n3.nabble.com>
References: <1327364344301-3683433.post@n3.nabble.com>
Date: Tue, 24 Jan 2012 22:34:46 +0200
Message-ID: 
 <CAFvdMiC76POBgaoywWnOncxG=WE2va9cRq9rcNJbnUbMzD+-8A@mail.gmail.com>
Subject: Re: finding "2012" in "1/01/2012 0:00"
From: Ioan Eugen Stan <stan.ieugen@gmail.com>
To: general@lucene.apache.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

2012/1/24 nuessler <shane.nuessler@canberra.edu.au>:
> Hi all,
> I'm new to Lucene and would like to know the query to find "2012" in
> "1/01/2012 0:00"?
>
> I'm a user of Equella, a learning object repository, and it uses Lucene as
> its search engine. I need to find objects from a particular year. The year
> field is a text field (could have been a date field but long story).
>
> At the moment we build a query like this due to wildcard at start of query
> limitation.
>
> Query: WHERE Year LIKE "1*2012* OR 2*2012* OR 3*2012* ......... OR
> 31*2012*"
>
> I want to search for *2012* but cannot because this feature is disabled
> by default.
>
> How do other Lucene users find anything with e.g. "hat" in the text without
> knowing the first character?
>
> Thanks for any help out there.

I can't give you an exact solution, but you have to use a parser that
understands the date and indexes it properly.
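
As a rough illustration of what parsing the date before indexing could
look like, here is a minimal Python sketch (plain Python, not Equella's
or Lucene's API). It assumes the text is always day/month/year followed
by a time, which I am guessing from your example; `extract_year` and
`ASSUMED_FORMAT` are made-up names.

```python
from datetime import datetime

# Assumption: the stored text is always "day/month/year hour:minute".
ASSUMED_FORMAT = "%d/%m/%Y %H:%M"


def extract_year(raw):
    """Parse the stored text and return the year as a 4-digit string."""
    parsed = datetime.strptime(raw.strip(), ASSUMED_FORMAT)
    return f"{parsed.year:04d}"


print(extract_year("1/01/2012 0:00"))  # prints 2012
```

The returned year could then go into its own field at indexing time, so
a plain exact-match query on that field would replace the wildcard.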

For the second part, if you wish to find "hat" in words like "ahate",
then the usual way to do this efficiently is to index the character
N-grams of each word, and also the N-grams of the reversed word:

For "ahate" you get:

full: ahate
1-gram: a, h, t, e
2-gram: ah, ha, at, te
3-gram: aha, hat, ate
4-gram: ahat, hate

And reverse:

full: etaha
1-gram: the same as above
2-gram: et, ta, ah, ha
...
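
The lists above can be reproduced with a few lines of Python. This is
only a sketch of the idea, not how Lucene's analyzers are implemented,
and `char_ngrams` is a made-up helper name.

```python
def char_ngrams(word, n):
    """Return the distinct character n-grams of `word`, in order of first appearance."""
    grams = []
    for i in range(len(word) - n + 1):
        gram = word[i:i + n]
        if gram not in grams:  # "a" occurs twice in "ahate"; keep it once
            grams.append(gram)
    return grams


word = "ahate"
for n in range(1, len(word)):
    print(f"{n}-gram:", ", ".join(char_ngrams(word, n)))

reversed_word = word[::-1]  # "etaha"
print("reverse 2-gram:", ", ".join(char_ngrams(reversed_word, 2)))
```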

When a user searches with a double wildcard, you search for the word
and also for its reverse.
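
To make the lookup concrete, here is a toy in-memory version in Python.
It is a sketch under my own assumptions, not Lucene code: the two
dictionaries stand in for the N-gram terms of the word and of its
reverse, and `build_indexes` and `search_infix` are hypothetical names.

```python
from collections import defaultdict


def ngrams(word):
    """All character n-grams of `word`, for every length from 1 to len(word)."""
    return {word[i:j] for i in range(len(word)) for j in range(i + 1, len(word) + 1)}


def build_indexes(words):
    forward = defaultdict(set)   # n-gram of the word -> words containing it
    backward = defaultdict(set)  # n-gram of the reversed word -> words
    for word in words:
        for gram in ngrams(word):
            forward[gram].add(word)
        for gram in ngrams(word[::-1]):
            backward[gram].add(word)
    return forward, backward


def search_infix(term, forward, backward):
    """Answer a *term* query: look up the term and also its reverse."""
    return forward.get(term, set()) | backward.get(term[::-1], set())


forward, backward = build_indexes(["ahate", "hat", "1/01/2012", "15/06/2011"])
print(sorted(search_infix("hat", forward, backward)))   # ['ahate', 'hat']
print(sorted(search_infix("2012", forward, backward)))  # ['1/01/2012']
```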

Warning: this will increase your index size considerably! You can limit
the growth if you are interested only in dates: enable N-grams only for
the date part, provided you can identify it at parsing time.
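
A sketch of that restriction in Python: only tokens that look like a
date are expanded into N-grams, everything else is indexed as a single
term. The `DATE_TOKEN` pattern and the `index_terms` name are my own
assumptions and would need adjusting to the real data.

```python
import re

# Hypothetical pattern for a date-like token such as "1/01/2012".
DATE_TOKEN = re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$")


def ngrams(token):
    """All character n-grams of `token`, for every length from 1 to len(token)."""
    return {token[i:j] for i in range(len(token)) for j in range(i + 1, len(token) + 1)}


def index_terms(text):
    """Terms to index: n-grams for date-like tokens, the whole token otherwise."""
    terms = set()
    for token in text.split():
        if DATE_TOKEN.match(token):
            terms |= ngrams(token)
        else:
            terms.add(token)
    return terms


terms = index_terms("report 1/01/2012 0:00")
print("2012" in terms, "report" in terms, "epo" in terms)  # True True False
```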

For more details, please read the relevant chapters of
http://nlp.stanford.edu/IR-book/.

Cheers,

-- 
Ioan Eugen Stan
http://ieugen.blogspot.com/