lucene-general mailing list archives

From Ioan Eugen Stan <stan.ieu...@gmail.com>
Subject Re: finding "2012" in "1/01/2012 0:00"
Date Tue, 24 Jan 2012 20:34:46 GMT
2012/1/24 nuessler <shane.nuessler@canberra.edu.au>:
> Hi all,
> I'm new to Lucene and would like to know the query to find "2012" in
> "1/01/2012 0:00"?
>
> I'm a user of Equella, a learning object repository, and it uses Lucene as
> its search engine. I need to find objects from a particular year. The year
> field is a text field (could have been a date field but long story).
>
> At the moment we build a query like this due to wildcard at start of query
> limitation.
>
> Query: WHERE Year LIKE "1*2012* OR 2*2012* OR 3*2012* .........  OR
> 31*2012*"
>
> I want to search for *2012* but cannot because this feature is disabled as
> per default.
>
> How do other Lucene users find anything with e.g. "hat" in the text without
> knowing the first character?
>
> Thanks for any help out there.
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/finding-2012-in-1-01-2012-0-00-tp3683433p3683433.html
> Sent from the Lucene - General mailing list archive at Nabble.com.

I can't give you an exact solution, but you have to use a parser that
understands the date and indexes it properly.
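
As a rough illustration of the idea (not Equella's or Lucene's actual
code), the year can be pulled out at parsing time and indexed as its own
term, so no wildcard is needed at all. The day/month/year layout and the
helper name year_token are assumptions made for this sketch:

```python
from datetime import datetime

# Assumption: values look like "1/01/2012 0:00", i.e. day/month/year hour:minute.
ASSUMED_FORMAT = "%d/%m/%Y %H:%M"


def year_token(raw: str) -> str:
    """Parse the raw field value and return the year as a standalone term."""
    parsed = datetime.strptime(raw.strip(), ASSUMED_FORMAT)
    return f"{parsed.year:04d}"


# The returned string would be indexed in its own (hypothetical) year field.
print(year_token("1/01/2012 0:00"))
```

A plain term query for the year against that extra field would then
replace the long chain of OR'ed wildcards.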

For the second part, if you wish to find "hat" inside words like "ahate",
then the only way to do this is to index character N-grams of the word
and also N-grams of the reversed word:

For "ahate" you get:

full: ahate
1-gram: a, h, t, e
2-gram: ah, ha, at, te
3-gram: aha, hat, ate
4-gram: ahat, hate

And reverse:

full: etaha
1-gram: -the same as above-
2-gram: et, ta, ah, ha
......
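
The lists above can be reproduced with a few lines of code. This is only
a sketch of the tokenization step; the function name char_ngrams is made
up for the example and is not a Lucene API:

```python
def char_ngrams(word: str, n: int) -> list[str]:
    """Return the contiguous character n-grams of word, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


word = "ahate"
for n in range(1, 5):
    print(f"{n}-gram:", char_ngrams(word, n))

reversed_word = word[::-1]  # "etaha"
print("2-gram (reverse):", char_ngrams(reversed_word, 2))
```

Note that the 1-gram output contains "a" twice, since it lists every
position; the list above shows only the distinct values.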

When a user searches with a double wildcard, you search for the word
and also its reverse.
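
A toy in-memory version of that lookup might look like the following.
Everything here (build_indexes, infix_search, the max_n limit of 4) is
invented for the sketch and only stands in for what a real index with
n-gram fields would do:

```python
from collections import defaultdict


def char_ngrams(text, n):
    """Return the contiguous character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_indexes(values, max_n=4):
    """Map each n-gram to the values containing it, forward and reversed."""
    forward = defaultdict(set)
    reverse = defaultdict(set)
    for value in values:
        for n in range(1, max_n + 1):
            for gram in char_ngrams(value, n):
                forward[gram].add(value)
            for gram in char_ngrams(value[::-1], n):
                reverse[gram].add(value)
    return forward, reverse


def infix_search(forward, reverse, term):
    """Emulate *term*: look up the term, and its reverse in the reversed grams."""
    hits = set(forward.get(term, set()))
    hits |= reverse.get(term[::-1], set())
    return hits


forward, reverse = build_indexes(["1/01/2012 0:00", "15/06/2011 9:30", "ahate"])
print(infix_search(forward, reverse, "2012"))
print(infix_search(forward, reverse, "hat"))
```

Keeping the two maps separate ensures a reversed query term is only
compared against reversed grams. Terms longer than max_n are not found
in this sketch, so the limit has to cover the longest term you expect.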

Warning: this will increase your index size considerably! You can limit
the growth if you are interested only in dates, by enabling n-grams only
for the date part, provided you can identify it at parsing time.

For more details about this please read the chapters in
http://nlp.stanford.edu/IR-book/.

Cheers,

-- 
Ioan Eugen Stan
http://ieugen.blogspot.com/
