lucene-java-user mailing list archives

From Doron Cohen <>
Subject Re: lucene functionality
Date Wed, 13 Dec 2006 19:09:27 GMT
Lucene RangeQuery would do for the "time" and "numeric" reqs.

"Mark Mei" <> wrote:
> At the bottom of this email is the sample xml file that we are using
> We have about 10 million of these.
> We need to know whether Lucene can support the following functionalities.
> (1) Each field is searchable and indexable.

This is natural in Lucene.

> (2) Fields such as STARTTIME and ENDTIME need to be treated as a pair so
> that we can apply timestamp operation such as search by data time ranges

You should be able to do it with two fields - start and end - which gives
you full flexibility for queries. Conditions can be set on either field or
on both. If both are used, i.e. a doc matches only if its start-to-end range
falls within a certain minStart..maxEnd range, you would need two open-ended
range queries (or range filters) - one applied to the start date and one
applied to the end date - because both bounds of a single range query must
be on the same field. Also note that open-ended range queries must be
created programmatically - the current QueryParser does not support them.
See DateTools for saving the time values at a resolution that matches your
needs.
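
To illustrate the two-condition check described above, here is a minimal
sketch in plain Java. It does not use the Lucene API: the class and method
names (SortableTime, toSortable, fallsWithin) are made up for this example,
and the "yyyyMMddHHmmss" layout is only an assumed sortable form, not
necessarily what DateTools produces. It shows why two separate bounds, one
per field, express "start-to-end falls within minStart..maxEnd".

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Lucene.
final class SortableTime {
    // Format used in the sample XML, e.g. "2005-07-04 06:00:00".
    private static final DateTimeFormatter XML_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    // Assumed fixed-width form whose string order equals time order.
    private static final DateTimeFormatter SORTABLE =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Convert an XML timestamp into the sortable string form.
    static String toSortable(String xmlTimestamp) {
        return LocalDateTime.parse(xmlTimestamp, XML_FORMAT).format(SORTABLE);
    }

    // A doc matches only if start >= minStart AND end <= maxEnd.
    // Each condition is an open-ended range on a different field,
    // which is why a single range on one field cannot express it.
    static boolean fallsWithin(String start, String end,
                               String minStart, String maxEnd) {
        boolean startOk = start.compareTo(minStart) >= 0; // open-ended above
        boolean endOk = end.compareTo(maxEnd) <= 0;       // open-ended below
        return startOk && endOk;
    }
}
```

In an index, the two comparisons would correspond to one range query or
filter on the start field and one on the end field, both required to match.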

> (3) Fields such as DMA need to be treated as numerical and be able to use
> math operators ( > < =) for those fields.

The same comments on range queries / filters apply. Be aware though that
comparison is lexicographic, so numeric values should be indexed as strings
whose lexicographic order matches their numeric order (e.g. fixed width,
zero padded). See NumberTools.
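
A small plain-Java sketch of the lexicographic issue follows. It is only an
illustration: PaddedNumber, pad and the width of 10 are assumptions made up
for this example, and NumberTools uses its own encoding rather than this
one. Negative values are deliberately not handled here.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene.
final class PaddedNumber {
    // Assumed fixed width; must be wide enough for the largest value indexed.
    static final int WIDTH = 10;

    // Zero-pad a non-negative value so string order equals numeric order.
    static String pad(long value) {
        if (value < 0 || value > 9_999_999_999L) {
            throw new IllegalArgumentException("value out of range for this sketch: " + value);
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }
}
```

Without padding, "149" sorts before "23" because '1' is less than '2', so a
range such as DMA > 23 would miss 149. With padding, pad(23) sorts before
pad(149), and the same padded form must be used both when indexing and when
building the range bounds.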

> We also use Apache Commons Digester to parse the xml files. So we want to
> know, can all of the above requirements be supported by combining both
> Digester and Lucene together, or do we need other modules in order for us
> support those requirements?
> If these functionalities can be supported, please tell us about the
> involved (ie, do I need to rewrite 90% of Lucene/Digester to include
> for these requirements, or is it more like spending one/two afternoons
> extending some classes ? )

I can't comment on the XML handling, but the other requirements need
nothing more than Lucene's existing API.

>   <SHOWID>2051460</SHOWID>
>   <PROGRAMNAME>Action 10 News This Morning</PROGRAMNAME>
>   <PREFIX>wthi0600</PREFIX>
>   <DMA>149</DMA>
>   <STARTTIME>2005-07-04 06:00:00</STARTTIME>
>   <ENDTIME>2005-07-04 07:00:00</ENDTIME>
>   <ENDMETER>00:45:02</ENDMETER>
>   <DREDATE>2006-01-25 00:00:00</DREDATE>
>   <DRETITLE>At we take you to break with a look at some of the fourth of
> July fun going on around the wabash valley today.</DRETITLE>
>   <DRECONTENT>At we take you to break with a look at some of the fourth
> July fun going on around the wabash valley today. This is action 10 news
> this morning on wthi. He's been the US Attorney general for only a few
> months. But alberto gonzales may already be in the running for a new job.
> And not just any job, either.</DRECONTENT>
