lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcelo Ochoa" <marcelo.oc...@gmail.com>
Subject Re: lucene functionality
Date Wed, 13 Dec 2006 19:06:05 GMT
Hi Mark:
  For 10 million records We recommend an strong database such as Oracle.
  You can annotate the Schema (.xsd) which describes your XML record
to store some field in traditional VARCHAR2 or NUMBER columns to query
it faster, and <DRECONTENT> in a CLOB column.
  You can find more information at:
http://www.oracle.com/technology/tech/xml/xmldb/index.html
http://www.oracle.com/technology/oramag/oracle/05-mar/o25xmlex.html
  If you annotate the schema, you don't need to parse the XML records
with Digester to store into the Oracle database, you can simple insert
using an XMLType object, using Ftp or WebDAV.
  Best regards, Marcelo.

On 12/13/06, Mark Mei <vmslucene@gmail.com> wrote:
> At the bottom of this email is the sample xml file that we are using today.
> We have about 10 million of these.
>
> We need to know whether Lucene can support the following functionalities.
> (1) Each field is searchable and indexable.
> (2) Fields such as STARTTIME and ENDTIME need to be treated as a pair so
> that we can apply timestamp operation such as search by data time ranges
> (3) Fields such as DMA need to be treated as numerical and be able to use
> math operators ( > < =) for those fields.
>
> We also use Apache Commons Digester to parse the xml files. So we want to
> know, can all of the above requirements be supported by combining both
> Digester and Lucene together, or do we need other modules in order for us to
> support those requirements?
> If these functionalities can be supported, please tell us about the effort
> involved (ie, do I need to rewrite 90% of Lucene/Digester to include support
> for these requirements, or is it more like spending one/two afternoons
> extending some classes ? )
>
> <DOCUMENT>
>   <DREREFERENCE>61926433</DREREFERENCE>
>   <DREDBNAME>News</DREDBNAME>
>   <SEGMENTID>61829557</SEGMENTID>
>   <SHOWID>2051460</SHOWID>
>   <PROGRAMID>21181</PROGRAMID>
>   <PROGRAMNAME>Action 10 News This Morning</PROGRAMNAME>
>   <PREFIX>wthi0600</PREFIX>
>   <STATIONID>903</STATIONID>
>   <STATIONNAME>WTHI-TV</STATIONNAME>
>   <AFFILIATEID>17</AFFILIATEID>
>   <AFFILIATENAME>CBS</AFFILIATENAME>
>   <MARKETID>141</MARKETID>
>   <MARKETNAME>Terre Haute</MARKETNAME>
>   <MEDIATYPE>T</MEDIATYPE>
>   <DMA>149</DMA>
>   <SOURCETYPE>CC</SOURCETYPE>
>   <STARTTIME>2005-07-04 06:00:00</STARTTIME>
>   <ENDTIME>2005-07-04 07:00:00</ENDTIME>
>   <STARTMETER>00:42:53</STARTMETER>
>   <ENDMETER>00:45:02</ENDMETER>
>   <DREDATE>2006-01-25 00:00:00</DREDATE>
>   <DRETITLE>At we take you to break with a look at some of the fourth of
> July fun going on around the wabash valley today.</DRETITLE>
>   <DRECONTENT>At we take you to break with a look at some of the fourth of
> July fun going on around the wabash valley today. This is action 10 news
> this morning on wthi. He's been the US Attorney general for only a few
> months. But alberto gonzales may already be in the running for a new job.
> And not just any job, either.</DRECONTENT>
> </DOCUMENT>
>
>


-- 
Marcelo F. Ochoa
http://marcelo.ochoa.googlepages.com/home
______________
Do you Know DBPrism? Look @ DB Prism's Web Site
http://www.dbprism.com.ar/index.html
More info?
Chapter 17 of the book "Programming the Oracle Database using Java &
Web Services"
http://www.amazon.com/gp/product/1555583296/
Chapter 21 of the book "Professional XML Databases" - Wrox Press
http://www.amazon.com/gp/product/1861003587/
Chapter 8 of the book "Oracle & Open Source" - O'Reilly
http://www.oreilly.com/catalog/oracleopen/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message