lucene-java-user mailing list archives

From "Aigner, Thomas" <TAig...@WescoDist.com>
Subject RE: Soliciting Design Thoughts on Date Searching
Date Wed, 28 Feb 2007 19:22:44 GMT
Walt,
	I am no expert, but it sounds like you need to associate many
dates with a single record.  Could this be handled the way you would
handle a synonym?  Basically, add each extra token at the same position
as the first one?  i.e. the record would have a date field whose 3
dates all sit at one position and are treated like synonyms (basically
setPositionIncrement(0)?)
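
A rough sketch of what that would mean, in plain Java rather than Lucene
itself.  PositionSketch, Tok, and positions are made-up names for
illustration; the only idea borrowed from Lucene is that a token's
position is the previous position plus its increment, so an increment
of 0 stacks a token on top of the one before it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PositionSketch {
    /** A term plus its gap from the previous token; 0 means "same position". */
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Turns increments into absolute positions and groups terms sharing one. */
    static Map<Integer, List<String>> positions(List<Tok> stream) {
        Map<Integer, List<String>> byPosition = new LinkedHashMap<>();
        int position = -1; // so a first token with increment 1 lands at 0
        for (Tok t : stream) {
            position += t.positionIncrement;
            byPosition.computeIfAbsent(position, k -> new ArrayList<>()).add(t.term);
        }
        return byPosition;
    }

    public static void main(String[] args) {
        // Hypothetical record: one ordinary word, then 3 dates stacked together.
        List<Tok> stream = Arrays.asList(
                new Tok("born", 1),
                new Tok("18630704", 1),
                new Tok("18630705", 0),
                new Tok("18630706", 0));
        System.out.println(positions(stream));
        // prints {0=[born], 1=[18630704, 18630705, 18630706]}
    }
}
```

All three dates end up at position 1, which is how a synonym would be
laid out.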

	Just thinking out loud.

Tom


-----Original Message-----
From: Walt Stoneburner [mailto:walt.stoneburner@gmail.com] 
Sent: Wednesday, February 28, 2007 2:13 PM
To: java-user@lucene.apache.org
Subject: Re: Soliciting Design Thoughts on Date Searching

Been searching http://www.gossamer-threads.com/lists/lucene/java-user/
as Erick suggested; man, is there a wealth of information in the
Lucene archives.

I have found many examples of how to convert text to dates and back,
how to search Date fields for various ranges, and so forth -- but I
don't think this is what I'm looking for.

That material assumes I have a single date, such as last modified
date, and it's stored in a date field, and that I'm searching that
field.

What I'm looking to do is different.

I have generic material that _contains_ dates: historic time lines,
certificates, news articles, forms, deeds, testimonies, and wildly
free form genealogical information.  The dates have no specific
structure, no obvious context, and no consistency.

Finding relevant material would be trivial if those dates were easily
cherry picked out and placed in a date field.  But they're not.  A
given document can have any number of embedded dates, provided for any
reason, and I'm interested in locating things which mention any date,
potentially within a range.

The issue isn't in using DateRange on a Date Field, but in knowing if
there is some filter that already exists which extracts dates from a
body of text to put into a Date Field.  If not, the DateTools solution
is a helpful step in building my own filter; I just don't want to
reinvent the wheel if it already exists.
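
For illustration only, a minimal sketch of such an extraction step in
plain Java with no Lucene involved.  It assumes just two date shapes
("1871-03-15" and "July 4, 1863"); real free-form material would need
many more.  EmbeddedDateExtractor and extract are made-up names.  The
output is yyyyMMdd strings, which sort as text in date order.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmbeddedDateExtractor {
    // Assumption: only these two shapes are recognized.
    private static final Pattern ISO =
            Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b");
    private static final Pattern LONG =
            Pattern.compile("\\b[A-Z][a-z]+ \\d{1,2}, \\d{4}\\b");
    private static final DateTimeFormatter LONG_FORMAT =
            DateTimeFormatter.ofPattern("MMMM d, uuuu", Locale.ENGLISH);

    /** Returns each recognized date once, as yyyyMMdd, in ascending order. */
    static List<String> extract(String text) {
        TreeSet<String> found = new TreeSet<>();
        collect(text, ISO, DateTimeFormatter.ISO_LOCAL_DATE, found);
        collect(text, LONG, LONG_FORMAT, found);
        return new ArrayList<>(found);
    }

    private static void collect(String text, Pattern shape,
                                DateTimeFormatter format, TreeSet<String> out) {
        Matcher m = shape.matcher(text);
        while (m.find()) {
            try {
                LocalDate date = LocalDate.parse(m.group(), format);
                out.add(date.format(DateTimeFormatter.BASIC_ISO_DATE));
            } catch (DateTimeParseException e) {
                // Looked like a date but is not one, e.g. "Chapter 4, 1900".
            }
        }
    }

    public static void main(String[] args) {
        String text = "Born July 4, 1863; deed recorded 1871-03-15.";
        System.out.println(extract(text)); // prints [18630704, 18710315]
    }
}
```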

Now this is where my personal knowledge of Lucene breaks down.
Assuming I can extract each date from a source's body and convert it
to a usable format, can a Lucene Date Field hold more than one date?
For example, is it a strict name/value pair, or can the value be an
array of dates, or can I append additional dates under the same name?

Super generalizing, to break the discussion from a date specific
example, suppose I did this:
document.add( Field.Text( "title", "Learning Perl, Fourth Edition" ) );  // real title
document.add( Field.Text( "title", "Camel Book" ) );  // my wife knows it by the cover

Could I do a search for both the long and short title against the title
field?

If the answer is yes, problem solved!  I'll just pile on a ton of
dates as I find them and add them to the document.  (Note, I could
easily have hundreds.)

for ( Date somedate : allDatesFoundInSource ) {
  document.add( Field.Text( "embeddedDates", somedate ) );  // Right way to do this?
}


If the answer is no, it better illustrates the problem I face:
searching across an arbitrary collection of dates.
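
To make that requirement concrete, a small sketch in plain Java (not
Lucene) of the match being asked for: a document qualifies if any one of
its embedded dates falls inside the range.  It assumes the dates are
already normalized to yyyyMMdd strings; DateRangeSketch and
mentionsDateInRange are made-up names.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

class DateRangeSketch {
    /** True if any yyyyMMdd string in dates lies in [from, to], inclusive. */
    static boolean mentionsDateInRange(NavigableSet<String> dates,
                                       String from, String to) {
        String first = dates.ceiling(from); // smallest date >= from, or null
        return first != null && first.compareTo(to) <= 0;
    }

    public static void main(String[] args) {
        // Hypothetical document with three embedded dates.
        NavigableSet<String> deed =
                new TreeSet<>(Arrays.asList("18630704", "18710315", "19020101"));
        System.out.println(mentionsDateInRange(deed, "18700101", "18791231")); // true
        System.out.println(mentionsDateInRange(deed, "18800101", "18991231")); // false
    }
}
```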


Erick, if I've missed something obvious in the archives, I'll happily
accept my public flogging.    Thanks for your help so far.

-wls

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

