lucene-solr-user mailing list archives

From "Ryan McKinley" <ryan...@gmail.com>
Subject Faceted Dates
Date Tue, 09 Jan 2007 05:08:17 GMT
I would like to use faceted browsing to group documents by year,
month, and day.  I can think of a few ways to do this, but I'd like to
see what folks think before I start down the wrong track.

Option 1:
Add three fields, one each for year, month, and day.  Something like:

 <field name="addedTime" type="date" indexed="true" stored="true" />
 <field name="addedTimeYEAR" type="string" ... />
 <field name="addedTimeMONTH" type="string" ... />
 <field name="addedTimeDAY" type="string" ... />

Then use copyField to generate the various versions:
 <copyField source="addedTime" dest="addedTimeYEAR"/>
 <copyField source="addedTime" dest="addedTimeMONTH"/>
 <copyField source="addedTime" dest="addedTimeDAY"/>

Each copy would somehow have to convert the original date to its own resolution:
 addedTime      = "2007-01-08T21:36:15.635Z"
 addedTimeYEAR  = "2007"
 addedTimeMONTH = "2007-01"
 addedTimeDAY   = "2007-01-08"

Perhaps this requires a custom FieldType for Y/M/D to convert the
larger string to the smaller one.
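A minimal sketch of the conversion Option 1 needs, assuming the date always arrives in the fixed-width form shown above so each resolution is just a prefix of the original string. The class name DateResolutions and its methods are hypothetical and are not part of Solr:

```java
/** Hypothetical helper (not a Solr class): derives coarser values from a date string. */
final class DateResolutions {
    private DateResolutions() {}

    // Assumes a fixed-width date such as "2007-01-08T21:36:15.635Z",
    // so year, month and day are the first 4, 7 and 10 characters.
    static String year(String date)  { return prefix(date, 4); }
    static String month(String date) { return prefix(date, 7); }
    static String day(String date)   { return prefix(date, 10); }

    private static String prefix(String date, int length) {
        if (date == null || date.length() < length) {
            throw new IllegalArgumentException("date too short for resolution: " + date);
        }
        return date.substring(0, length);
    }
}
```

A custom FieldType for each of the Y/M/D fields could apply the matching method when it converts the incoming value for indexing.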

pros:
* Can use SimpleFacets directly
cons:
* Seems messy, particularly since I have multiple fields that should
have the same behavior.


Option 2:
Add an analyzer to the date field that adds multiple Tokens with
various resolutions, then write a custom faceter that knows that a
string length of 4=year, 7=month, 10=day.  Or, perhaps it could look
at the token type.

schema.xml:

  <fieldtype name="fdate" class="solr.DateField">
    <analyzer type="index" class="...DateFacetAnalyzer"/>
  </fieldtype>

DateFacetAnalyzer:
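 // The full date keeps the default position increment (1); an increment
 // of 0 on the very first token may produce an invalid position.
 Token t = new Token( date, 0, date.length(), "original" );
 tokens.add( t );

 // Coarser tokens are stacked at the same position (increment 0).
 // The term text must be the truncated prefix, not the whole date.
 t = new Token( date.substring( 0, 4 ), 0, 4, "year" );
 t.setPositionIncrement( 0 );
 tokens.add( t );

 t = new Token( date.substring( 0, 7 ), 0, 7, "month" );
 t.setPositionIncrement( 0 );
 tokens.add( t );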

 ...
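
A rough sketch of how the custom faceter in Option 2 might pick a resolution by term length (4=year, 7=month, 10=day). It works on a plain list of term strings, one entry per document and term, rather than on a real index. The class name DateFacetCounter is hypothetical and is not part of Solr:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical faceter sketch (not a Solr class): counts terms of one resolution. */
final class DateFacetCounter {
    private DateFacetCounter() {}

    /**
     * Counts only the terms whose length equals resolutionLength
     * (4 for year, 7 for month, 10 for day) and ignores every other term.
     */
    static Map<String, Integer> count(List<String> terms, int resolutionLength) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted, so facets come out in date order
        for (String term : terms) {
            if (term.length() == resolutionLength) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Looking at the token type instead of the length would avoid relying on the date format being fixed-width, but the type is only available at analysis time, so the length check is the simpler assumption here.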

pros:
* simple / reusable
cons:
* I don't fully understand how it would affect search & sorting

Any thoughts / pointers / advice?

thanks
ryan
