lucene-java-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "Lincolm" by Lincolm
Date Mon, 08 Jun 2009 01:46:12 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The following page has been changed by Lincolm:
http://wiki.apache.org/lucene-java/Lincolm

------------------------------------------------------------------------------
  Lucene HitCollector implementation with FieldCache to create document categorized searches
with minimal impact to performance.
+ 
+ When creating a lucene search application, one commom problem is create a categorized search
with groupings count based on that categories. Suppose you want a search grouping documents
found by a specific criteria. You have a list of articles, this articles has several fields
(title, body, theme, date, keywords, and on). Now, you want to create a search over this articles
list (materialized do a lucene index) and the search result must show a grouping count for
every article found on every theme for all document set.
+ 
+ Using some specialization to default HitCollector implementation can be easily done, and
will allow you to iterate over all document set. Iterate over all document set has a cost,
but, in this case it's necessary due search requisit: create grouping counts for every theme
found in a given search.
+ 
+ Access the disk, same using a FieldSelector it's not the best way because increase search
response time due much more IO activity to read documents on index. One efficient way I found
was iterate over all document set, but, instead of read all document from index (even using
a FieldSelector), I read the theme information from FieldCache implementation. So, the query
runs faster because no IO access is necessary!
+ 
+ Another problem I found was if an article has a multivalued field. The default FieldCache
implementation returns the first field value, only. So, I created a specialized FieldCache
in which a store the multivalued field as TermFreqVector.YES and I used IndexReader.getTermFreqVector(doc)
to read back the field array. During searcher creation  - new IndexSearcher(reader) - I populate
this special FieldCache and I used it directly from my HitCollector.
+ 
+ My application is running on production environment since 01/Jan/2009 at address: http://www.clicrbs.com.br/busca/rs
(portuguese site). So, a sample searching, grouping and creating paging can be found like
this: http://www.clicrbs.com.br/busca/rs?c=-1&debug=true&q=tipo%3Amaterias&t=2009
  ----
  CategoryHomepage
  

Mime
View raw message