lucene-dev mailing list archives

From "Andreas Neumann" <neun...@gmail.com>
Subject Re: IBM OmniFind Yahoo! Edition
Date Thu, 14 Dec 2006 05:48:02 GMT
Thanks for the congratulations, Doug!

The credits for the Lucene side of the work really go to Michael, and
to the entire Lucene group - this community sometimes came up with
patches faster than we could ask for them.

To answer your question: How is Lucene used in this product?
- Needless to say, we use Lucene to index and search documents.
- The documents are gathered by web and file system crawlers that we
  took from OmniFind Enterprise Edition, improved and adapted to the
  small footprint of Yahoo! Edition.
- For analysis, we use IBM's LanguageWare text analytics packaged into
  the UIMA framework - no "vanilla" Lucene analyzers used. This part
  was a little tricky because UIMA's document processing model (analyze
  the entire document at once) differs from Lucene's, which analyzes
  each field separately.
- For search, we extended QueryParser for LanguageWare-specific handling
  of base forms, stopwords, and synonyms. Oh, and we tuned the scoring a
  little.
- A lot of the work actually went into the infrastructure that puts it
  all together - configuration, administration, APIs etc.
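
To make the analysis mismatch above concrete, here is a minimal sketch in
plain Java. It is not the product's code and it uses neither the UIMA nor
the Lucene API: every class and method name is hypothetical, and the
tokenizer is a trivial stand-in for real text analytics. It shows one
possible way to bridge the two models (an assumption for illustration
only): concatenate the fields, run a single whole-document pass, then hand
the tokens back field by field using character offsets.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: none of these names come from UIMA, Lucene, or OmniFind.
public class WholeDocAnalysisSketch {

    // A token found by the whole-document pass, with offsets into the full text.
    static final class Annotation {
        final int start;
        final int end;
        final String text;

        Annotation(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    // Stand-in for a document-level analyzer: it sees the entire text in one call.
    // Here it only splits on non-alphanumeric characters and lowercases.
    public static List<Annotation> analyzeWholeDocument(String fullText) {
        List<Annotation> out = new ArrayList<>();
        int i = 0;
        while (i < fullText.length()) {
            while (i < fullText.length() && !Character.isLetterOrDigit(fullText.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < fullText.length() && Character.isLetterOrDigit(fullText.charAt(i))) {
                i++;
            }
            if (i > start) {
                String token = fullText.substring(start, i).toLowerCase(Locale.ROOT);
                out.add(new Annotation(start, i, token));
            }
        }
        return out;
    }

    // Concatenate all fields, analyze once, then split the tokens back per field.
    public static Map<String, List<String>> tokensPerField(Map<String, String> fields) {
        StringBuilder doc = new StringBuilder();
        Map<String, int[]> spans = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            int start = doc.length();
            doc.append(e.getValue());
            spans.put(e.getKey(), new int[] {start, doc.length()});
            doc.append('\n'); // separator so no token straddles two fields
        }

        List<Annotation> annotations = analyzeWholeDocument(doc.toString());

        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> e : spans.entrySet()) {
            int fieldStart = e.getValue()[0];
            int fieldEnd = e.getValue()[1];
            List<String> tokens = new ArrayList<>();
            for (Annotation a : annotations) {
                if (a.start >= fieldStart && a.end <= fieldEnd) {
                    tokens.add(a.text);
                }
            }
            result.put(e.getKey(), tokens);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Lucene Indexing");
        fields.put("body", "Crawlers gather documents.");
        System.out.println(tokensPerField(fields));
    }
}
```

In a real integration the per-field token lists would feed whatever
per-field token source the indexer expects; that wiring is left out here
because it depends on the library versions involved.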

All in all, it was a thrill to work with Lucene; it made a lot of things
a whole lot easier.

- Andreas.
