From email@example.com Thu Nov 07 20:36:23 2002
This document describes the list of tasks on the plates of the Lucene development team. Each task is assigned to one of two categories: core or non-core.
Currently the Lucene development team is working on categorizing change requests into core and non-core changes.
Core changes would entail a change to the search engine core itself. From Doug Cutting:
"Examples include: file locking to make things multi-process safe; adding an API for boosting individual documents and fields values; making the scoring API extensible and public; etc."
Non-core changes would not affect the search engine itself, but would consist instead of projects or components that would make useful additions to the core framework. Again, from Doug Cutting:
"[Examples] include: support for more languages; query parsers; database storage; crawlers, etc. Whether these belong in the base distribution is a matter of debate (sometimes hot). My rule of thumb for including them is their generality: if they are likely to be useful to a large proportion of Lucene users then they should probably go in the base distribution. Language support in particular is tricky. Perhaps we should migrate to a model where the base distribution includes no analyzers, and supply separate language packages."
Each change request will be classified by the development team (committers) as core or non-core, and a committer will be assigned responsibility for coordinating its development. All change requests should be submitted to one of the Lucene mailing lists, or through the Apache Bugzilla database.
No change requests classified as core yet!
No change requests classified as non-core yet!
|Term Vector support|
|Support for Search Term Highlighting||
|Better support for hits sorted by things other than score.||An easy, efficient case is to support results sorted by the order documents were added to the index. A little harder and less efficient is support for results sorted by an arbitrary field.|
|Add some requested methods: Document.getValues, IndexReader.getIndexedFields, Token.setPositionIncrement||String[] Document.getValues(String fieldName); String[] IndexReader.getIndexedFields(); void Token.setPositionIncrement(int);|
|Add a lastModified() method to Directory, FSDirectory and RAMDirectory, so that its value can be cached in an IndexWriter/Searcher manager.|
|Support for adding more than 1 term to the same position.||N.B. I think the Finnish lady already implemented this. It required some pieces of Lucene to be modified. (OG).|
|The ability to retrieve the number of occurrences not only for a term but also for a Phrase.|
|A lady from Finland submitted code for handling Finnish.|
|Dutch stemmer, analyzer, etc.|
|French stemmer, analyzer, etc.|
|Che Dong's CJKTokenizer for Chinese, Japanese, and Korean.|
|Selecting a language-specific analyzer according to a locale.||At present, parts of the Lucene code have to be rewritten in order to use another analyzer. It would be useful to be able to select an analyzer without touching code.|
|Adding "-encoding" option and encoding-sensitive methods to tools.||The current tools need minor changes to work in a Japanese (or other language) environment: adding an "-encode" option and argument, using Reader/Writer classes instead of InputStream/OutputStream classes, etc.|
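The request for more than one term at the same position, together with the proposed void Token.setPositionIncrement(int), can be illustrated with a small standalone sketch. It assumes the convention that an increment of 1 advances to the next position and an increment of 0 keeps a token at the position of the previous one. The Token class below is a hypothetical stand-in written for this example, not the Lucene class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; Token here is a stand-in, not org.apache.lucene.analysis.Token.
final class PositionIncrementSketch {

    static final class Token {
        final String text;
        private int positionIncrement = 1; // assumed default: advance by one position

        Token(String text) {
            this.text = text;
        }

        void setPositionIncrement(int increment) {
            if (increment < 0) {
                throw new IllegalArgumentException("increment must be >= 0");
            }
            this.positionIncrement = increment;
        }

        int getPositionIncrement() {
            return positionIncrement;
        }
    }

    // Groups token texts by absolute position, accumulating the increments in stream order.
    static Map<Integer, List<String>> positions(List<Token> tokens) {
        Map<Integer, List<String>> result = new LinkedHashMap<>();
        int position = -1; // the first token with increment 1 lands at position 0
        for (Token token : tokens) {
            position += token.getPositionIncrement();
            result.computeIfAbsent(position, k -> new ArrayList<>()).add(token.text);
        }
        return result;
    }

    public static void main(String[] args) {
        Token quick = new Token("quick");
        Token fast = new Token("fast");
        fast.setPositionIncrement(0); // a synonym stacked on the same position as "quick"
        Token fox = new Token("fox");

        List<Token> stream = new ArrayList<>();
        stream.add(quick);
        stream.add(fast);
        stream.add(fox);
        System.out.println(positions(stream)); // prints {0=[quick, fast], 1=[fox]}
    }
}
```

With terms stacked this way, a phrase match on "fast fox" could succeed wherever "quick fox" does, which is one reason stemmers and synonym filters would want such a method.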
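For the request to select a language-specific analyzer according to a locale, one possible shape is a registry keyed by language code with a fallback. This is only a sketch under stated assumptions: AnalyzerRegistry and its nested Analyzer interface are hypothetical names invented for the example, and a real version would hold actual analyzer instances instead of names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; none of these types are Lucene classes.
final class AnalyzerRegistry {

    // Stand-in for an analyzer; a real one would tokenize text.
    interface Analyzer {
        String name();
    }

    private final Map<String, Analyzer> byLanguage = new HashMap<>();
    private final Analyzer fallback;

    AnalyzerRegistry(Analyzer fallback) {
        this.fallback = fallback;
    }

    // Registers an analyzer under a language code such as "fr" or "nl".
    void register(String language, Analyzer analyzer) {
        byLanguage.put(language.toLowerCase(Locale.ROOT), analyzer);
    }

    // Looks up by the locale's language only; country and variant are ignored.
    Analyzer forLocale(Locale locale) {
        return byLanguage.getOrDefault(locale.getLanguage(), fallback);
    }

    public static void main(String[] args) {
        AnalyzerRegistry registry = new AnalyzerRegistry(() -> "standard");
        registry.register("fr", () -> "french");
        registry.register("nl", () -> "dutch");

        System.out.println(registry.forLocale(Locale.CANADA_FRENCH).name()); // prints french
        System.out.println(registry.forLocale(Locale.JAPAN).name());         // prints standard
    }
}
```

Application code would then ask the registry for an analyzer instead of constructing one directly, so switching languages would need a configuration change and no edits to the calling code.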
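The encoding item above asks for an option naming the character encoding and for Reader/Writer classes in place of raw streams. A minimal sketch of that idea using only standard JDK classes follows. The class EncodingAwareTool and its methods are hypothetical, the option is spelled "-encoding" here purely as an assumption, and nothing below is taken from the existing Lucene tools.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper; not part of Lucene or its demo tools.
final class EncodingAwareTool {

    // Returns the charset named after "-encoding", or the platform default if the option is absent.
    static Charset parseEncoding(String[] args) {
        for (int i = 0; i < args.length - 1; i++) {
            if ("-encoding".equals(args[i])) {
                return Charset.forName(args[i + 1]);
            }
        }
        return Charset.defaultCharset();
    }

    // Decodes the whole stream into text through a Reader that uses the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        Charset charset = parseEncoding(new String[] {"-encoding", "UTF-16BE"});
        // Three Japanese characters, written as escapes so the source file stays ASCII.
        byte[] bytes = "\u65E5\u672C\u8A9E".getBytes(charset);
        String text = readAll(new ByteArrayInputStream(bytes), charset);
        System.out.println(text.length()); // prints 3
    }
}
```

Reading through an InputStreamReader with an explicit Charset means the bytes are decoded the same way regardless of the platform default, which is the situation the Japanese-environment note describes.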