lucene-dev mailing list archives

Subject cvs commit: jakarta-lucene TODO.txt
Date Mon, 27 May 2002 23:56:54 GMT
otis        02/05/27 16:56:54

  Added:       .        TODO.txt
  - Lucene TO-DO items.
  Revision  Changes    Path
  1.1                  jakarta-lucene/TODO.txt
  Index: TODO.txt
  $Revision: 1.1 $
  - Term Vector support
  - Support for Search Term Highlighting
  - Better support for hits sorted by things other than score.
    An easy, efficient case is to support results sorted by the order documents were
    added to the index.  A little harder and less efficient is support for
    results sorted by an arbitrary field.
  - Add ability to "boost" individual documents/fields.
    When a document is indexed, a numeric "boost" value could be specified for the whole
    document, and/or for individual fields.  This value would be multiplied into
    scores for hits on this document.  This would facilitate the implementation of
    things like Google's PageRank.
  - Add to FSDirectory the ability to specify where lock files live and
    to disable the use of lock files altogether (for read-only media).
  - Add some requested methods:
      String[] Document.getValues(String fieldName);
      String[] IndexReader.getIndexedFields();
      void Token.setPositionIncrement(int);
  - Péter Halácsy's changes to the QueryParser that make it possible to
    programmatically specify a default operator (OR or AND).
  - The recently submitted code that allows for queries such as
    "Microsoft suc*" to match "Microsoft success" and "Microsoft sucks".
  - Make package-protected abstract methods public (I'd like to be able to
    make subclasses of Searcher, IndexWriter, IndexReader).
  - Add a lastModified() method to Directory, FSDirectory, and RAMDirectory, so
    that it could be cached in an IndexWriter/Searcher manager.
  - Support for adding more than 1 term to the same position.
    N.B. I think the Finnish lady already implemented this.  It required some
    pieces of Lucene to be modified. (OG).
  - The ability to retrieve the number of occurrences not only for a term
    but also for a phrase.
  - Alex Murzaku contributed some code for dealing with Russian.
  - A lady from Finland submitted code for handling Finnish.
  - Dutch stemmer, analyzer, etc.
  - French stemmer, analyzer, etc.
  - Che Dong's CJKTokenizer for Chinese, Japanese, and Korean.
  - Selecting a language-specific analyzer according to a locale.
    Currently, parts of the Lucene code must be rewritten in order to use a
    different analyzer.  It would be useful to select an analyzer without
    touching the code.
  - Adding "-encoding" option and encoding-sensitive methods to tools.
    The current tools need minor changes in a Japanese (and other language)
    environment: adding an "-encoding" option and argument, using
    Reader/Writer classes instead of InputStream/OutputStream classes, etc.
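For the item above about sorting hits by something other than score, here is a minimal sketch of why the two cases differ in cost. It assumes document numbers are assigned in the order documents were added. Under that assumption, index order is just a sort on the document number. Sorting by an arbitrary field needs a value for every hit, here taken from a hypothetical pre-loaded table. `Hit` and `fieldValues` are illustrative stand-ins, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sketch only: Hit is a hypothetical stand-in, not a Lucene class.
final class HitSortSketch {
    // A hit: a document number (assumed to be assigned in the order documents
    // were added) plus its relevance score.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Easy case: order by document number, which needs no per-document data.
    static List<Hit> byIndexOrder(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.doc));
        return sorted;
    }

    // Harder case: look up a field value for every hit. fieldValues is a
    // hypothetical doc-number -> value table that must be loaded first;
    // every hit is assumed to have a value.
    static List<Hit> byField(List<Hit> hits, Map<Integer, String> fieldValues) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing((Hit h) -> fieldValues.get(h.doc)));
        return sorted;
    }
}
```

The extra cost of the field case comes from building and holding `fieldValues`, not from the sort itself.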
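The requested `String[] Document.getValues(String fieldName)` in the list above suggests that a field name may occur more than once in a document. The following is a minimal sketch on a hypothetical `MiniDocument` class, not Lucene's `Document`. It keeps (name, value) pairs in insertion order and collects every value for a name. Returning an empty array for a missing field is a choice made by this sketch, not something the TODO specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal document holding (name, value) pairs in insertion
// order; a field name may appear more than once. Not Lucene's Document class.
final class MiniDocument {
    private final List<String[]> fields = new ArrayList<>();

    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    // Sketch of the requested getValues(): every value stored under
    // fieldName, in the order added; an empty array if there is none.
    String[] getValues(String fieldName) {
        List<String> values = new ArrayList<>();
        for (String[] field : fields) {
            if (field[0].equals(fieldName)) {
                values.add(field[1]);
            }
        }
        return values.toArray(new String[0]);
    }
}
```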
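For the QueryParser item above about a programmatically chosen default operator, here is a minimal sketch of what the choice means for an unannotated query such as `apache lucene`. With OR, a document matches if it contains any term. With AND, it matches only if it contains every term. This is not Péter Halácsy's patch and does not use QueryParser. It evaluates the query directly against a document's term set, and `Operator` is a hypothetical enum.

```java
import java.util.Set;

final class DefaultOperatorSketch {
    // Hypothetical enum for the default operator between unannotated terms.
    enum Operator { OR, AND }

    // Evaluates a whitespace-separated query against a document's set of
    // terms, joining the terms with the given default operator.
    static boolean matches(String query, Set<String> docTerms, Operator op) {
        String[] terms = query.trim().split("\\s+");
        for (String term : terms) {
            boolean present = docTerms.contains(term);
            if (op == Operator.OR && present) {
                return true;   // OR: one matching term is enough
            }
            if (op == Operator.AND && !present) {
                return false;  // AND: one missing term rules the document out
            }
        }
        // OR with no term present fails; AND with every term present succeeds.
        return op == Operator.AND;
    }
}
```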
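For the "Microsoft suc*" item above, here is a minimal sketch of the matching rule, assuming lowercased tokens. A phrase matches at some position, and a phrase element ending in `*` matches any token that starts with the preceding characters. This is not the submitted code. It scans one document's token list directly, whereas an index-based implementation would more likely expand the prefix into matching terms from the term dictionary.

```java
import java.util.List;

final class PhrasePrefixSketch {
    // Returns true if the phrase occurs in tokens at consecutive positions.
    // Any phrase element ending in '*' is treated as a prefix; the others
    // must match exactly.
    static boolean matches(List<String> tokens, String[] phrase) {
        for (int start = 0; start + phrase.length <= tokens.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < phrase.length && ok; i++) {
                String p = phrase[i];
                String token = tokens.get(start + i);
                ok = p.endsWith("*")
                        ? token.startsWith(p.substring(0, p.length() - 1))
                        : token.equals(p);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }
}
```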
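The lastModified() item above is about caching in an IndexWriter/Searcher manager. Here is a minimal sketch of that caching, assuming the proposed method returns a timestamp that changes whenever the index changes. `LongSupplier` stands in for the proposed `Directory.lastModified()`, and `Supplier<T>` stands in for opening a searcher. Both are placeholders, not Lucene API.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical cache: reuses an opened object (e.g. a searcher) until the
// directory's lastModified() timestamp changes, then reopens it.
final class LastModifiedCache<T> {
    private final LongSupplier lastModified; // stand-in for Directory.lastModified()
    private final Supplier<T> opener;        // stand-in for opening a searcher
    private T cached;
    private long cachedStamp = Long.MIN_VALUE;

    LastModifiedCache(LongSupplier lastModified, Supplier<T> opener) {
        this.lastModified = lastModified;
        this.opener = opener;
    }

    synchronized T get() {
        long stamp = lastModified.getAsLong();
        if (cached == null || stamp != cachedStamp) {
            // The previous object is simply dropped here; a real manager
            // would also close it once no caller is using it.
            cached = opener.get();
            cachedStamp = stamp;
        }
        return cached;
    }
}
```

Comparing for inequality rather than "newer than" also picks up an index that was replaced by an older copy.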
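The requested `Token.setPositionIncrement(int)` and the item above about putting more than one term at the same position fit together. Here is a minimal sketch, assuming an increment of 1 means "next position" and 0 means "same position as the previous token". Under those assumed semantics (they are not stated in the TODO), a synonym such as "fast" emitted after "quick" with increment 0 shares its position. `PositionSketch` is hypothetical and only converts increments to absolute positions.

```java
final class PositionSketch {
    // Converts per-token position increments into absolute positions.
    // Starts at -1 so that a first token with increment 1 lands at position 0.
    static int[] positions(int[] increments) {
        int[] out = new int[increments.length];
        int pos = -1;
        for (int i = 0; i < increments.length; i++) {
            pos += increments[i];
            out[i] = pos;
        }
        return out;
    }
}
```

For the tokens "quick", "fast" (increment 0), "fox", `positions(new int[] {1, 0, 1})` gives positions 0, 0 and 1, so a phrase query for either "quick fox" or "fast fox" could match.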
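For the item above about counting occurrences of a phrase rather than a single term, here is a minimal sketch. It assumes per-document position lists for each phrase term are available, sorted ascending. It counts the start positions `p` at which term `i` occurs at `p + i` for every term. The method and its input layout are hypothetical and do not reflect Lucene's internal phrase scoring.

```java
import java.util.Arrays;
import java.util.List;

final class PhraseFreqSketch {
    // positions.get(i) holds the sorted positions of the i-th phrase term in
    // one document (at least one term assumed). Counts the positions where
    // the whole phrase occurs.
    static int phraseFreq(List<int[]> positions) {
        int count = 0;
        for (int p : positions.get(0)) {
            boolean all = true;
            for (int i = 1; i < positions.size() && all; i++) {
                // Binary search relies on each position list being sorted.
                all = Arrays.binarySearch(positions.get(i), p + i) >= 0;
            }
            if (all) {
                count++;
            }
        }
        return count;
    }
}
```

In the document "new york is new york", "new" occurs at positions 0 and 3 and "york" at 1 and 4, so the phrase "new york" is counted twice.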
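The item above asks for selecting a language-specific analyzer from a locale without touching code. Here is a minimal sketch, assuming a registry keyed by the locale's language code with a fallback for unregistered languages. `AnalyzerRegistry` is hypothetical. The type parameter `A` stands in for Lucene's `Analyzer`, so the sketch compiles on its own.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical registry mapping a locale's language to an analyzer factory.
// A stands in for Lucene's Analyzer type.
final class AnalyzerRegistry<A> {
    private final Map<String, Supplier<A>> byLanguage = new HashMap<>();
    private final Supplier<A> fallback;

    AnalyzerRegistry(Supplier<A> fallback) {
        this.fallback = fallback;
    }

    // Registers a factory for the locale's language (e.g. "fr" for French).
    void register(Locale locale, Supplier<A> factory) {
        byLanguage.put(locale.getLanguage(), factory);
    }

    // Creates the analyzer for the locale's language, or the fallback.
    A forLocale(Locale locale) {
        return byLanguage.getOrDefault(locale.getLanguage(), fallback).get();
    }
}
```

Keying on the language alone means `Locale.FRANCE` and `Locale.CANADA_FRENCH` share one analyzer. A finer-grained registry could key on the full locale first and fall back to the language.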
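For the "-encoding" item above, here is a minimal sketch of both changes. The first is reading the option. The second is decoding input through a `Reader` built with `InputStreamReader(InputStream, Charset)`, instead of handling raw bytes from an `InputStream`. The argument-parsing loop and the fallback to the platform default charset are assumptions of this sketch, not the behavior of any existing Lucene tool.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

final class EncodingSketch {
    // Looks for a hypothetical "-encoding <name>" pair in the arguments;
    // falls back to the platform default charset when it is absent.
    static Charset charsetFromArgs(String[] args) {
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("-encoding")) {
                return Charset.forName(args[i + 1]);
            }
        }
        return Charset.defaultCharset();
    }

    // Decodes the whole stream as text using the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

A tool would call `charsetFromArgs(args)` once, for example with `-encoding Shift_JIS` on a Japanese system, and pass the result to every reader (and, symmetrically, every `OutputStreamWriter`) it opens.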
