lucene-dev mailing list archives

From "none none" <>
Subject RE: VOTE: Possible features for next release
Date Thu, 23 May 2002 17:38:39 GMT

On Thu, 23 May 2002 08:29:17 Peter Carlson wrote:
>Hello all,
>Below is a list of all features that were requested/suggested for the next
>release of Lucene.
>If you are in favor of the feature AND you are willing to help implement /
>integrate and test it please put a +1 in the brackets. If you are against a
>feature please put a -1 in the brackets and provide a reason.
>Note: Non committers can vote here, but at least 1 committer must be active
>on the feature (i.e. willing to test and integrate it) for it to be part of
>the next release.
>If something is unclear please let me know. Also, if people have suggestions
>on a better way to organize this, let me know.
>[+1] Peter Halacsy's changes to the QueryParser that, I believe, make it
>possible to programmatically specify a default operator (OR or AND).
>[+1] The recently submitted code that allows for queries such as "Microsoft
>suc*" to match "Microsoft success" and "Microsoft sucks".
>[+1] Alex Murzaku contributed some code for dealing with Russian.
>[+1] A lady from Finland submitted code for handling Finnish.
>[+1] Japanese Analyzer ( Kazuhiro Kazama <>)
>[+1] make package protected abstract methods of
> to public (I'd like to be able to make
>subclasses of Searcher, IndexWriter, IndexReader )
>[+1] Term Vector Support
>[ ] add lastModified() method to Directory, FSDirectory and RamDirectory (so
>it could be cached in IndexWriter/Searcher manager)
>[ ] support for adding more than 1 term to the same position (I'm sorry I
>didn't find Doug's email about this)
>[+1] Does anyone see a problem with adding support for storing unindexed,
>untokenized *binary* data as document fields?  At the moment, the closest
>thing we have is unindexed, untokenized *character* data.  Looking at the
>source, this will be a trivial change, but I'm curious to learn if there are
>specific reasons (other than inclination and opportunity) that this has been
>left out.
>[+1] Another feature could be the ability to retrieve the number of
>occurrences not only for a term but also for a Phrase (see
>[+1] Better support for hits sorted by things other than score.  An easy,
>efficient case is to support results sorted by the order documents were
>added to the index.
>[+1] Support for results sorted by an arbitrary field.
>[+1] Add ability to "boost" individual documents/fields.  When a document is
>indexed, a numeric "boost" value could be specified for the whole document,
>and/or for individual fields.  This value would be multiplied into scores for
>hits on this document.  This would facilitate the implementation of things
>like Google's pagerank.
>[ ] Add to FSDirectory the ability to specify where lock files live and to
>disable the use of lock files altogether (for read-only media).
>[+1] Add some requested methods:
>    String[] Document.getValues(String fieldName);
>    String[] IndexReader.getIndexedFields();
>    void Token.setPositionIncrement(int);

I would also like to raise the following:

1. More support for highlighting (a "summarizer tool"). This needs some changes to the
Query classes, as suggested by Maik Schreiber: BooleanQuery.getClauses(), some other methods
made public instead of private, etc. That would make life easier for users who want to
add this feature to their search engine, because right now we have to reapply those changes
with every new release.
It would also be good to have a method that returns the positions of the matched terms inside
the document (I think this is somewhere in the Scorer, but I don't know how to use it). That
way, Analyzer + TermPositions => highlighting becomes very easy. So the Document retrieved
from the Hits should have a method to get the TermPositions.
Currently I am using Jakarta ORO to search/match the terms inside the text, and it seems too
slow, especially with large files.
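
To illustrate why known term positions would make highlighting cheap, here is a minimal sketch in plain Java. It assumes the matched terms are already available as sorted, non-overlapping character spans; the class name OffsetHighlighter and the int[][] span format are hypothetical and are not part of Lucene. With the spans in hand, the text is walked once and no regular-expression matching is needed.

```java
/** Hypothetical helper, not a Lucene class: wraps known term spans in <b> tags. */
final class OffsetHighlighter {

    /**
     * Assumes each span is {start, end} with end exclusive, and that spans
     * are sorted by start and do not overlap.
     */
    static String highlight(String text, int[][] spans) {
        StringBuilder out = new StringBuilder(text.length() + spans.length * 7);
        int cursor = 0;
        for (int[] span : spans) {
            out.append(text, cursor, span[0]);          // text before the match
            out.append("<b>")
               .append(text, span[0], span[1])          // the matched term
               .append("</b>");
            cursor = span[1];
        }
        out.append(text, cursor, text.length());        // text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Lucene is a search library";
        int[][] spans = { {0, 6}, {12, 18} };           // "Lucene" and "search"
        System.out.println(highlight(text, spans));
        // prints: <b>Lucene</b> is a <b>search</b> library
    }
}
```

The cost is a single pass over the text, however large the file is, which is the appeal compared with re-matching every term against the stored text.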

2. I see a lot of "problems" when searching and updating the same index. Maybe it is just
me, but this is what I found:
 a) It is not possible to "update" a document; you can only delete and re-add it. That means:
open a reader, do the delete, close the reader, open a writer, add the document, optimize,
close the writer.
So is it possible to move the "delete" method from IndexReader to IndexWriter, or is that
impossible for technical reasons? Then we would open just the writer to update, delete and
add documents. This is useful when the index needs to be updated often.
 b) There is no way to update a single field of a document; you have to update the entire
document. A field-level update would be good, though it may be hard to do.

3. More documentation about the index format would be useful for users like me who don't know
how the index is built: segments, terms, positions, and how they relate.

4. Keeping the index searcher open inside the servlet or JSP saves a lot of time. In my tests
on a 1GB index (600k docs) I see average times of:
a) open for each request: 110 ms
b) open just once: 40 ms
I also built a SearchEngineManager with RMI that sends a callback to the servlet (registered
clients) after it refreshes the index, so I re-open the searcher only when I really need to.
I'll write a nice email explaining what my SearchEngineManager does in detail, because it
does more than that.
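
The "open once, re-open only on refresh" idea can be sketched in plain Java as follows. This is not the SearchEngineManager itself, only an illustration under assumptions: RefreshableHolder is a hypothetical name, the opener stands in for whatever actually opens a searcher, and the RMI callback is reduced to a direct call to refresh().

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical holder: keeps one shared instance and swaps it only on refresh. */
final class RefreshableHolder<T> {
    private final Supplier<T> opener;                     // assumed: opens a new searcher
    private final AtomicReference<T> current = new AtomicReference<>();

    RefreshableHolder(Supplier<T> opener) {
        this.opener = opener;
        this.current.set(opener.get());                   // open once, up front
    }

    /** Called on every request: returns the shared instance without re-opening. */
    T get() {
        return current.get();
    }

    /**
     * Called from the index-refresh callback. Opens a new instance and
     * returns the previous one so the caller can close it when it is safe.
     */
    T refresh() {
        return current.getAndSet(opener.get());
    }

    public static void main(String[] args) {
        int[] opens = {0};
        RefreshableHolder<String> holder =
            new RefreshableHolder<>(() -> "searcher-" + (++opens[0]));
        System.out.println(holder.get());                 // searcher-1
        System.out.println(holder.get());                 // searcher-1 (not re-opened)
        holder.refresh();                                 // index changed
        System.out.println(holder.get());                 // searcher-2
    }
}
```

Returning the old instance from refresh() leaves the decision of when to close it to the caller, since requests that started before the refresh may still be using it.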

Maybe I went off topic, but I think this is the right moment to discuss such features.
