lucene-java-user mailing list archives

From Ahmet Aksoy <>
Subject Indexing problems in a dictionary
Date Sat, 03 Sep 2005 08:12:45 GMT
I'm using Lucene in an open-source Java project at .
The project contains several dictionaries with a simple structure.
Each entry consists of a "phrase" and a "definition". Either part
may be a single word or many words.
Since both parts may contain multiple words, I build each document as follows:
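    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    private Document buildDocument(SozlukBirimi birim) {
        Document doc = new Document();
        // "soz" means "word" in Turkish; Field.Keyword indexes the whole phrase untokenized
        doc.add(Field.Keyword("soz", birim.getSoz()));
        // the same text as the keyword field, but tokenized (Field.Text)
        doc.add(Field.Text("soz1", birim.getSoz()));
        // "anlam" means "meaning" in Turkish
        doc.add(Field.Text("anlam", birim.getAnlam()));
        return doc;
    }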
As you can see, I index the first part both as a keyword field and as a
text field, because the program must be able to match either the whole
phrase or single words within it.
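To make the difference between the two fields concrete, here is a small stdlib-only sketch, not Lucene's API: the hypothetical class `FieldMatchSketch` keeps one map keyed by the whole phrase (roughly what an untokenized keyword field like "soz" gives you) and one inverted index keyed by single tokens (roughly what a tokenized text field like "soz1" gives you). The tokenizer here, lowercase plus split on non-letters, is an assumption standing in for whatever analyzer is actually used.

```java
import java.util.*;

/**
 * Hypothetical illustration (not Lucene): the same phrase indexed twice,
 * once as a whole value and once as individual tokens.
 */
public class FieldMatchSketch {
    // Whole phrase -> entry ids (exact, untokenized matching, like "soz").
    private final Map<String, List<Integer>> keywordIndex = new HashMap<>();
    // Single token -> entry ids (word-level matching, like "soz1").
    private final Map<String, Set<Integer>> textIndex = new HashMap<>();

    // Assumed stand-in for an analyzer: lowercase, split on non-letters.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    void add(int id, String phrase) {
        keywordIndex.computeIfAbsent(phrase, k -> new ArrayList<>()).add(id);
        for (String t : tokenize(phrase)) {
            textIndex.computeIfAbsent(t, k -> new TreeSet<>()).add(id);
        }
    }

    // Matches only if the query equals the stored phrase exactly.
    List<Integer> exact(String phrase) {
        return keywordIndex.getOrDefault(phrase, Collections.emptyList());
    }

    // Matches any entry whose phrase contains the (analyzed) query word.
    Set<Integer> word(String w) {
        List<String> t = tokenize(w);
        if (t.isEmpty()) return Collections.emptySet();
        return textIndex.getOrDefault(t.get(0), Collections.emptySet());
    }

    public static void main(String[] args) {
        FieldMatchSketch idx = new FieldMatchSketch();
        idx.add(1, "break down");
        idx.add(2, "break");
        idx.add(3, "down to earth");
        System.out.println(idx.exact("break down")); // [1]
        System.out.println(idx.word("down"));        // [1, 3]
    }
}
```

Lowercasing with `Locale.ROOT` is a deliberate choice in this sketch: under a Turkish default locale, `String.toLowerCase()` maps "I" to the dotless "ı", which would make matching depend on the machine's locale.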
In the early stages of the application there was only a single
English-Turkish dictionary, so I used an analyzer that includes both
English and Turkish stop words.
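A minimal sketch of what such an analyzer step does, assuming a simple lowercase-and-split tokenizer followed by a single stop set that merges both languages. The class name `CombinedStopFilter` and both word lists are small hypothetical samples, not the lists from the original project. The second call shows the dictionary-specific risk: a headword made entirely of stop words produces no tokens at all for a tokenized field.

```java
import java.util.*;

/**
 * Hypothetical sketch: drop both English and Turkish stop words during
 * analysis. The word lists are illustrative samples only.
 */
public class CombinedStopFilter {
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        // sample English stop words
        "a", "an", "and", "be", "not", "of", "or", "the", "to",
        // sample Turkish stop words
        "bir", "bu", "da", "de", "ile", "ve"));

    // Lowercase, split on non-letters, then drop any token found in STOP_WORDS.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("the state of the art")); // [state, art]
        // A headword consisting only of stop words leaves nothing to index:
        System.out.println(analyze("to be or not to be"));   // []
    }
}
```

In the document layout above, such an entry would still be reachable by an exact lookup on the untokenized "soz" field, but not by word searches on "soz1".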
Here are my questions:
1- Do you think the above scheme is a good solution for a dictionary?
2- I'm now hesitant about using stop words in a dictionary at all. What
do you think?
3- I have a rather serious performance problem. Indexing an
English-English dictionary of 107,155 entries took 1,436 seconds on a
600 MHz Pentium 4 laptop with 256 MB of memory. Is that normal, or am I
doing something completely wrong?
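For scale, the timing in question 3 works out to the following per-entry cost (plain arithmetic on the figures given above):

```latex
\frac{1436\ \text{s}}{107155\ \text{entries}} \approx 13.4\ \text{ms per entry}
\qquad\Longleftrightarrow\qquad
\frac{107155\ \text{entries}}{1436\ \text{s}} \approx 74.6\ \text{entries/s}
```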
I look forward to your suggestions.
Thanks a lot.
Ahmet Aksoy
