lucene-java-user mailing list archives

From Ahmet Aksoy <ahme...@axtelsoft.com>
Subject Indexing problems in a dictionary
Date Sat, 03 Sep 2005 08:12:45 GMT
Hi,
I'm using Lucene in an open-source Java project at
http://belletmen.dev.java.net.
The project contains several dictionaries with a simple structure:
every item consists of a "phrase" and a "definition". Either part may
be a single word or many words.
Since both parts may contain multiple words, I used the following:
    private Document buildDocument(SozlukBirimi birim) {
        Document doc = new Document();
        // "soz" means "word" in Turkish; Keyword = indexed and stored, not tokenized
        doc.add(Field.Keyword("soz", birim.getSoz()));
        // same value as the keyword field; Text = indexed, stored, and tokenized
        doc.add(Field.Text("soz1", birim.getSoz()));
        // "anlam" means "meaning" in Turkish
        doc.add(Field.Text("anlam", birim.getAnlam()));
        return doc;
    }
As you can see, I indexed the first part both as a keyword field and as
a text field. The reason is that the program also searches for phrases
or single words inside the first part.
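To make the reasoning concrete, here is a simplified, Lucene-free model in Java of what the two fields provide. It is only a sketch under stated assumptions: the class `KeywordVsTextDemo`, its methods, and the naive tokenizer are hypothetical and are not Lucene's API. They merely imitate an untokenized keyword field ("soz") and a tokenized text field ("soz1").

```java
import java.util.*;

/**
 * Hypothetical, Lucene-free model of indexing the same phrase twice:
 * once as a single untokenized term (like a keyword field) and once as
 * separate lowercase tokens (like a tokenized text field).
 */
public class KeywordVsTextDemo {

    // keyword-style: whole phrase -> entry ids, matched exactly
    private final Map<String, Set<Integer>> keywordIndex = new HashMap<>();
    // text-style: each token -> entry ids
    private final Map<String, Set<Integer>> textIndex = new HashMap<>();

    // Naive tokenizer: lowercase, split on non-letters (stand-in for an analyzer)
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    void add(int id, String phrase) {
        keywordIndex.computeIfAbsent(phrase, k -> new TreeSet<>()).add(id);
        for (String token : tokenize(phrase)) {
            textIndex.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
        }
    }

    // Exact lookup of a whole phrase, as with the keyword field
    Set<Integer> findExact(String phrase) {
        return keywordIndex.getOrDefault(phrase, Collections.emptySet());
    }

    // Lookup of a single word anywhere inside a phrase, as with the text field
    Set<Integer> findWord(String word) {
        return textIndex.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        KeywordVsTextDemo index = new KeywordVsTextDemo();
        index.add(1, "take off");
        index.add(2, "take");
        index.add(3, "off the record");
        System.out.println(index.findExact("take off")); // [1]
        System.out.println(index.findWord("off"));       // [1, 3]
        System.out.println(index.findExact("off"));      // []
    }
}
```

The exact map answers "is there an entry for exactly this phrase?", while the token map finds a word anywhere inside a multi-word phrase. That is the trade-off behind indexing the first part twice.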
In the early stages of the application there was a single
English-Turkish dictionary, and I used an analyzer that includes both
English and Turkish stop words.
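A minimal sketch of what stop-word removal does to dictionary phrases, again in plain Java rather than Lucene. The class `StopWordDemo` and its tiny English/Turkish stop list are hypothetical samples, not the analyzer the project actually uses.

```java
import java.util.*;

/**
 * Hypothetical sketch of stop-word removal during analysis.
 * The stop list is a tiny illustrative sample, not a real English/Turkish list.
 */
public class StopWordDemo {

    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "the", "to", "be", "of",   // sample English stop words
            "ve", "bir", "bu", "da", "de"));      // sample Turkish stop words

    // Lowercase, split on non-letters, and drop stop words.
    // Note: Locale.ROOT maps "I" to "i"; Turkish text needs Turkish case rules (dotless ı).
    static List<String> analyze(String text) {
        List<String> kept = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(analyze("to be or not to be")); // [or, not]
        System.out.println(analyze("to be"));              // []
    }
}
```

The second call illustrates the concern in question 2: a headword made up entirely of stop words (such as "to be") yields no tokens at all, so in this model it could only be found through the untokenized keyword field.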
Here are my questions:
1- Do you think the above design is a good solution for a dictionary?
2- I'm now hesitant about using stop words in a dictionary. What do you
think?
3- I have a rather serious performance problem. Indexing an
English-English dictionary of 107,155 items took 1436 seconds on a
600MHz Pentium 4 laptop with 256 MB of memory. Is that normal, or am I
on the wrong track entirely?
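For scale, the figures in question 3 correspond to the following per-item cost (simple arithmetic on the reported numbers only; no other measurements are assumed):

```latex
\frac{1436\ \text{s}}{107155\ \text{items}} \approx 0.0134\ \text{s/item} = 13.4\ \text{ms/item},
\qquad
\frac{107155\ \text{items}}{1436\ \text{s}} \approx 74.6\ \text{items/s}
```

That is roughly 24 minutes for the whole dictionary.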
I look forward to your suggestions.
Thanks a lot.
Ahmet Aksoy

