lucene-java-user mailing list archives

From "Namit Yadav" <namitya...@gmail.com>
Subject Re: Index Rows as Documents? Help me design a solution
Date Tue, 25 Jul 2006 16:57:37 GMT
Thanks for the suggestion, Erick!

As for why we can't use a relational database: we get all the logs
from an external application, and due to the nature of the business
we need to continue maintaining the logs. Moreover, the search
requests are very infrequent, so it doesn't make sense to (almost)
replicate the complete data in a database.

Back to the problem. Erick, here is a sample indexFile method (Is this
how I am supposed to index the file?):

    private static void indexFile(IndexWriter writer, File f) {
        BufferedReader br = null;
        try {
            System.out.println("Indexing " + f.getCanonicalPath());
            br = new BufferedReader(new FileReader(f));
            String line = null;
            String[] columns = null;
            while ((line = br.readLine()) != null) {
                columns = line.split("#");
                // Rows not having 4 columns are not useful for us
                if (columns.length == 4) {
                    Document doc = new Document();
                    // Searchable fields: indexed and stored
                    doc.add(new Field("msisdn", columns[0],
                            Field.Store.YES, Field.Index.TOKENIZED));
                    doc.add(new Field("messageid", columns[2],
                            Field.Store.YES, Field.Index.TOKENIZED));
                    // The raw row: stored for display, not indexed
                    doc.add(new Field("line", line,
                            Field.Store.YES, Field.Index.NO));
                    writer.addDocument(doc);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Close the reader even if indexing fails part-way through
            if (br != null) {
                try { br.close(); } catch (IOException ignored) { }
            }
        }
    }
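
One thing to watch in the columns.length == 4 check above: Java's
String.split(regex) drops trailing empty strings, so a row whose last
column is empty comes back with fewer than 4 elements and gets skipped.
Below is a minimal sketch of a stricter split, assuming such rows
should still count as 4-column rows. The class name, the method name
and the sample row are all made up for illustration; none of it is
part of Lucene.

```java
// Hypothetical helper, not part of Lucene: splits one log row on '#'.
final class RowSplitSketch {

    // Number of columns a useful row must have (same as the indexFile check).
    static final int EXPECTED_COLUMNS = 4;

    // Returns the columns of a row, or null if the row does not have
    // exactly EXPECTED_COLUMNS columns. The -1 limit keeps trailing
    // empty columns, which a plain split("#") would drop.
    static String[] splitRow(String line) {
        String[] columns = line.split("#", -1);
        return columns.length == EXPECTED_COLUMNS ? columns : null;
    }

    public static void main(String[] args) {
        // Made-up sample row whose last column is empty.
        String row = "919812345678#in#msg-42#";
        System.out.println("plain split:    " + row.split("#").length);
        System.out.println("split with -1:  " + row.split("#", -1).length);
        System.out.println("row accepted:   " + (splitRow(row) != null));
    }
}
```

If rows with an empty last column really are useless, the plain
split("#") already filters them out and nothing needs to change.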

On 7/25/06, Erick Erickson <erickerickson@gmail.com> wrote:
> Indexing 1M of logs shouldn't take minutes, so you're probably right.
>
> A problem I've seen is opening/indexing/closing your index writer too often.
> You should do something like... (really bad pseudo code here)
>
> IndexWriter IW = new IndexWriter(....);
> for (lots and lots and lots of records) {
>    IW.addDocument();
> }
>
> IW.optimize();
> IW.close();
>
>
> Others have had a problem where they open/write/close the index writer for
> EACH document, which is painfully slow.
>
> Also, you might play around with IndexWriter.setMergeFactor and
> setMaxBufferedDocs. If you set them too high, you'll run out of memory, but
> they can make a difference in how fast your index is built....
>
>
> If none of this is relevant, can you post a bit of (perhaps pseudo) code?
>
> Best
> Erick
>
>


