lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rob Outar" <rou...@ideorlando.org>
Subject RE: Indexing Growth
Date Wed, 02 Apr 2003 14:26:13 GMT
Just about everything calls getValue:

 public synchronized String getValue(String key, File file)
    throws ParseException,  IOException {

        Document doc = getDocument(file);
        return doc.get(key.toLowerCase());
    }

which calls get document:

 private synchronized Document getDocument(File file) throws
MalformedURLException,
    IOException {


        checkForIndexChange();
        Term t       = new Term(PATH,
file.toURI().toString().toLowerCase());
        TermQuery tQ = new TermQuery(t);

        Hits hits    = this.searcher.search(tQ);

        if (hits.length() == 1) {
            return hits.doc(0);
        }
        //this should never happen, cannot have a URL that returns 2 hits
        //that would mean the same file has been indexed twice
        else {
            return null;
        }


Thanks,

Rob


-----Original Message-----
From: Michael Barry [mailto:mbarry@cos.com]
Sent: Wednesday, April 02, 2003 9:20 AM
To: Lucene Users List
Subject: Re: Indexing Growth


Sounds like you either have an indexer that's run amok (maybe
a background process that's continually re-indexing your sandbox -
or expanding outside your sandbox) or your Query code is doing more
than querying. It's not behaviour I've seen. Without a snippet of
Query code, it's going to be hard to help.

Rob Outar wrote:

>Hi all,
>
>	This is too odd and I do not even know where to start.  We built a Windows
>Explorer type tool that indexes all files in a "sabdboxed" file system.
>Each Lucene document contains stuff like path, parent directory, last
>modified date, file_lock etc..  When we display the files in a given
>directory through the tool we query the index about 5 times for each file
in
>the repository, this is done so we can display all attributes in the index
>about that file.  So for example if there are 5 files in the directory,
each
>file has 6 attributes that means about 30 term queries are executed.  The
>initial index when build it about 10.4megs, after accessing about 3 or 4
>directories the index size increased to over 100megs, and we did not add
>anything!!  All we are doing is querying!!  Yesterday after querying became
>ungodly slow, we looked at the index size it had grown from 10megs to 1.5GB
>(granted we tested the tool all morning).  But I have no idea why the index
>is growing like this.  ANY help would be greatly appreciated.
>
>
>Thanks,
>
>Rob
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message