lucene-general mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Performance problems on retrieving fields
Date Thu, 09 Sep 2010 09:48:52 GMT
What a neat search engine!  (Searching stack traces).

Unfortunately, loading stored fields is slowish -- it entails 2 disk
seeks under the hood.  Really, you should retrieve at most a page's worth
of docs in the serial path of a query.  How many are you retrieving
per query?
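One way to keep stored-field loading down to a single page is to score every candidate first, keep only the best `pageSize` hits in a min-heap, and fetch stored fields only for those docIDs. Below is a minimal plain-Java sketch of that selection step. It does not use Lucene: `TopPage`, `Hit` and `topPage` are hypothetical names, and `Hit` only stands in for Lucene's `ScoreDoc`. In Lucene itself, `IndexSearcher.search(query, n)` and collectors such as `TopScoreDocCollector` already perform this top-N selection. The approach only helps if a first-pass score can be computed without the stored frames.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Sketch: keep only the best `pageSize` hits while scoring, so stored
// fields need to be loaded for at most one page of documents.
class TopPage {
    // A scored hit; hypothetical stand-in for Lucene's ScoreDoc.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    static List<Hit> topPage(int[] docs, float[] scores, int pageSize) {
        // Min-heap on score: the weakest of the current top hits sits at the head.
        PriorityQueue<Hit> heap =
            new PriorityQueue<>((x, y) -> Float.compare(x.score, y.score));
        for (int i = 0; i < docs.length; i++) {
            heap.offer(new Hit(docs[i], scores[i]));
            if (heap.size() > pageSize) {
                heap.poll(); // evict the lowest-scoring hit seen so far
            }
        }
        List<Hit> page = new ArrayList<>(heap);
        // Best hit first; only these docIDs would then go to reader.document(...).
        page.sort((x, y) -> Float.compare(y.score, x.score));
        return page;
    }
}
```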

That said, you shouldn't use LAZY_LOAD if you know you will need the
value.  Also, it's possible that sorting the docIDs (ascending) first
may get you better performance since your load is then a single scan
of the 2 files in the index.
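The "sort the docIDs first" idea can be sketched as follows. The requested docIDs are copied and sorted ascending, each one is loaded in that order, and the results are keyed by docID so the caller can still walk them in score order. `FrameLoader` is a hypothetical interface standing in for a call such as `reader.document(doc, selector)`. Whether the reads really become one forward scan depends on the index and OS caching, so treat this as a sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch: load stored values in ascending docID order so the underlying
// reads move forward through the stored-field files instead of jumping around.
class SortedLoad {
    // Hypothetical stand-in for something like reader.document(doc, selector).
    interface FrameLoader {
        String[] load(int doc);
    }

    static Map<Integer, String[]> loadSorted(int[] docIds, FrameLoader loader) {
        int[] sorted = docIds.clone(); // leave the caller's (score-ordered) array untouched
        Arrays.sort(sorted);           // ascending docIDs
        Map<Integer, String[]> framesByDoc = new HashMap<>();
        for (int doc : sorted) {
            framesByDoc.put(doc, loader.load(doc));
        }
        return framesByDoc;            // look up by docID in whatever order the caller needs
    }
}
```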

You may want to use FieldCache.DEFAULT.getStrings instead -- this
gives you a very fast String[], but it may suck up tons of memory
depending on how many unique frames there are (how do you index each
frame?).
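`FieldCache.DEFAULT.getStrings` returns a `String[]` indexed by docID, so it holds one value per document. With several frame values per stack trace, a hand-rolled cache in the same spirit is one option. It is a swapped-in variant, not the Lucene API. The sketch below builds a docID-indexed `String[][]` once and reuses a single `String` instance per distinct frame text. The class name `FrameCache` and its `List<List<String>>` input are hypothetical; in practice the input would come from one pass over the index.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: an in-memory, docID-indexed cache of all frames, built once
// (e.g. when the reader is opened) and then read with plain array lookups.
class FrameCache {
    private final String[][] framesByDoc;

    // framesPerDoc.get(doc) holds the frames of document `doc`, in order
    // (hypothetical input shape).
    FrameCache(List<List<String>> framesPerDoc) {
        Map<String, String> unique = new HashMap<>(); // one shared String per distinct frame text
        framesByDoc = new String[framesPerDoc.size()][];
        for (int doc = 0; doc < framesPerDoc.size(); doc++) {
            List<String> frames = framesPerDoc.get(doc);
            String[] row = new String[frames.size()];
            for (int i = 0; i < row.length; i++) {
                // Reuse an existing instance for repeated frame text; the per-document
                // arrays still hold one reference per frame occurrence.
                row[i] = unique.computeIfAbsent(frames.get(i), k -> k);
            }
            framesByDoc[doc] = row;
        }
    }

    String[] frames(int doc) {
        return framesByDoc[doc];
    }
}
```

Memory then grows with the number of documents and frame occurrences (for the reference arrays) plus the number of unique frame strings, which is the trade-off Mike points out.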

Mike

On Thu, Sep 9, 2010 at 4:01 AM, Johannes Lerch
<lerch.johannes@googlemail.com> wrote:
> Hi,
>
> I am working on a search for stacktraces. To do this I implemented my own
> Query, Weight and Scorer. I save the exception, the method and the frames as
> fields in the index and am able to pick relevant documents by matching those
> fields against my query stacktrace (using IndexReader.termDocs()). I
> implemented my own scoring, which is calculated pairwise between stacktraces
> (the query's trace and each relevant document's trace). For this scoring I
> calculate a similarity between the two traces by checking which frames
> exist in both and also checking their ordering. This works similarly to diff
> on text/source code. My problem is that I need all frames contained in both
> stacktraces, so I have to retrieve all frame fields of the stored
> stacktraces. For now I do this with:
> Document document = reader.document(doc, new FieldSelector() {
>            @Override
>            public FieldSelectorResult accept(String fieldName) {
>                if(Indexer.FIELD_FRAMES.equals(fieldName))
>                    return FieldSelectorResult.LAZY_LOAD;
>                else
>                    return FieldSelectorResult.NO_LOAD;
>            }
>        });
> Fieldable[] fieldables = document.getFieldables(Indexer.FIELD_FRAMES);
>
> But this call really decreases performance to a level that is not
> acceptable for me (>10 times slower with 100000 stacktraces in the index).
> So my question is: are there other ways to get stored fields, or do you have
> ideas for workarounds? Would it be better to store all stacktraces in a
> database and retrieve them from there? If so, how do I get the docId of the
> stacktraces I wrote to the index?
>
> Regards,
> Johannes
>
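The diff-like comparison described in the question (frames present in both traces, in the same order) could be modelled with a longest common subsequence (LCS) over frames, which is the core of classic diff. This is only one possible reading of that description, not the poster's actual scorer. The class name `TraceSimilarity` and the normalisation `2 * LCS / (|a| + |b|)` are assumptions made for illustration.

```java
// Sketch: diff-style similarity of two stack traces via the longest common
// subsequence (LCS) of their frames, which rewards frames that appear in both
// traces in the same relative order.
class TraceSimilarity {
    static int lcsLength(String[] a, String[] b) {
        // dp[i][j] = LCS length of a[0..i) and b[0..j)
        int[][] dp = new int[a.length + 1][b.length + 1];
        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= b.length; j++) {
                if (a[i - 1].equals(b[j - 1])) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.length][b.length];
    }

    // Hypothetical normalisation to [0, 1]: 1.0 for identical traces,
    // 0.0 when no frame is shared.
    static double similarity(String[] a, String[] b) {
        if (a.length + b.length == 0) {
            return 1.0;
        }
        return 2.0 * lcsLength(a, b) / (a.length + b.length);
    }
}
```

Because this needs every frame of both traces, it is exactly the kind of scorer that benefits from loading frames for fewer documents or from an in-memory cache.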
