lucene-general mailing list archives

From Johannes Lerch <lerch.johan...@googlemail.com>
Subject Re: Performance problems on retrieving fields
Date Thu, 09 Sep 2010 10:36:42 GMT
My tests show that about a quarter of the documents are relevant for scoring
per query, so for my example with 100000 stacktraces in the index I need to
score 25000 documents. I have a native implementation of the scoring algorithm
that scores all 100000 in about 20ms. The Lucene implementation needs >100ms
for the same query, which is far too slow. Without retrieving fields it needs
about 6ms, and that is what my target should be.

I tried without LAZY_LOAD, but there is no real difference. How can I sort
by docIds first?
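
For illustration, sorting first could look roughly like the sketch below. It
only shows the ordering step: FrameLoader is a hypothetical stand-in for a call
such as reader.document(docId, fieldSelector), and the class and method names
are made up for this example, not Lucene API. Whether ascending access actually
helps would still have to be measured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SortedDocLoad {

    /** Hypothetical stand-in for reader.document(docId, fieldSelector). */
    interface FrameLoader {
        String[] loadFrames(int docId);
    }

    /** A matched document together with its stored frames. */
    static final class LoadedDoc {
        final int docId;
        final String[] frames;

        LoadedDoc(int docId, String[] frames) {
            this.docId = docId;
            this.frames = frames;
        }
    }

    /**
     * Collect the matching docIds first, sort them ascending, and only then
     * load the stored frames, so the stored-field files are read front to back.
     */
    static List<LoadedDoc> loadInDocIdOrder(int[] matchedDocIds, FrameLoader loader) {
        int[] sorted = matchedDocIds.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        List<LoadedDoc> result = new ArrayList<LoadedDoc>(sorted.length);
        for (int docId : sorted) {
            result.add(new LoadedDoc(docId, loader.loadFrames(docId)));
        }
        return result;
    }
}
```

This assumes the docIds of all relevant documents are gathered before any
stored field is read, which means scoring would have to happen after the
collection pass rather than inside it.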

FieldCache.DEFAULT.getStrings is not an option because of the memory
problem.
This is how I store frames:
for (StacktraceFrame frame : stacktrace.getFrames()) {
  doc.add(new Field(FIELD_FRAMES,
      frame.getClassName() + "." + frame.getMethod(),
      Store.YES, Index.NOT_ANALYZED));
}



2010/9/9 Michael McCandless <lucene@mikemccandless.com>

> What a neat search engine!  (Searching stack traces).
>
> Unfortunately, loading stored fields is slowish -- it entails 2 disk
> seeks under the hood.  Really you should retrieve at most a page worth
> of docs, in the serial path of a query.  How many are you retrieving
> per query?
>
> That said, you shouldn't use LAZY_LOAD if you know you will need the
> value.  Also, it's possible that sorting the docIDs (ascending) first
> may get you better performance since your load is then a single scan
> of the 2 files in the index.
>
> You may want to use FieldCache.DEFAULT.getStrings instead -- this
> gives you a very fast String[], but, may suck up tons of memory
> depending on how many unique frames there are (how do you index each
> frame?).
>
> Mike
>
> On Thu, Sep 9, 2010 at 4:01 AM, Johannes Lerch
> <lerch.johannes@googlemail.com> wrote:
> > Hi,
> >
> > I am working on a search for stacktraces. To do this I implemented my own
> > Query, Weight and Scorer. I save exception, method and the frames as fields
> > in the index and am able to pick relevant documents by matching those fields
> > with my query stacktrace (using IndexReader.termDocs()). I implemented my
> > own scoring, which is calculated pairwise for stacktraces (the one of the
> > query and each of the relevant documents). For this scoring I calculate a
> > similarity between both traces by comparing the frames if they exist in both
> > and also check for ordering. This works similarly to diff on text/source
> > code.
> > My problem is that I need all frames contained in both stacktraces, so I
> > have to retrieve all frame fields of the stored stacktraces. For now I do
> > this with:
> > Document document = reader.document(doc, new FieldSelector() {
> >            @Override
> >            public FieldSelectorResult accept(String fieldName) {
> >                if(Indexer.FIELD_FRAMES.equals(fieldName))
> >                    return FieldSelectorResult.LAZY_LOAD;
> >                else
> >                    return FieldSelectorResult.NO_LOAD;
> >            }
> >        });
> > Fieldable[] fieldables = document.getFieldables(Indexer.FIELD_FRAMES);
> >
> > But this call decreases performance to a level that is not acceptable
> > for me (>10 times slower with 100000 stacktraces in the index). So my
> > question is: are there other ways to get stored fields, or do you have
> > ideas for workarounds? Would it be better to store all stacktraces in a
> > database and retrieve them from there? If so, how do I get the docId of
> > stacktraces I wrote to the index?
> >
> > Regards,
> > Johannes
> >
>
