commons-user mailing list archives

From Patrick Diviacco <patrick.divia...@gmail.com>
Subject Re: [digester] digester performance..
Date Tue, 05 Apr 2011 14:24:46 GMT
Hi, no, it is not the same program.
I'm basically calling the method below in a for loop.

In my first app I invoked it only once over the entire index (30 rows), and
it took 2 minutes.

Now I'm calling it in a loop for each row, because I need to update my index,
which keeps growing (first iteration 1 row, then 2... then 2 again, then 3...
and so on; it is a clustering algorithm and each row is a cluster).
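A rough count of calls helps explain the jump from minutes to hours. Assuming the earlier single pass meant one performQuery call per row (30 calls), and assuming, as a simplification, that each loop iteration queries every row currently in the index, a run that grows the index from 1 to 30 rows makes 1 + 2 + ... + 30 = 465 calls, about 15 times as many; repeated sizes (1, 2, 2, 3, ...) add more. The helper below is hypothetical and only does that arithmetic:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CallCount {
    // Hypothetical helper: total performQuery calls when each iteration
    // queries every row currently in the index (an assumption, see above).
    static long totalCalls(List<Integer> indexSizes) {
        long calls = 0;
        for (int size : indexSizes) {
            calls += size; // one call per row in this iteration
        }
        return calls;
    }

    public static void main(String[] args) {
        // Assumption: the index grows 1, 2, ..., 30 with no repeated sizes.
        List<Integer> sizes = new ArrayList<>();
        for (int n = 1; n <= 30; n++) {
            sizes.add(n);
        }
        System.out.println(totalCalls(sizes)); // 465, versus 30 for a single pass

        // The sequence described above repeats sizes, which adds calls.
        System.out.println(totalCalls(Arrays.asList(1, 2, 2, 3))); // 8
    }
}
```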

I expected it to be slow, but I'm surprised it takes more than an hour.
Thanks


public static void performQuery(QueryDoc queryDoc) throws java.io.IOException
{
    // disableCoord = true: scores are not scaled by the fraction of matching clauses
    BooleanQuery booleanQuery = new BooleanQuery(true);

    // Optional match-all clause: every document in the index becomes a hit
    notRelevant = new MatchAllDocsQuery();
    booleanQuery.add(notRelevant, BooleanClause.Occur.SHOULD);

    // Build the analyzer and one parser per field once, instead of a new
    // parser and analyzer for every phrase (a parser can be reused for
    // successive parse() calls on the same thread)
    WhitespaceAnalyzer analyzer =
        new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_40);
    QueryParser titleParser =
        new QueryParser(org.apache.lucene.util.Version.LUCENE_40, "title", analyzer);
    QueryParser descriptionParser =
        new QueryParser(org.apache.lucene.util.Version.LUCENE_40, "description", analyzer);
    QueryParser tagsParser =
        new QueryParser(org.apache.lucene.util.Version.LUCENE_40, "tags", analyzer);

    try {
        // One optional clause per title phrase
        phrase = queryDoc.getTitle();
        for (int i = 0; i < phrase.length; i++) {
            booleanQuery.add(titleParser.parse(phrase[i]), BooleanClause.Occur.SHOULD);
        }

        // One optional clause per description phrase
        phrase = queryDoc.getDescription();
        for (int i = 0; i < phrase.length; i++) {
            booleanQuery.add(descriptionParser.parse(phrase[i]), BooleanClause.Occur.SHOULD);
        }

        //time = new TermQuery(new Term("time",queryDoc.getTime()));
        //booleanQuery.add(time, BooleanClause.Occur.SHOULD);

        // One optional clause per tag phrase
        phrase = queryDoc.getTags();
        for (int i = 0; i < phrase.length; i++) {
            booleanQuery.add(tagsParser.parse(phrase[i]), BooleanClause.Occur.SHOULD);
        }

    } catch (ParseException pe) {
        // A parse error here silently skips all remaining phrases and fields
        //System.out.println(pe.getMessage());
    }

    // Collects the top 220000 hits; with the match-all clause, every document is a candidate
    topDocs = searcher.search(booleanQuery, 220000);
    writeResults(topDocs, queryDoc);
}
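Two things in this method make every call expensive, independent of the loop. The SHOULD MatchAllDocsQuery clause means every document in the index is a hit, so each call scores all documents, and the call then asks for the top 220000 of them. Top-N selection is commonly done with a min-heap bounded at N entries, so N sets both the memory used per call and the log N cost per hit. The JDK-only sketch below shows that technique with hypothetical names; it is an illustration, not Lucene's collector code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class TopK {
    // Returns the indices of the k highest scores, best first.
    // The heap never holds more than k entries: O(n log k) time, O(k) memory.
    static List<Integer> topK(double[] scores, int k) {
        // Min-heap ordered by score, so the weakest kept hit sits at the head.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((a, b) -> Double.compare(scores[a], scores[b]));
        for (int doc = 0; doc < scores.length; doc++) {
            if (heap.size() < k) {
                heap.add(doc);
            } else if (k > 0 && scores[doc] > scores[heap.peek()]) {
                heap.poll(); // evict the current weakest hit
                heap.add(doc);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        // Heap order is not sorted order; sort best first for the caller.
        result.sort((a, b) -> Double.compare(scores[b], scores[a]));
        return result;
    }

    public static void main(String[] args) {
        double[] scores = {0.5, 2.0, 1.0, 3.0};
        System.out.println(topK(scores, 2)); // [3, 1]
    }
}
```

Asking for only as many hits as writeResults actually needs keeps the heap small.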

On 5 April 2011 15:45, Simone Tripodi <simonetripodi@apache.org> wrote:

> Hi Patrick,
> if the Digester program you're speaking about is the one you pasted
> here some time ago... well, a lot of optimizations were missed. For
> example, I suggested you use the Lucene rules instead of storing all
> the properties in a POJO and then creating the Lucene Document; that
> way you limit the amount of stored data.
>
> When parsing a large XML document, as in your case, I suggest
> mapping to objects as little as possible and streaming more.
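The streaming advice can be sketched with the JDK's StAX API (javax.xml.stream), using the <collection>/<doc> layout from the original message. Each <doc> is handed to a callback as soon as its closing tag is read, so only one document's fields are held at a time. The callback is hypothetical; in the real app it would build the Lucene Document directly. This sketch uses neither Digester nor Lucene and assumes fields contain plain text only:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamDocs {
    // Streams <collection><doc><field>...</field></doc>...</collection>,
    // passing each <doc> (field name -> text) to the handler once it is complete.
    static void streamDocs(String xml, Consumer<Map<String, String>> handler)
            throws XMLStreamException {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        Map<String, String> current = null;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                if (name.equals("doc")) {
                    current = new LinkedHashMap<>(); // start a new document
                } else if (current != null) {
                    // A field inside <doc>: read its text (consumes its end tag)
                    current.put(name, reader.getElementText());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && reader.getLocalName().equals("doc")) {
                handler.accept(current); // hand the finished doc off
                current = null;          // and stop holding a reference to it
            }
        }
        reader.close();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<collection>"
            + "<doc><title>a</title><tags>x y</tags></doc>"
            + "<doc><title>b</title><tags>z</tags></doc>"
            + "</collection>";
        streamDocs(xml, doc -> System.out.println(doc));
        // prints {title=a, tags=x y} then {title=b, tags=z}
    }
}
```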
>
> HTH,
> Simo
>
> http://people.apache.org/~simonetripodi/
> http://www.99soft.org/
>
>
>
> 2011/4/5 Weiwei Wang <ww.wang.cs@gmail.com>:
> > I don't think your program became slower because you stopped using
> > Digester; RAM should be much faster. I suggest you reduce the main part
> > of your program to something simple and paste it in the email so others
> > can help.
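One quick way to isolate the slow part is to time the bare loop over the in-memory data and then the same loop with the real per-row work. The sketch assumes a wall-clock measurement is precise enough for a problem measured in minutes; `timeMillis` is a hypothetical helper built on System.nanoTime, and in the real program the lambda would call performQuery. It is a rough check, not a proper benchmark (no JIT warm-up):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class Timing {
    // Hypothetical helper: runs action once per row, returns elapsed milliseconds.
    static <T> long timeMillis(List<T> rows, Consumer<T> action) {
        long start = System.nanoTime();
        for (T row : rows) {
            action.accept(row);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            rows.add(Arrays.asList("title " + i, "description " + i, "tags " + i));
        }
        // Baseline: the bare loop over the in-memory data.
        long loopOnly = timeMillis(rows, row -> { });
        System.out.println("loop only: " + loopOnly + " ms");
        // Swap the empty lambda for the real per-row call to compare.
    }
}
```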
> >
> > On Tue, Apr 5, 2011 at 7:08 PM, Patrick Diviacco <patrick.diviacco@gmail.com> wrote:
> >
> >> hi,
> >>
> >> I have a Java app, and I recently stopped using Digester because all my
> >> data is now kept in RAM and I no longer need to write/parse XML files.
> >>
> >> However, since I stopped using Digester and external XML files, the
> >> performance of my app has gotten worse.
> >>
> >> I now have the same data stored in an ArrayList<ArrayList<String>> and
> >> I iterate over it with a for loop.
> >>
> >> Before, the data was in an XML file with the following structure:
> >>
> >> <collection>
> >> <doc>
> >> <field1></field1>
> >> ..
> >> </doc>
> >> ..
> >> </collection>
> >>
> >> Is Digester really much faster at iterating over my data from an XML file
> >> than a for loop iterating over an ArrayList with the same content?
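For comparison, a minimal sketch of the in-memory version: each inner list stands in for one <doc> and each entry for one <fieldN> value, in order (that mapping is an assumption). The nested loop visits every value exactly once with no parsing, so iterating the list itself should not be slower than reading the same data from XML; the extra time more likely comes from the work done for each row inside the loop:

```java
import java.util.ArrayList;
import java.util.Arrays;

public class InMemoryDocs {
    // Visits every field value once (here summing their lengths); no XML parsing.
    static long totalChars(ArrayList<ArrayList<String>> collection) {
        long chars = 0;
        for (ArrayList<String> doc : collection) { // one inner list per <doc>
            for (String value : doc) {             // one entry per <fieldN>
                chars += value.length();
            }
        }
        return chars;
    }

    public static void main(String[] args) {
        ArrayList<ArrayList<String>> collection = new ArrayList<>();
        collection.add(new ArrayList<>(Arrays.asList("a", "x y")));
        collection.add(new ArrayList<>(Arrays.asList("b", "z")));
        System.out.println(totalChars(collection)); // 6
    }
}
```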
> >>
> >> thanks
> >>
> >
> >
> >
> > --
> > 王巍巍
> > Cell: 18911288489
> > MSN: ww.wang.cs@gmail.com
> > Blog: http://whisper.eyesay.org
> > Weibo: http://t.sina.com/lolorosa
> >
>
