jackrabbit-users mailing list archives

From daveg0 <bagel10002...@googlemail.com>
Subject Help with performance again
Date Fri, 03 Apr 2009 09:56:30 GMT

Hi,

I am getting some really poor search performance out of our repository and
need some help understanding what I am doing wrong.

I will try to give as much detail as possible:

1) We are trying to implement an atom repository (see cnd file)
http://www.nabble.com/file/p22865321/atom_node_types.cnd atom_node_types.cnd 

2) Search performance is VERY slow using the test below. It appears that
the problem is not the searches themselves but loading the nodes from the
returned NodeIterator, which is extremely slow. Shouldn't the iterator load
nodes lazily?

    @Test
    public final void testSearchPerformance() throws Exception {
        // Test jcr:like
        SpringJCRNodeDAO nodeQuery = (SpringJCRNodeDAO) daoFactory.getDAO(Node.class);
        StopWatch watch = new StopWatch();
        String pathPrefix = "//portal/portal/test-collection//element(*,atom:Entry)";
        String searchPrefix = "//element(*,atom:Service)[@atom:title='portal']/element(*,atom:Workspace)[@atom:title='portal']/element(*,atom:Collection)[atom:titletext='test-collection']//element(*,atom:Entry)";
        String rootPrefix = "//element(*,atom:Entry)";

        Limits limits = new Limits();
        limits.setMaxRows(10000);
        String query = null;
        QueryResultWrapper<List<Node>> results = null;

        query = pathPrefix + "[jcr:like(fn:lower-case(@atom:titletext),'title_1%')]";
        watch.start();
        results = nodeQuery.get(query, limits.getMaxRows(), limits.getOffset());
        displayTime(query, watch);
        displayResults(query, results);
    }

results:

09:43:06,257 [main] INFO  SearchPerformanceTest : Query:
//portal/portal/test-collection//element(*,atom:Entry)[jcr:like(fn:lower-case(@atom:titletext),'title_1%')]
time: 4598
09:43:06,257 [main] INFO  SearchPerformanceTest : Results for:
//portal/portal/test-collection//element(*,atom:Entry)[jcr:like(fn:lower-case(@atom:titletext),'title_1%')]
size: 1110
09:43:07,972 [main] INFO  SearchPerformanceTest : Query:
//portal/portal/test-collection//element(*,atom:Entry)[jcr:contains(@atom:titletext,'title_1*')]
time: 1715
09:43:07,972 [main] INFO  SearchPerformanceTest : Results for:
//portal/portal/test-collection//element(*,atom:Entry)[jcr:contains(@atom:titletext,'title_1*')]
size: 1110
09:43:09,639 [main] INFO  SearchPerformanceTest : Query:
//portal/portal/test-collection//element(*,atom:Entry)[jcr:like(fn:lower-case(@atom:titletext),'title_1%')]
time: 1667
09:43:09,639 [main] INFO  SearchPerformanceTest : Results for:
//portal/portal/test-collection//element(*,atom:Entry)[jcr:like(fn:lower-case(@atom:titletext),'title_1%')]
size: 1110
09:43:11,260 [main] INFO  SearchPerformanceTest : Query:
//element(*,atom:Entry)[jcr:contains(@atom:titletext,'title_1*')] time: 1605
09:43:11,260 [main] INFO  SearchPerformanceTest : Results for:
//element(*,atom:Entry)[jcr:contains(@atom:titletext,'title_1*')] size: 1110
09:43:12,881 [main] INFO  SearchPerformanceTest : Query:
//element(*,atom:Entry)[jcr:like(fn:lower-case(@atom:titletext),'title_1%')]
time: 1621
09:43:12,881 [main] INFO  SearchPerformanceTest : Results for:
//element(*,atom:Entry)[jcr:like(fn:lower-case(@atom:titletext),'title_1%')]
size: 1110
09:43:14,518 [main] INFO  SearchPerformanceTest : Query:
//element(*,atom:Service)[@atom:title='portal']/element(*,atom:Workspace)[@atom:title='portal']/element(*,atom:Collection)[atom:titletext='test-collection']//element(*,atom:Entry)[jcr:contains(@atom:titletext,'title_1*')]
time: 1637
09:43:14,518 [main] INFO  SearchPerformanceTest : Results for:
//element(*,atom:Service)[@atom:title='portal']/element(*,atom:Workspace)[@atom:title='portal']/element(*,atom:Collection)[atom:titletext='test-collection']//element(*,atom:Entry)[jcr:contains(@atom:titletext,'title_1*')]
size: 1110
09:43:16,186 [main] INFO  SearchPerformanceTest : Query:
//element(*,atom:Service)[@atom:title='portal']/element(*,atom:Workspace)[@atom:title='portal']/element(*,atom:Collection)[atom:titletext='test-collection']//element(*,atom:Entry)[jcr:like(fn:lower-case(@atom:titletext),'title_1%')]
time: 1668
09:43:16,186 [main] INFO  SearchPerformanceTest : Results for:
//element(*,atom:Service)[@atom:title='portal']/element(*,atom:Workspace)[@atom:title='portal']/element(*,atom:Collection)[atom:titletext='test-collection']//element(*,atom:Entry)[jcr:like(fn:lower-case(@atom:titletext),'title_1%')]
size: 1110

The node query does something like this (the declarations that were elided
are restored below for clarity):

public QueryResultWrapper<List<T>> get(String queryString, long limit, long offset, String userId) throws DAOException {

    // check-user-privileges code etc. removed
    List<T> returnList = new ArrayList<T>();
    try {
        QueryManager queryManager = session.getWorkspace().getQueryManager();

        // This code is tied to Jackrabbit, as it allows limits
        // and offsets on the queries. The commented-out line below is the
        // JCR implementation-agnostic equivalent.
        Query query = queryManager.createQuery(queryString, Query.XPATH);
        // ========= Jackrabbit-specific code
        QueryImpl jackrabbitQuery = (QueryImpl) query;
        jackrabbitQuery.setLimit(limit);
        jackrabbitQuery.setOffset(offset);
        QueryResultImpl jackrabbitQueryResult = (QueryResultImpl) jackrabbitQuery.execute();
        // ===== End of Jackrabbit-specific code
        // JCR-agnostic alternative:
        // QueryResult jackrabbitQueryResult = query.execute();

        NodeIterator nodes = jackrabbitQueryResult.getNodes();
        while (nodes.hasNext()) {
            returnList.add((T) nodes.nextNode());
        }
    } catch (Exception e) {
        LOG.error(e.getMessage(), e);
        throw new DAOException(e.getMessage());
    }
    // ... wrap returnList in a QueryResultWrapper and return (elided)
}
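To pin down where the time goes, it helps to time query.execute() separately
from the NodeIterator loop, since a lazily resolved result set would push the
cost into the first hasNext()/nextNode() calls rather than into execute().
This is a rough, self-contained sketch of the measurement pattern only, with
plain java.util types standing in for the JCR calls:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class PhaseTiming {
    public static void main(String[] args) {
        // Simulated result set of the same size as the real queries return.
        List<Integer> rows = new ArrayList<Integer>();
        for (int i = 0; i < 1110; i++) {
            rows.add(i);
        }

        // Phase 1: obtaining the result/iterator (stands in for query.execute()).
        long t0 = System.nanoTime();
        Iterator<Integer> it = rows.iterator();
        long executeNs = System.nanoTime() - t0;

        // Phase 2: draining the iterator (stands in for the NodeIterator loop).
        t0 = System.nanoTime();
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        long iterateNs = System.nanoTime() - t0;

        System.out.println("count=" + count); // prints count=1110
        System.out.println("execute(ns)=" + executeNs + " iterate(ns)=" + iterateNs);
    }
}
```

Timing the two phases of the real DAO method the same way would show whether
the seconds are spent in execute() or in the node-loading loop.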

For comparison, the equivalent Lucene search, run directly against each of
the index subdirectories under workspaces/default/index, is:

    @Test
    public void testTitle() throws Exception {
        File[] indexes = new File(indexDir).listFiles(new FileFilter() {
            @Override
            public boolean accept(File f) {
                return f.isDirectory();
            }
        });

        StopWatch watch = new StopWatch();
        for (File file : indexes) {
            Directory directory = FSDirectory.getDirectory(file);
            IndexSearcher searcher = new IndexSearcher(directory);
            try {
                System.out.println("Directory: " + directory);
                Term t = new Term("12:FULL:titletext", "*");
                Query query = new WildcardQuery(t);
                watch.start();
                Hits hits = searcher.search(query);
                //showDocs(hits);
                watch.stop();
                System.out.println("Hits: " + hits.length() + " query: " + query + " time: " + watch.getTime());
                watch.reset();
            } finally {
                searcher.close();
                directory.close();
            }
        }
    }

    private void showDocs(Hits hits) throws CorruptIndexException, IOException {
        Document doc;
        for (int i = 0; i < hits.length(); i++) {
            doc = hits.doc(i);
            System.out.println("doc: " + doc.getField("_:UUID").stringValue());
        }
    }

This returns very quickly:

Directory:
org.apache.lucene.store.FSDirectory@C:\repository\workspaces\default\index\_5
Hits: 601 query: 12:FULL:titletext:* time: 47
Directory:
org.apache.lucene.store.FSDirectory@C:\repository\workspaces\default\index\_b
Hits: 699 query: 12:FULL:titletext:* time: 16
Directory:
org.apache.lucene.store.FSDirectory@C:\repository\workspaces\default\index\_h
Hits: 811 query: 12:FULL:titletext:* time: 47
Directory:
org.apache.lucene.store.FSDirectory@C:\repository\workspaces\default\index\_y
Hits: 199 query: 12:FULL:titletext:* time: 15
Directory:
org.apache.lucene.store.FSDirectory@C:\repository\workspaces\default\index\_z
Hits: 1000 query: 12:FULL:titletext:* time: 31

I have inserted some debug statements into the code, and it appears that the
NodeIterator is the problem: the first call to NodeIterator.hasNext() takes
seconds with only about 1000 nodes. Is there something that can be done
about this? The searches themselves are quick, but loading the nodes is VERY
slow.
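One thing I can see in my own DAO is that copying every node out of the
iterator forces the whole result set to load even if the iterator itself is
lazy; bounding the copy to the page actually needed would at least cap that
cost. A minimal sketch of the pattern, using a hypothetical firstN helper
and plain java.util types rather than the JCR API:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BoundedCopy {
    // Hypothetical helper: drain at most `max` items from an iterator,
    // so a lazy iterator is never fully materialized.
    static <T> List<T> firstN(Iterator<T> it, int max) {
        List<T> out = new ArrayList<T>();
        while (it.hasNext() && out.size() < max) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Simulated result set of the size the real queries return.
        List<Integer> rows = new ArrayList<Integer>();
        for (int i = 0; i < 1110; i++) {
            rows.add(i);
        }
        // Only the first 10 elements are ever pulled from the iterator.
        List<Integer> page = firstN(rows.iterator(), 10);
        System.out.println("page size=" + page.size()); // prints page size=10
    }
}
```

That would not explain the slow first hasNext() call itself, though, which is
what I am mainly asking about.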

regards,

Dave





-- 
View this message in context: http://www.nabble.com/Help-with-performace-again-tp22865321p22865321.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.

