lenya-user mailing list archives

From J.-Wolfgang Kaltz <jwka...@yahoo.com>
Subject Search links after updating search index
Date Tue, 08 Jun 2004 18:04:41 GMT
Hi all,
I have been trying to use the search feature in Lenya (that is, the Lucene 
integration). First, congratulations to the developers, as it seems to be
basically working, though there remain a few (hopefully minor) issues.

I would like to know if anybody using the current version of Lenya in CVS
has managed to update the search index and have the search still working.

Here is what I am doing: I update the search index for a live publication 
(for instance, after publishing a new article) by using the
lenya/bin/crawl_and_index.xml file and by providing custom 
configuration files for crawling and indexing (see below).

After the update, I run a new search, but the link URLs in the 
result list are wrong: the context is included twice. With a little debugging 
of the XSL stylesheet, I see that the variable uri looks like
/lenya-2004-06-08/default/live/index.html
However, the other variables used to construct the link also contain this
context. For instance, contextprefix is /lenya-2004-06-08, and so on.
So the link URL contains the context twice, and the link does not work.
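
To make the symptom concrete, here is a minimal sketch of the duplication. It is not the actual Lenya stylesheet logic: it simply assumes the link is built by prepending contextprefix to uri, and the function names naive_link and deduplicated_link are hypothetical.

```python
# Illustration only: assumes the result link is contextprefix + uri.
# The real XSL stylesheet may combine more variables than this.

def naive_link(context_prefix: str, uri: str) -> str:
    """Prepend the context to the uri without checking for overlap."""
    return context_prefix + uri


def deduplicated_link(context_prefix: str, uri: str) -> str:
    """Hypothetical workaround: drop the context from uri if it is already there."""
    if uri.startswith(context_prefix + "/"):
        uri = uri[len(context_prefix):]
    return context_prefix + uri


context_prefix = "/lenya-2004-06-08"
uri_after_update = "/lenya-2004-06-08/default/live/index.html"

# The context appears twice, as in the broken result list.
print(naive_link(context_prefix, uri_after_update))
# The context appears once.
print(deduplicated_link(context_prefix, uri_after_update))
```

Stripping the prefix at link time would only hide the symptom; it does not explain why uri carries the full context after an update.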

What is strange is that, if you simply check out the current version of Lenya
in CVS, you will find that the index is pre-created, and the website has been
pre-crawled. When the URLs in the result list are constructed, the variable uri 
does not contain the whole context, and thus the URLs are correct.

The only difference between an updated search index and the one prefabricated 
in Lenya's CVS is that, when updating, the crawled files are placed in 
  htdocs_dump/live/lenya-2004-06-08/default/live/
which mirrors the context path of my publication being crawled,
whereas in the version in CVS, the pre-crawled file is directly in
  htdocs_dump/live/
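
The following sketch shows one way the two layouts could lead to different uri values. It rests on two assumptions that I have not verified against the Lenya source: that the crawler mirrors the URL path under htdocs-dump-dir, and that the indexer derives uri from the file path relative to that directory. The helper names dump_path and uri_from_dump are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: the crawler stores each page under dump_dir using the URL path.
def dump_path(dump_dir: str, url: str) -> PurePosixPath:
    return PurePosixPath(dump_dir) / urlparse(url).path.lstrip("/")


# Assumption: the indexer records the path relative to dump_dir as the uri.
def uri_from_dump(dump_dir: str, file_path: PurePosixPath) -> str:
    return "/" + str(PurePosixPath(file_path).relative_to(dump_dir))


dump_dir = "htdocs_dump/live"
base_url = ("http://kronos.informatik.uni-duisburg.de:8080"
            "/lenya-2004-06-08/default/live/index.html")

# Layout after my update: the context is part of the dump path.
crawled = dump_path(dump_dir, base_url)
print(crawled)
print(uri_from_dump(dump_dir, crawled))

# Layout as shipped in CVS: the file sits directly in the dump directory.
print(uri_from_dump(dump_dir, PurePosixPath("htdocs_dump/live/index.html")))
```

If these assumptions hold, the uri after an update would include the full context, while the pre-crawled layout would yield a uri without it, which matches what I see in the stylesheet.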

So, my question is:
is anybody out there successfully updating the search index of a live 
publication (using the current version in CVS) and still getting correct 
URLs in the result list?

Or is there something I am missing in how the search engine should be 
configured for a publication? I have tried many different options, but I am
not making progress.

Here is my lucene.xconf:
<lucene>
  <update-index type="new"/>
  <index-dir src="../../work/search/lucene/index/live/index"/>
  <htdocs-dump-dir src="../../work/search/lucene/htdocs_dump/live"/>
  <indexer class="org.apache.lenya.lucene.index.DefaultIndexer"/>
</lucene>

Here is my crawler.xconf (the line breaks in the href values are just for
display here; they are not in the actual file):

<crawler>
  <user-agent>lenya</user-agent>

  <base-url 
    href="http://kronos.informatik.uni-duisburg.de:8080/lenya-2004-06-08/
       default/live/index.html"/>
  <scope-url 
    href="http://kronos.informatik.uni-duisburg.de:8080/lenya-2004-06-08/
       default/live/"/>

  <uri-list src="../../work/search/lucene/uris.txt"/>
  <htdocs-dump-dir src="../../work/search/lucene/htdocs_dump/live"/>

</crawler>

I would appreciate any help, because I am still unsure whether this is a 
configuration issue or a bug.


