lucene-java-user mailing list archives

From Eric Anderson <Eric.Ander...@LanRx.com>
Subject Re: Regarding Setup Lucine for my site
Date Wed, 05 Mar 2003 11:23:28 GMT
Samuel-

I'm using the software in much the same way you are. One thing to remember, 
though, is that the documents you're indexing need to be in a location that is 
published by your web server. What I did was use the Tomcat connectors and 
mount my document repository inside my Tomcat webapps directory. That way, 
running the demo IndexHTML command from a child of webapps indexes the 
documents by their paths under webapps. Then I created JkMounts for the 
children of the webapps directory.

I'm not a developer (to say the least), and it's probably a somewhat half-baked 
workaround, but my instance works, and all indexed documents are available via 
the links displayed on the results page.
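
A minimal sketch of why links can resolve in a setup like this, assuming the index stores a path relative to the directory where IndexHTML was run (for example ../docs/manual/index.html) and the results page is served from a sibling webapp. The host, the webapp name luceneweb, the document path, and the class RelativeHitLink are illustrative assumptions, not details of the setup described above.

```java
import java.net.URI;

/** Hypothetical helper: resolves a stored relative path against the results page URL. */
final class RelativeHitLink {
    static URI resolve(String resultsPageUrl, String storedPath) {
        // Normalize Windows separators; URI.resolve then applies the usual "../" rules,
        // the same way a browser resolves a relative link on the results page.
        String relative = storedPath.replace('\\', '/');
        return URI.create(resultsPageUrl).resolve(relative);
    }

    public static void main(String[] args) {
        System.out.println(resolve(
                "http://localhost:8080/luceneweb/results.jsp",
                "../docs/manual/index.html"));
        // prints http://localhost:8080/docs/manual/index.html
    }
}
```

This only works when the directory layout under webapps matches what the web server publishes. Paths containing spaces or other characters that are illegal in a URI would need escaping first.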


Quoting Otis Gospodnetic <otis_gospodnetic@yahoo.com>:

> Samuel,
> 
> Some basic understanding of what Lucene is is what's missing here.
> Lucene does not index web pages.
> Lucene indexes text.
> Lucene is not automatically aware of your web site or your domain.
> Lucene is aware only of what you 'feed it' at index time.
> If you index files, which IndexDemo does, the Lucene index will have only
> information about files (information such as file path).  Lucene has no
> clue that you really want to index your web site.
> Even if you could replace C:\..... with http://.... it wouldn't be a
> good solution, as directory structures and file paths do not always map
> directly to URLs.
> 
> In short, you have a bit more reading to do :)
> The information is all there, it just has to be read :(
> Good luck!
> 
> Otis
> 
> 
> 
> --- Samuel Alfonso Velázquez Díaz <samuelvd@yahoo.com> wrote:
> > 
> > Yes I have
> > 1.- The directory with the files to index:
> > C:/filesToIndex/www/
> >  
> > 2.- A path where the index files from the search engine will be
> > created, let's say
> > C:/index/
> > 3.- I have an internet domain whose name is: www.mysite.com
> > 4.- A web application context that runs at
> > http://www.mysite.com/search
> >  
> > Once I have set up all of the above, I want to be able to use the
> > search application:
> > http://www.mysite.com/search/search.jsp
> > And I don't want the results I get from the index (step 2) to
> > look like
> > Your file is at
> > C:/filesToIndex/www/some_html/my_doc.html
> > The results should be:
> > Your file is at
> > http://www.mysite.com/some_html/my_doc.html
> > From the comments I have read (THANK YOU VERY MUCH) I conclude that
> > there is no way to generate the index with a custom prefix (such as
> > http://www.mysite.com/ for the documents at C:/filesToIndex/www/).
> > It seems that I have to modify my web application
> > (http://www.mysite.com/search/search.jsp) to include some logic to
> > replace "C:/filesToIndex/www/" with "http://www.mysite.com/".
> > If you could point me to the place in the Lucene source code where I
> > could add this logic and fix it once and for all, I would appreciate
> > it a lot.
> > The command I used to generate this index was:
> > java org.apache.lucene.demo.IndexHTML -create -index index C:\index
> > C:\filesToIndex\www\
> > Now in the web application I have to modify
> >       IndexSearcher searcher;
> >       Query query;
> >       Hits hits;
> > 
> >       // some code after...
> >       hits = searcher.search(query);
> > 
> >       // walk through the hit list
> >       for (int i = 0; i < hits.length(); i++) {
> >           Document doc = hits.doc(i);
> >           String doctitle = doc.get("title");
> >           String url = doc.get("url");  // currently holds the file path
> >       }
> > 
> > I have to do something like url = "http://www.mysite.com/" +
> > url.substring("C:/filesToIndex/www/".length());
> > 
> > Regards!!!
> > And thanks again
> >  Pinky Iyer <pinkyiyer@yahoo.com> wrote:
> > I don't understand the explanation. When I index the
> > documents as described in the examples, and then run the app
> > and do a sample search, the results point to the directory structure,
> > say "c:/filesToIndex/www/", instead of "http://localhost:8080/www/". So
> > how can this be changed to reflect the website domain you
> > mentioned? Could you explain again? Say my docs are under the directory
> > c:/filesToIndex/www/ and the website is, as you said,
> > http://localhost:8080/; how do I proceed?
> > Thanks in advance!
> > Samuel Alfonso Velázquez Díaz wrote:
> > Oh ok, I thought it was going to be something like the egothor
> > search engine (a Java-based search engine). When you create the
> > index, you issue a command like:
> > java org.egothor.indexer.mirror.DoTanker /tmp/my_www
> > Project/Egothor/var/www as http://localhost:8080
> > /tmp/my_www: is the path to the directory where the index is to be
> > created
> > Project/Egothor/var/www: is the path to the local file system files
> > to be indexed.
> > and "as http://localhost:8080" is the prefix that the index will keep
> > on the hit list. This way the index will be relative to
> > http://localhost:8080, even if your production site is another
> > site.
> > Thanks for your comments; anyway, now I know that I have to modify
> > code to do this.
> > Regards!
> > Jeff Linwood wrote:
> > Hi,
> > 
> > I'm not a hundred percent sure I understand what you are asking, but
> > when you get the results back from Lucene (the hits) it's up to you to
> > format them to display on a web page - you can always do the
> > modification there when you display the links to the results.
> > 
> > Jeff
> > ----- Original Message -----
> > From: "Samuel Alfonso Velázquez Díaz" 
> > To: "Lucene Users List" 
> > Sent: Tuesday, March 04, 2003 11:33 AM
> > Subject: Regarding Setup Lucine for my site
> > 
> > 
> > >
> > > The documentation says:
> > >
> > > Once you've gotten this far you're probably itching to go. Let's
> > > start by creating the index you'll need for the web examples. Since
> > > you've already set your classpath in the previous examples, all you
> > > need to do is type
> > > "java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..".
> > > You'll need to do this from a (any) subdirectory of your
> > > {tomcat}/webapps directory (make sure you didn't leave off the ".."
> > > or you'll get a null pointer exception). {index-dir} should be a
> > > directory that Tomcat has permission to read and write, but is
> > > outside of a web accessible context. By default the webapp is
> > > configured to look in /opt/lucene/index for this index.
> > >
> > > A copy of my site is in:
> > >
> > > C:\CopiaSite20030228\
> > >
> > > My web application runs on
> > >
> > > http://mydomain.com/search/index.jsp
> > >
> > > How can I make the Lucene index map the URLs of the indexed files to:
> > >
> > > http://mydomain.com/
> > >
> > >
> > >
> > > Please help!
> > >
> > >
> > > Samuel Alfonso Velázquez Díaz
> > > http://www.geocities.com/samuelvd
> > > samuelvd@yahoo.com
> > >
> > >
> > > ---------------------------------
> > > Do you Yahoo!?
> > > Yahoo! Tax Center - forms, calculators, tips, and more
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > 
> > 
> 
> 
> 
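
The display-time rewrite discussed in the quoted thread (replacing the file-system prefix with the site URL when each hit is rendered) could be sketched roughly as follows. PathToUrl and its methods are hypothetical helpers, not part of Lucene; the roots and URLs are the examples from the thread. Because directory structures do not always map directly to URLs, the sketch takes an explicit table of roots and prefers the longest matching one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps stored file paths to public URLs at display time. */
final class PathToUrl {
    // file-system root (forward slashes, trailing slash) -> URL base (trailing slash)
    private final Map<String, String> roots = new LinkedHashMap<>();

    /** Registers one mapping, e.g. "C:/filesToIndex/www/" -> "http://www.mysite.com/". */
    PathToUrl map(String fsRoot, String urlBase) {
        roots.put(withSlash(slashes(fsRoot)), withSlash(urlBase));
        return this;
    }

    /** Returns the URL for a stored path, or the path unchanged if no root matches. */
    String toUrl(String storedPath) {
        String path = slashes(storedPath);
        String best = null;
        for (String root : roots.keySet()) {
            // Windows paths are case-insensitive, so "c:/..." should match "C:/...".
            boolean matches = path.regionMatches(true, 0, root, 0, root.length());
            if (matches && (best == null || root.length() > best.length())) {
                best = root; // prefer the most specific (longest) root
            }
        }
        return best == null ? storedPath : roots.get(best) + path.substring(best.length());
    }

    private static String slashes(String s) { return s.replace('\\', '/'); }
    private static String withSlash(String s) { return s.endsWith("/") ? s : s + "/"; }

    public static void main(String[] args) {
        PathToUrl mapper = new PathToUrl()
                .map("C:/filesToIndex/www/", "http://www.mysite.com/");
        System.out.println(mapper.toUrl("C:\\filesToIndex\\www\\some_html\\my_doc.html"));
        // prints http://www.mysite.com/some_html/my_doc.html
    }
}
```

In search.jsp, the value read with doc.get("url") would be passed through toUrl before the link is written out. The sketch does not percent-encode characters such as spaces, and the mapping table has to reflect how the web server actually publishes each directory.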

LanRx Network Solutions, Inc.
Providing Enterprise Level Solutions...On A Small Business Budget


