jackrabbit-users mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: JackRabbit Search Engine Questions
Date Thu, 03 May 2007 12:50:47 GMT
Hi Belinda,

Belinda Randolph wrote:
> 1.  Can I replace the JackRabbit search engine with my own?

Yes, you can. There are several interfaces you have to implement; see the
QueryHandler interface for a starting point:
http://svn.apache.org/repos/asf/jackrabbit/tags/1.3/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/QueryHandler.java

> 2.  Does your search engine look through actual document contents - as a 
> background process or at the time of the actual user search?

Whether text is extracted from documents and indexed when the document is saved,
or deferred to a number of background threads, is configurable.

> 3.  What FORMATs of actual documents does your search engine look at?  
> (Ascii, Microsoft, PDF, etc.)

The currently supported formats are:
- Microsoft Word, Excel, PowerPoint
- PDF
- Open Office Documents (text, spreadsheet, presentation, etc.)
- RTF
- HTML
- XML

Text extraction in Jackrabbit is extensible. See:
http://svn.apache.org/repos/asf/jackrabbit/tags/1.3/jackrabbit-text-extractors/src/main/java/org/apache/jackrabbit/extractor/TextExtractor.java
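
To give an idea of what a custom extractor might look like, here is a minimal
sketch. It does not depend on Jackrabbit: TextExtractorLike is a local stand-in
that I assume mirrors the two methods of the TextExtractor interface linked
above, so please check the signatures against that source before relying on
them. ExampleTextExtractor, the content type "text/x-example" and the
extractToString helper are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Local stand-in, ASSUMED to mirror org.apache.jackrabbit.extractor.TextExtractor.
interface TextExtractorLike {
    String[] getContentTypes();
    Reader extractText(InputStream stream, String type, String encoding) throws IOException;
}

// Hypothetical extractor for a made-up content type whose bytes are already text.
final class ExampleTextExtractor implements TextExtractorLike {

    // Content types this extractor claims to handle ("text/x-example" is invented).
    public String[] getContentTypes() {
        return new String[] {"text/x-example"};
    }

    // Decodes the stream with the given encoding, falling back to UTF-8 when none is given.
    public Reader extractText(InputStream stream, String type, String encoding) throws IOException {
        Charset charset = (encoding != null) ? Charset.forName(encoding) : StandardCharsets.UTF_8;
        return new InputStreamReader(stream, charset);
    }

    // Illustration-only helper: runs the extractor on a byte array and returns the text.
    static String extractToString(byte[] data, String encoding) {
        try (Reader reader = new ExampleTextExtractor()
                .extractText(new ByteArrayInputStream(data), "text/x-example", encoding)) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            for (int n; (n = reader.read(buffer)) != -1; ) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello extractor".getBytes(StandardCharsets.UTF_8);
        System.out.println(extractToString(data, "UTF-8"));
    }
}
```

An extractor for a real format would parse the stream with a suitable library
instead of just decoding it, and return a Reader over the extracted text.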

> 4.  When searching the contents of a PDF file, does the background 
> process, using OCR, create an additional file in another format? What 
> format?

The text extractor in Jackrabbit does not use OCR technology, but if you have an
existing Java solution you may easily integrate it into Jackrabbit.

> 5.  Does your OCR routine search FORMATS other than PDF? If yes, what 
> formats can the OCR search?

n/a

> 6.  What are the resolution requirements for your OCR routines?

n/a

> 7.  Can I change the GUI to a) add functionality or error checking and 
> b) to look personalized with CSS?

Jackrabbit is a content repository infrastructure and does not come with a user
interface. You may use any existing JCR-compliant application on top of Jackrabbit.

> 8.  Can the search engine search using both requested metadata element 
> values and keywords from the document contents?

Yes, this is possible.
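
As a rough sketch, a single JCR XPath statement can carry both kinds of
constraint: jcr:contains() for the full-text part and a property comparison for
the metadata part. The class below only builds the statement string, so it runs
without a repository. CombinedQuerySketch and the "author" property are made up
for illustration, and I assume the metadata properties live on the same node
whose content is full-text indexed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that assembles a JCR XPath statement as a plain string.
final class CombinedQuerySketch {

    // Quotes a value as an XPath string literal; an embedded ' is doubled.
    static String literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Builds one statement with a full-text constraint plus one equality
    // constraint per metadata entry, all joined with "and".
    static String build(String keywords, Map<String, String> metadata) {
        StringBuilder statement = new StringBuilder("//*[jcr:contains(., ");
        statement.append(literal(keywords)).append(")");
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            statement.append(" and @").append(entry.getKey())
                     .append(" = ").append(literal(entry.getValue()));
        }
        return statement.append("]").toString();
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("author", "Smith"); // "author" is an invented property name
        // prints: //*[jcr:contains(., 'annual report') and @author = 'Smith']
        System.out.println(build("annual report", metadata));
    }
}
```

The resulting string would then be handed to QueryManager.createQuery(statement,
Query.XPATH) from javax.jcr.query. Note that the text passed to jcr:contains()
has its own search syntax, which this sketch does not escape.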

> 9.  Can I start with keywords from the document contents and then later 
> filter the results using user inputted metadata element values?

Yes, you would simply execute a second query that combines the original keywords
with the metadata values.

> 10.  Can I start with user input metadata element values and then later 
> filter down the results with document contents?

Yes, you would execute the first query with just the metadata values and then a 
second one with additional keywords entered by the user.

> 11.  After an initial search, can I refine my search by only looking at 
> the results of the previous search?

Yes, you would simply execute the initial query again with additional search terms.
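
The refinement described for questions 9 to 11 could be sketched like this: the
application keeps the constraints of the previous query and re-issues the
statement with one more constraint added. Nothing here is Jackrabbit API;
RefinableQuerySketch and the "author" property are made up for illustration,
and the class only builds the XPath string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the constraints of a search that is narrowed step by step.
final class RefinableQuerySketch {

    private final List<String> constraints = new ArrayList<>();

    // Quotes a value as an XPath string literal; an embedded ' is doubled.
    private static String literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Full-text constraint on the node's indexed content.
    static String contains(String keywords) {
        return "jcr:contains(., " + literal(keywords) + ")";
    }

    // Equality constraint on a metadata property.
    static String propertyEquals(String name, String value) {
        return "@" + name + " = " + literal(value);
    }

    // Adds one more constraint; earlier constraints are kept.
    RefinableQuerySketch and(String constraint) {
        constraints.add(constraint);
        return this;
    }

    // Emits the statement for all constraints collected so far.
    String toXPath() {
        if (constraints.isEmpty()) {
            return "//*";
        }
        return "//*[" + String.join(" and ", constraints) + "]";
    }

    public static void main(String[] args) {
        RefinableQuerySketch query = new RefinableQuerySketch().and(contains("report"));
        System.out.println(query.toXPath()); // initial search
        query.and(propertyEquals("author", "Smith"));
        System.out.println(query.toXPath()); // refined search, executed as a new query
    }
}
```

Each refinement is a new query against the repository, not a filter over the
previous result set; the narrowing comes from the extra "and" term.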

regards
  marcel
