jackrabbit-users mailing list archives

From Phillip Rhodes <spamsu...@rhoderunner.com>
Subject Re: JackRabbit Search Engine Questions
Date Thu, 03 May 2007 01:41:04 GMT
I am a newbie, but see my answers below.
----- Original Message -----
From: "Belinda Randolph" <belinda.a.randolph@jpl.nasa.gov>
To: users@jackrabbit.apache.org
Sent: Wednesday, May 2, 2007 4:42:23 PM (GMT-0500) America/New_York
Subject: JackRabbit Search Engine Questions

I am in the process of evaluating 10 repository solutions for my project.

I have several questions to ask in order to make my decisions.

1.  Can I replace the JackRabbit search engine with my own?

Yes, you can, but why would you?  I migrated to Jackrabbit just so I could retire all my own
search code (written in Lucene).  Of course, you could write your own crawler/indexer to access
the Jackrabbit repository.

2.  Does your search engine look through actual document contents - 
as a background process or at the time of the actual user search?

The document is indexed when it is added to the repository.  When the user searches, the query
runs against that previously built index, so it is very fast.
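
To illustrate the index-on-add idea, here is a minimal sketch in plain Java. It is a toy
inverted index, not Jackrabbit's actual (Lucene-based) implementation; the class name ToyIndex
and its methods are made up for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (hypothetical, for illustration only).
final class ToyIndex {
    // term -> paths of the documents that contain the term
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Called once, when a document is stored: the tokenizing cost is paid here.
    void add(String path, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(path);
            }
        }
    }

    // Called at search time: a map lookup only, no document content is re-read.
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

Calling add("/docs/a.txt", "...") when the document is saved and search("jackrabbit") later keeps
the expensive work out of the user's query path, which is the point made above.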

3.  What FORMATs of actual documents does your search engine look 
at?  (Ascii, Microsoft, PDF, etc.)

All of those formats, and more.  You can easily add support for new ones if you like; you will
have to set the MIME type on the content that you add so that your custom indexer runs against it.
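
As a rough sketch of why the MIME type matters: text extraction is typically dispatched on it, so
content stored without the right type never reaches the custom code. The interface and class
below (SketchExtractor, ExtractorRegistry) are hypothetical names for this example and are not
Jackrabbit's own extractor API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical extractor contract for this sketch: raw bytes in, indexable text out.
interface SketchExtractor {
    String extract(byte[] content);
}

// Hypothetical registry that picks an extractor by MIME type.
final class ExtractorRegistry {
    private final Map<String, SketchExtractor> byMimeType = new HashMap<>();

    void register(String mimeType, SketchExtractor extractor) {
        byMimeType.put(mimeType.toLowerCase(Locale.ROOT), extractor);
    }

    // Returns the text to index, or "" when the MIME type is missing or unknown.
    String textFor(String mimeType, byte[] content) {
        if (mimeType == null) {
            return "";
        }
        SketchExtractor extractor = byMimeType.get(mimeType.toLowerCase(Locale.ROOT));
        return extractor == null ? "" : extractor.extract(content);
    }
}
```

With this shape, registering an extractor for a made-up type such as "text/x-custom" only has an
effect on content that was stored with exactly that MIME type.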

4.  When searching the contents of a PDF file, does the background 
process, using OCR, create an additional file in another format? What format?

When the PDF is added to Jackrabbit, text is extracted from the PDF and added to the search
index.  No OCR is involved, just text extracted from the PDF.  If the PDF contains only images,
no OCR is performed on those images.

5.  Does your OCR routine search FORMATS other than PDF? If yes, what 
formats can the OCR search?

There is no OCR technology involved.  Rather, the text in the Microsoft Word document, etc.
is extracted from the file using a library that understands the MS/PDF binary file formats,
so no OCR is necessary.

6.  What are the resolution requirements for your OCR routines?

No OCR is involved with Jackrabbit, but keep in mind that we can use libraries that
understand the MS Word/PDF/etc. formats and can extract the textual content of the files.

7.  Can I change the GUI to a) add functionality or error checking 
and b) to look personalized with CSS?

Jackrabbit does not have a GUI, so you are in total control of it.  Some folks (like me) have
written application components that make it easy to create GUIs to read/write/access Jackrabbit.

8.  Can the search engine search using both requested metadata 
element values and keywords from the document contents?


9.  Can I start with keywords from the document contents and then 
later filter the results using user inputted metadata element values?


10.  Can I start with user input metadata element values and then 
later filter down the results with document contents?


11.  After an initial search, can I refine my search by only looking 
at the results of the previous search?

Don't know.

