couchdb-user mailing list archives

From Christopher Utley <chris.ut...@citationpoint.com>
Subject Re: Tika and CouchDB Lucene
Date Wed, 23 Jun 2010 02:04:07 GMT
Thank you for the response.  I'll provide some additional information
that might help to illuminate where I've gone wrong.

Just to be clear: after 'mvn' completes successfully and I add the
3 lines to my CouchDB local.ini, I run:

1) cd target
2) tar xvf couchdb-lucene-0.6-SNAPSHOT-dist.tar.gz
3) cd couchdb-lucene-0.6-SNAPSHOT
4) cd bin
5) sudo ./run &

Then I test. My fulltext queries run fine, but the contents of
attached PDFs do not seem to be indexed.  Is there a way to tell if
Tika is being run at all?  I don't see any errors, so I'm not sure
where to begin to look.

My query string looks like this:

http://server/database/_fti/_design/lucene/all?q=%22It+is+generally+recognized+that+ablation+of+VT+associated+with+structural+heart+disease%22
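For anyone reproducing this, the query URL above can be assembled programmatically. The sketch below is only an illustration and not part of couchdb-lucene: `buildQueryUrl` is a hypothetical helper, and the optional field prefix relies on standard Lucene `field:"phrase"` query syntax. Whether the attachment text ends up in a field named `attachment` is an assumption based on the first argument passed to `ret.attachment` in the design doc in this message; it is not confirmed anywhere in this thread.

```javascript
// Hypothetical helper: builds a couchdb-lucene phrase query URL.
// `base` is the _fti endpoint, `phrase` is searched as an exact phrase,
// and `field` (optional) restricts the search to one Lucene field.
function buildQueryUrl(base, phrase, field) {
  var query = '"' + phrase + '"';          // quote the phrase for an exact match
  if (field) {
    query = field + ':' + query;           // Lucene field:"phrase" syntax
  }
  return base + '?q=' + encodeURIComponent(query);
}

var base = 'http://server/database/_fti/_design/lucene/all';

// Search the default field, as in the original query.
var defaultUrl = buildQueryUrl(base, 'ablation of VT');

// Search a field named "attachment" (assumed field name, see above).
var attachmentUrl = buildQueryUrl(base, 'ablation of VT', 'attachment');

console.log(defaultUrl);
console.log(attachmentUrl);
```

Comparing the results of the two URLs might help show whether the PDF text was indexed at all or merely indexed under a different field. Note that `encodeURIComponent` encodes spaces as `%20` rather than `+`; both forms are accepted in a query string.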


My attachment shows up like this:

_attachments
example.pdf
0.6 MB, application/pdf


My design doc looks like this:

{
   "_id": "_design/lucene",
   "_rev": "8-adbd1b56b459d9ec391ceb4cacc5f61f",
   "fulltext": {
       "all": {
           "defaults": {
               "store": "no"
           },
           "index": "function(doc) {var ret = new Document();function idx(obj) {for (var key in obj) {switch (typeof obj[key]) {case 'object':idx(obj[key]);break;case 'function':break;default:ret.add(obj[key]);break;}}};idx(doc);if (doc._attachments) {for (var i in doc._attachments) {ret.attachment(\"attachment\", i);}}return ret;}"
       }
   }
}
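
Since the index function is hard to read as a single JSON string, here is the same logic laid out with indentation, so it can be run outside CouchDB to see what it would hand to the indexer. The `Document` constructor below is a hypothetical stand-in that only records calls; the real class is supplied by couchdb-lucene and is what actually fetches and parses attachments. The sample document is also made up for illustration.

```javascript
// Hypothetical stub of couchdb-lucene's Document: it only records calls.
function Document() {
  this.fields = [];
  this.attachments = [];
}
Document.prototype.add = function (value) {
  this.fields.push(value);
};
Document.prototype.attachment = function (field, name) {
  this.attachments.push({ field: field, name: name });
};

// Same logic as the "index" string in the design doc, reformatted.
function index(doc) {
  var ret = new Document();

  // Walk the document recursively and add every primitive value.
  function idx(obj) {
    for (var key in obj) {
      switch (typeof obj[key]) {
        case 'object':
          idx(obj[key]);
          break;
        case 'function':
          break;
        default:
          ret.add(obj[key]);
          break;
      }
    }
  }
  idx(doc);

  // Request indexing of each attachment by name.
  if (doc._attachments) {
    for (var i in doc._attachments) {
      ret.attachment("attachment", i);
    }
  }
  return ret;
}

// Made-up sample document resembling the one described in this message.
var sample = {
  _id: "doc1",
  title: "VT ablation",
  _attachments: {
    "example.pdf": { content_type: "application/pdf", length: 600000 }
  }
};

var result = index(sample);
console.log(result.fields);
console.log(result.attachments);
```

Run under the stub, the function requests one attachment, `example.pdf`, with `"attachment"` as the first argument. The recursive walk also adds the attachment metadata values (content type and length) as ordinary fields, because `_attachments` is itself an object.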




On Tue, Jun 22, 2010 at 6:43 PM, Robert Newson <robert.newson@gmail.com> wrote:
>
> Tika is fully integrated into couchdb-lucene. You've likely omitted
> one or more steps in the README, but you should have built a zip file
> with 'mvn', unzipped it, and run couchdb-lucene from there. The
> startup scripts to put all of Tika on the classpath are included.
>
> B.
>
> On Tue, Jun 22, 2010 at 11:26 PM, Christopher Utley
> <chris.utley@citationpoint.com> wrote:
> > Greetings.  I was wondering if someone on the list might have experience
> > with CouchDB-Lucene, and more specifically Tika.
> >
> > My environment is as follows:
> >
> > Ubuntu 9.10
> > CouchDB 0.11.0
> > couchdb-lucene
> > Tika 0.7
> >
> > I have CouchDB-Lucene working fine.  Now I want to index (search) PDF
> > attachment contents.  Apparently the tool to do this (Tika) is not part of
> > the CouchDB Lucene package, so I had to build that separately.  Now I have
> > this jar file in the target directory where I built Tika.  I have no idea
> > how to tell CouchDB Lucene where Tika is installed, or how to get it to use
> > Tika now that it's installed.
> >
> > Would setting the CLASSPATH in /etc/environment be part of the puzzle?
> >
> > Any ideas, suggestions, guesses, wild !#$ guesses, etc - would be greatly
> > appreciated.
> >
> > Regards,
> > Chris
> >
