couchdb-user mailing list archives

From Robert Newson <robert.new...@gmail.com>
Subject Re: Tika and CouchDB Lucene
Date Wed, 23 Jun 2010 09:14:03 GMT
Try:

 http://server/database/_fti/_design/lucene/all?q=attachment:%22It+is+generally+recognized+that+ablation+of+VT+associated+with+structural+heart+disease%22

Your index function indexes the attachments into a field called
'attachment', so you need to select that field explicitly when
querying. A query with no field prefix does not search it.
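
If you build these URLs in code, something like the following might
help. It is only a rough sketch: buildFieldQuery and its parameters are
names made up for illustration, not part of couchdb-lucene, and it
assumes the phrase itself contains no double quotes.

```javascript
// Hypothetical helper: build a couchdb-lucene query URL that targets one field.
// baseUrl is the full path to the fulltext index, e.g.
// http://server/database/_fti/_design/lucene/all
function buildFieldQuery(baseUrl, field, phrase) {
  // Prefix the field name and wrap the phrase in double quotes so that
  // Lucene treats it as a phrase query against that field.
  var q = field + ':"' + phrase + '"';
  // encodeURIComponent escapes the colon, quotes and spaces
  // (as %3A, %22 and %20) so the query survives as a URL parameter.
  return baseUrl + '?q=' + encodeURIComponent(q);
}

var url = buildFieldQuery(
  'http://server/database/_fti/_design/lucene/all',
  'attachment',
  'structural heart disease'
);
console.log(url);
```

The field name passed in has to match the first argument given to
ret.attachment() in the index function.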

B.

On Wed, Jun 23, 2010 at 3:04 AM, Christopher Utley
<chris.utley@citationpoint.com> wrote:
> Thank you for the response.  I'll provide some additional information
> that might help to illuminate where I've gone wrong.
>
> Just to be clear...  After 'mvn' successfully completes and I add the
> 3 lines to my CouchDB local.ini - then I...
>
> 1) cd target
> 2) tar xvf couchdb-lucene-0.6-SNAPSHOT-dist.tar.gz
> 3) cd couchdb-lucene-0.6-SNAPSHOT
> 4) cd bin
> 5) sudo ./run &
>
> Then I test ... My fulltext queries run great except for indexing
> attached PDFs.  Is there a way to tell if Tika is being run at all?  I
> don't see any errors, so I'm not sure where to begin to look.
>
> My query string looks like this:
>
> http://server/database/_fti/_design/lucene/all?q=%22It+is+generally+recognized+that+ablation+of+VT+associated+with+structural+heart+disease%22
>
>
> My attachment shows up like this:
>
> _attachments
> example.pdf
> 0.6 MB, application/pdf
>
>
> My design doc looks like this:
>
> {
>   "_id": "_design/lucene",
>   "_rev": "8-adbd1b56b459d9ec391ceb4cacc5f61f",
>   "fulltext": {
>       "all": {
>           "defaults": {
>               "store": "no"
>           },
>           "index": "function(doc) {var ret = new Document();function idx(obj) {for (var key in obj) {switch (typeof obj[key]) {case 'object':idx(obj[key]);break;case 'function':break;default:ret.add(obj[key]);break;}}};idx(doc);if (doc._attachments) {for (var i in doc._attachments) {ret.attachment(\"attachment\", i);}}return ret;}"
>       }
>   }
> }
>
>
>
>
> On Tue, Jun 22, 2010 at 6:43 PM, Robert Newson <robert.newson@gmail.com> wrote:
>>
>> Tika is fully integrated into couchdb-lucene. You've likely omitted
>> one or more steps in the README, but you should have built a zip file
>> with 'mvn', unzipped it, and run couchdb-lucene from there. The
>> startup scripts to put all of Tika on the classpath are included.
>>
>> B.
>>
>> On Tue, Jun 22, 2010 at 11:26 PM, Christopher Utley
>> <chris.utley@citationpoint.com> wrote:
>> > Greetings.  I was wondering if someone on the list might have experience
>> > with CouchDB-Lucene, and more specifically Tika.
>> >
>> > My environment is as follows:
>> >
>> > Ubuntu 9.10
>> > CouchDB 0.11.0
>> > couchdb-lucene
>> > Tika 0.7
>> >
>> > I have CouchDB-Lucene working fine.  Now I want to index (search) PDF
>> > attachment contents.  Apparently the tool to do this (Tika) is not part of
>> > the CouchDB Lucene package, so I had to build that separately.  Now I have
>> > this jar file in the target directory where I built Tika.  I have no idea
>> > how to tell CouchDB Lucene where Tika is installed, or how to get it to use
>> > Tika now that it's installed.
>> >
>> > Would setting the CLASSPATH in /etc/environment be part of the puzzle?
>> >
>> > Any ideas, suggestions, guesses, wild !#$ guesses, etc - would be greatly
>> > appreciated.
>> >
>> > Regards,
>> > Chris
>> >
>
