couchdb-user mailing list archives

From Robert Newson <robert.new...@gmail.com>
Subject Re: [HALF-SOLVED] Re: how to handle attachments with couchdb-lucene?
Date Sat, 05 Sep 2009 12:10:53 GMT
couchdb-lucene uses the content-type stored in couchdb when parsing
attachments. couchdb-lucene then uses Apache Tika to parse the
attachments, and it is there that support for new MIME types should be
requested.

A list of currently supported MIME types is available at:

http://github.com/rnewson/couchdb-lucene
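To see which content type CouchDB has stored for each attachment, one option is to inspect the `_attachments` stubs of the document. The following is only a rough sketch: the helper names `attachmentContentTypes` and `unsupportedAttachments` are made up for illustration, and the `supported` list is something you would fill in yourself from the project page above (the values in the usage note are placeholders, not the real support list).

```javascript
// Sketch only: these helpers are hypothetical and not part of couchdb-lucene.
// They rely on the fact that each entry in doc._attachments carries a
// content_type field, as stored by CouchDB.

// Map each attachment name to the content type CouchDB stored for it.
function attachmentContentTypes(doc) {
  var result = {};
  var attachments = doc._attachments || {};
  for (var name in attachments) {
    if (Object.prototype.hasOwnProperty.call(attachments, name)) {
      result[name] = attachments[name].content_type;
    }
  }
  return result;
}

// Return the names of attachments whose stored content type is not in the
// caller-supplied `supported` array (an assumption: fill it in yourself).
function unsupportedAttachments(doc, supported) {
  var types = attachmentContentTypes(doc);
  return Object.keys(types).filter(function (name) {
    return supported.indexOf(types[name]) === -1;
  });
}
```

For example, calling `unsupportedAttachments(doc, ["application/pdf"])` on a document with a PDF and an attachment stored as "application/octet-stream" would return only the name of the second attachment, which is a hint that it should be re-uploaded with a more specific content type.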

B.

On Sat, Sep 5, 2009 at 11:51 AM, Thomas Harding<tom@thomas-harding.name> wrote:
>
> You got it!
> I tried uploading a PDF file, and it works...
>
> However, does anyone have a way to handle ASCII or UTF-8 files which are
> guessed as "application/octet-stream" (sic!)?
>
> More generally, how can I force lucene to handle a particular
> mime-type?
> My first tries were with documents whose "couchdb mime-type" was
> "text/x-patch",
> whose usefulness you can obviously guess :p
>
> Robert Newson wrote:
>>
>> Hi,
>>
>> The index function looks correct, so I would suggest you check what
>> content type couchdb thinks your attachment is. If it's not in the
>> list of supported content types, that explains the lack of matches.
>>
>> B.
>>
>> On Sat, Sep 5, 2009 at 3:03 AM, Paul Joseph
>> Davis<paul.joseph.davis@gmail.com> wrote:
>>
>>>
>>> This is reaching a bit, but have you tried using 'attachment:diff' in the
>>> query? I seem to remember something about a minimum length for wildcard
>>> searching.
>>>
>>>
>>>
>>> On Sep 4, 2009, at 9:45 PM, Thomas Harding <tom@thomas-harding.name>
>>> wrote:
>>>
>>>
>>>>
>>>> Hello,
>>>> I'm trying to index, then retrieve, attachments with couchdb-lucene.
>>>> I guess the problem comes from the query, but you can also find
>>>> the indexing code below.
>>>>
>>>> I'm trying a query to retrieve the content of a "diff" attachment which
>>>> contains "diff".
>>>>
>>>> #####################
>>>> the query (among other tries)
>>>> #####################
>>>> $ curl 'http://127.0.0.1:5984/ajatus_devel_db_content/_fti/lucene/by_attachments?q=attachment:d*'
>>>>
>>>> #####################
>>>> the response
>>>> #####################
>>>> {"q":"attachment:d*","etag":"12387ad7f7b",
>>>> "view_sig":"7ceed7519f0b61c517bd9ffee373414b",
>>>> "skip":0,"limit":25,"total_rows":0,"search_duration":0,"fetch_duration":0,"rows":[]}
>>>>
>>>> #################
>>>> the "_design/lucene" code:
>>>> #################
>>>> {
>>>>   "_id": "_design/lucene",
>>>>   "fulltext": {
>>>>     ............
>>>>     "by_attachments": {
>>>>       "defaults": {
>>>>         "store": "no"
>>>>       },
>>>>       "index": "function(doc) { var ret = new Document(); if (doc._attachments) { for (var i in doc._attachments) { ret.attachment('attachment', i); } } return ret; }"
>>>>     }
>>>>   }
>>>> }
>>>>
>>>>
>>>>
>
>
