incubator-jspwiki-user mailing list archives

From Rolf Schumacher <mailingl...@august.de>
Subject Re: searching
Date Tue, 18 Jan 2011 20:58:14 GMT
Yes, sounds great, Harry.

The function getAttachmentContent(Attachment) is called whenever 
setupTask is executed.

Feeding Lucene as soon as an attachment has been uploaded would be a 
separate feature, and a good idea.

What I meant was to make the text conversion depend on the attachment's 
MIME type instead of the filename extension. However, this is not really 
important for now.
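
As a rough sketch of the MIME-type idea: the stored type decides whether an 
attachment is indexed as plain text, and the filename suffix is only a 
fallback. Everything here is hypothetical (the class name, the method names, 
the suffix table); none of it is existing JSPWiki API, and it assumes a MIME 
type was recorded per attachment at upload time.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of JSPWiki. */
final class AttachmentTextPolicy {

    // Assumed fallback table from filename suffix to MIME type (illustrative only).
    private static final Map<String, String> SUFFIX_TO_MIME = Map.of(
        ".txt", "text/plain",
        ".xml", "text/xml",
        ".html", "text/html",
        ".mm", "text/xml");

    // Non-"text/" types that are still readable as plain text (assumed list).
    private static final Set<String> TEXT_LIKE =
        Set.of("application/xml", "application/xhtml+xml");

    /** Prefers the MIME type stored with the attachment, else guesses from the suffix. */
    static String effectiveMimeType(String storedMimeType, String fileName) {
        if (storedMimeType != null && !storedMimeType.isBlank()) {
            // Strip parameters such as "; charset=UTF-8".
            int semi = storedMimeType.indexOf(';');
            String base = semi >= 0 ? storedMimeType.substring(0, semi) : storedMimeType;
            return base.trim().toLowerCase(Locale.ROOT);
        }
        String lower = fileName.toLowerCase(Locale.ROOT);
        int dot = lower.lastIndexOf('.');
        if (dot < 0) {
            return "application/octet-stream";
        }
        return SUFFIX_TO_MIME.getOrDefault(lower.substring(dot), "application/octet-stream");
    }

    /** True if the attachment content can be handed to the indexer as plain text. */
    static boolean isPlainTextIndexable(String storedMimeType, String fileName) {
        String mime = effectiveMimeType(storedMimeType, fileName);
        return mime.startsWith("text/") || TEXT_LIKE.contains(mime);
    }
}
```

With this, a wrongly named file is still handled according to its stored 
type, and binary types such as application/pdf can later be routed to a 
dedicated text filter instead of being skipped.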

I would like to work on this immediately, but because of my workload in 
other areas it will take a while. I will come back to it as soon as 
possible, because accumulated knowledge lives not only in wiki pages but 
in attachments as well.

Rolf

On 14.01.2011 20:30, Harry Metske wrote:
> Making a filter that processes "non plain text" files like the ones you
> mentioned sounds good.
> If I understand correctly, it should be called when an attachment is added,
> process the file to create searchable text, and hand that text off to
> Lucene for indexing, right?
> Please also consider a unit test for it.
>
> Adding a few more file types for pure text files is a good quick win,
> starting with .mm .htm .xhtml .java .c .cpp .php .asm .sh .properties .kml
> .gpx .loc
>
> Does anyone else have opinions or suggestions?
>
> regards,
> Harry
>
> 2011/1/13 Rolf Schumacher<mailinglist@august.de>
>
>    
>> ok, Harry, thank you for the link.
>>
>> My suggestions, please correct:
>>
>> - hard-coding the file types does not seem to be a problem to me:
>> everything should be searched
>> - the list is too short; important types such as .doc, .odt, .pdf, .ppt and
>> .odp are missing
>> - am I right here? If I can provide a filter that extracts text from these
>> files, it should not be too hard to add them
>> - we may be better off if each attachment has an attribute holding its
>> MIME type, as far as it is detectable at attachment time; that way we are
>> less dependent on correct file extensions
>>
>> - a quick suggestion: please add .mm as another xml type. The freemind
>> plugin is of great value.
>>
>> kind regards
>>
>>
>> Rolf
>>
>>
>>
>> On 11.01.2011 18:42, Harry Metske wrote:
>>
>>      
>>> Rolf,
>>>
>>> see the source
>>>
>>> https://github.com/apache/jspwiki/blob/jspwiki_2_8_5/src/com/ecyrd/jspwiki/search/LuceneSearchProvider.java#L328
>>>
>>>
>>> as you can see, currently the filetypes are hardcoded to just 4 types.
>>> We could make this a configurable option, patches are welcome.
>>>
>>> You say "comments given to an Attachment"; I assume you mean Change Notes
>>> entered while uploading an attachment (or saving a normal Wiki Page).
>>> That is a bit more work, I think.
>>> I am a complete Lucene novice, but looking at the code, it looks like we
>>> could add another field (we already index the page author and page name)
>>> for the Change Note.
>>>
>>> regards,
>>> Harry
>>>
>>>
>>> 2011/1/10 Rolf Schumacher<mailinglist@august.de>
>>>
>>>
>>>
>>>        
>>>> I am using JSPWiki 2.8.4
>>>>
>>>> Is it possible to extend a search to attachments of some MIME types, e.g.
>>>> pdf?
>>>>
>>>> Is it possible to extend a search to the comments given to an attachment?
>>>>
>>>> kind regards
>>>>
>>>> Rolf
