incubator-jspwiki-user mailing list archives

From Harry Metske <harry.met...@gmail.com>
Subject Re: searching
Date Mon, 24 Jan 2011 18:39:26 GMT
fixed in 3.0.0-svn-224 and 2.8.5-svn-5



2011/1/18 Rolf Schumacher <mailinglist@august.de>

> Yes, sounds great, Harry.
>
> The function getAttachmentContent(Attachment) is called whenever setupTask
> is executed.
>
> Feeding Lucene as soon as an attachment is ready would be a separate
> feature, and a good idea.
>
> What I meant is to make the text conversion dependent on the MIME type of
> the attachment instead of the filename extensions, however this is not
> really important in the first place.
>
> I would like to go after this immediately, however, due to overload in
> other areas, this will take a while. I will come back asap because
> accumulated knowledge is not only in wiki pages but in attachments as well.
>
> Rolf
>
>
> On 14.01.2011 20:30, Harry Metske wrote:
>
>> Making a filter that processes "non plain text" files like the ones you
>> mentioned sounds good.
>> If I understand it correctly, it should be called when an attachment is
>> added, process the file to create searchable text, and hand that text off
>> to Lucene for indexing, right?
>> Please also consider a unit test for it.
>>
>> adding a few more file-types for pure text files is a good quick-win,
>> starting with .mm .htm .xhtml .java .c .cpp .php .asm .sh .properties .kml
>> .gpx .loc
>>
>> Does anyone else have opinions or suggestions?
>>
>> regards,
>> Harry
>>
>> 2011/1/13 Rolf Schumacher <mailinglist@august.de>
>>
>>
>>
>>> ok, Harry, thank you for the link.
>>>
>>> My suggestions, please correct:
>>>
>>> - hard-coding the file types does not seem like a problem to me: anything
>>> should be searched
>>> - the list is too short; important types such as .doc, .odt, .pdf, .ppt,
>>> .odp are missing
>>> - am I right here?: if I can provide a filter that extracts text from
>>> these files, it should not be that hard to add them
>>> - we may be better off with an attribute on each attachment that records
>>> its MIME type, as far as it is detectable at attachment time; that way we
>>> are less dependent on correct file extensions
>>>
>>> - a quick suggestion: please add .mm as another XML type. The FreeMind
>>> plugin is of great value.
>>>
>>> kind regards
>>>
>>>
>>> Rolf
>>>
>>>
>>>
>>> On 11.01.2011 18:42, Harry Metske wrote:
>>>
>>>
>>>
>>>> Rolf,
>>>>
>>>> see the source
>>>>
>>>>
>>>> https://github.com/apache/jspwiki/blob/jspwiki_2_8_5/src/com/ecyrd/jspwiki/search/LuceneSearchProvider.java#L328
>>>>
>>>>
>>>> As you can see, the file types are currently hardcoded to just 4 types.
>>>> We could make this a configurable option; patches are welcome.
>>>>
>>>> You say "comments given to an Attachment"; I assume you mean the Change
>>>> Notes entered while uploading an attachment (or saving a normal wiki
>>>> page). That is a bit more work, I think.
>>>> I am a complete Lucene novice, but looking at the code, it looks like we
>>>> could add another field for the Change Note (we already index the page
>>>> author and page name).
>>>>
>>>> regards,
>>>> Harry
>>>>
>>>>
>>>> 2011/1/10 Rolf Schumacher <mailinglist@august.de>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> I am using JSPWiki 2.8.4
>>>>>
>>>>> Is it possible to extend search to attachments of certain MIME types,
>>>>> e.g. PDF?
>>>>>
>>>>> Is it possible to extend search to the comments given to an
>>>>> attachment?
>>>>>
>>>>> kind regards
>>>>>
>>>>> Rolf
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
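
Editorial sketch, not part of the original messages: the idea raised in the
thread of making the hardcoded list of searchable file types configurable
could look roughly like the following. The class name SearchableSuffixes and
the comma/whitespace-separated list format are assumptions made for
illustration; this is not the code in LuceneSearchProvider, and where the
configured string would come from (for example a wiki property) is left open.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, not part of JSPWiki: a configurable suffix list. */
final class SearchableSuffixes {
    private final Set<String> suffixes = new LinkedHashSet<>();

    /** Parses a comma- or whitespace-separated list such as ".txt, .xml .mm". */
    SearchableSuffixes(String configured) {
        for (String token : configured.split("[,\\s]+")) {
            if (token.isEmpty()) {
                continue; // produced by a leading separator or an empty string
            }
            String s = token.toLowerCase(Locale.ROOT);
            suffixes.add(s.startsWith(".") ? s : "." + s); // accept "mm" and ".mm"
        }
    }

    /** Case-insensitive check of the file name against the configured suffixes. */
    boolean isSearchable(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String s : suffixes) {
            if (lower.endsWith(s)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A few of the plain-text suffixes proposed in the thread.
        SearchableSuffixes s = new SearchableSuffixes(".txt, .xml, .mm .java");
        System.out.println(s.isSearchable("Map.MM"));     // prints true
        System.out.println(s.isSearchable("report.pdf")); // prints false
    }
}
```

With something like this, adding a new plain-text type becomes a
configuration change instead of a code change; binary formats such as .pdf
would still need a text-extraction step before indexing.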

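Editorial sketch, not part of the original messages: the suggestion in the
thread to rely on a type detected at attachment time, rather than on the file
extension, can be approximated by inspecting the first bytes of the upload.
The class AttachmentSniffer and its Kind enum are hypothetical names. The
signatures used are the well-known ones: PDF files begin with "%PDF-", ZIP
packages (which is what .odt and .odp files are) begin with "PK" 0x03 0x04,
and legacy .doc and .ppt files are OLE2 compound files beginning with
0xD0 0xCF 0x11 0xE0. This only identifies the container; telling an .odt from
any other ZIP file would need a further look inside the archive.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of JSPWiki: classify an upload by its first bytes. */
final class AttachmentSniffer {

    enum Kind { PDF, ZIP_CONTAINER, OLE2_CONTAINER, UNKNOWN }

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 3, 4 };
    private static final byte[] OLE2_MAGIC = { (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0 };

    /** Returns the container kind suggested by the leading bytes of the file. */
    static Kind sniff(byte[] head) {
        if (startsWith(head, PDF_MAGIC)) {
            return Kind.PDF;
        }
        if (startsWith(head, ZIP_MAGIC)) {
            return Kind.ZIP_CONTAINER;   // e.g. .odt, .odp, but also any other ZIP
        }
        if (startsWith(head, OLE2_MAGIC)) {
            return Kind.OLE2_CONTAINER;  // e.g. legacy .doc, .ppt
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdfHead = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] textHead = "plain text".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(pdfHead));  // prints PDF
        System.out.println(sniff(textHead)); // prints UNKNOWN
    }
}
```

The detected kind could then be stored alongside the attachment and used to
pick a text-extraction filter, so that a misnamed file is still handled
according to its content.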