harmony-dev mailing list archives

From Oliver Deakin <oliver.dea...@googlemail.com>
Subject Re: [classlib][luni] guessing content mime types
Date Mon, 03 Sep 2007 09:49:00 GMT
Hi Tim,

There is FindMimeFromData() [1], declared in urlmon.h, which may be useful;
from [2] it appears that this is the system function IE uses to
determine MIME types.

Regards,
Oliver

[1] http://msdn2.microsoft.com/en-us/library/ms775107.aspx
[2] http://msdn2.microsoft.com/en-us/library/ms775147.aspx
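
For comparison with a platform call like FindMimeFromData(), the portable approach Tim describes below (sniffing the leading bytes, as URLConnection#guessContentTypeFromStream(InputStream) and a Linux 'magic' file do) can be sketched in plain Java. This is a minimal, hypothetical sketch, not Harmony's code: the MagicSniffer class, the 64-byte look-ahead and the handful of signatures are illustrative assumptions, covering only a tiny subset of what a real magic file knows about.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical magic-number sniffer; illustrative only, not Harmony's implementation. */
public class MagicSniffer {

    // How many leading bytes to inspect; an arbitrary limit chosen for this sketch.
    private static final int PEEK = 64;

    /** Guesses a MIME type from the leading bytes, or returns null if no rule matches. */
    public static String guess(InputStream in) throws IOException {
        // Without mark/reset we cannot look ahead without consuming the caller's data.
        if (!in.markSupported()) {
            return null;
        }
        byte[] buf = new byte[PEEK];
        in.mark(PEEK);
        int n = 0;
        try {
            int r;
            while (n < PEEK && (r = in.read(buf, n, PEEK - n)) != -1) {
                n += r;
            }
        } finally {
            in.reset(); // leave the stream where the caller handed it to us
        }
        // ISO-8859-1 maps each byte to exactly one char, so prefixes compare byte-for-byte.
        String head = new String(buf, 0, n, StandardCharsets.ISO_8859_1);
        if (head.startsWith("%PDF-")) return "application/pdf";
        if (head.startsWith("GIF87a") || head.startsWith("GIF89a")) return "image/gif";
        if (head.startsWith("\u0089PNG\r\n\u001a\n")) return "image/png";
        if (head.startsWith("PK\u0003\u0004")) return "application/zip";
        // Text formats: ignore surrounding whitespace and compare case-insensitively.
        String text = head.trim().toLowerCase(Locale.ROOT);
        if (text.startsWith("<!doctype html") || text.startsWith("<html")) return "text/html";
        if (text.startsWith("<?xml")) return "application/xml";
        return null;
    }

    public static void main(String[] args) throws IOException {
        InputStream pdf = new ByteArrayInputStream("%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(guess(pdf));        // application/pdf
        System.out.println((char) pdf.read()); // % (the look-ahead did not consume input)
        InputStream html = new ByteArrayInputStream("  <HTML><body>hi".getBytes(StandardCharsets.US_ASCII));
        System.out.println(guess(html));       // text/html
    }
}
```

Requiring mark/reset support, rather than silently wrapping the stream, keeps the caller's stream position intact so the same stream can be handed to a parser afterwards.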

Tim Ellison wrote:
> On a related note, we do a rubbish job of guessing the content type from
> the content of files themselves via
> URLConnection#guessContentTypeFromStream(InputStream).  I've added a bit
> more logic in there for the most obvious cases, but when you consider
> the info in your typical Linux 'magic' file we have a long way to go.
> My first thought was whether we could ask the platform to guess for us,
> but I don't think there is any equivalent on Windows etc?
>
> Regards,
> Tim
>
> Alexey Petrenko wrote:
>   
>> It looks like both application/rtf and text/rtf are correct from the IANA [1]
>> point of view.
>> So I do not see any harm in following the RI's behavior in this case.
>>
>> By the way, the application/rtf specification looks more recent than the text/rtf one.
>>
>> SY, Alexey
>>
>> 1. http://www.iana.org/assignments/media-types/
>>
>> 2007/8/31, Tim Ellison <t.p.ellison@gmail.com>:
>>     
>>> The MIME types for a given extension are defined in [1], which we took
>>> from httpd's view of the world.  So while it would be trivial to change
>>> them to be the same as the RI, I'm inclined to:
>>>  - leave rtf as text/rtf
>>>  - add java to our list as text/plain
>>>  - leave doc as application/msword
>>> then figure out how to snoop the stream for other types.
>>>
>>> [1]
>>> http://svn.apache.org/viewvc/harmony/enhanced/classlib/trunk/depends/files/content-types.properties?revision=494047&view=markup
>>>
>>> Thoughts?
>>> Tim
>>>
>>>
>>> Vasily Zakharov (JIRA) wrote:
>>>       
>>>> [classlib][luni] URLConnection.getContentType() works with files incorrectly
>>>> ----------------------------------------------------------------------------
>>>>
>>>>                  Key: HARMONY-4699
>>>>                  URL: https://issues.apache.org/jira/browse/HARMONY-4699
>>>>              Project: Harmony
>>>>           Issue Type: Bug
>>>>           Components: Classlib
>>>>             Reporter: Vasily Zakharov
>>>>
>>>>
>>>> In the Harmony implementation, java.net.URLConnection.getContentType() works
>>>> incorrectly when addressing a file URL:
>>>>
>>>> 1. For files with the .rtf extension, the RI returns "application/rtf", while Harmony
>>>> returns "text/rtf".
>>>>
>>>> 2. For files with the .java extension, the RI returns "text/plain", while Harmony
>>>> returns "content/unknown".
>>>>
>>>> 3. For files with the .doc extension, the RI returns "content/unknown", while Harmony
>>>> returns "application/msword". The same is true for other known extensions.
>>>>
>>>> 4. For files with an unrecognized extension and HTML content, the RI returns
>>>> "text/html", while Harmony returns "content/unknown".
>>>>
>>>> Items 1 and 2 look like minor issues that would be better fixed for compatibility
>>>> with the RI.
>>>>
>>>> Item 3 looks like a non-bug difference, as Harmony behaves clearly better
>>>> than the RI in these cases.
>>>>
>>>> Item 4 looks like a serious bug, as the RI clearly looks into the file content to
>>>> determine the type, and Harmony does not. It looks like org.apache.harmony.luni.internal.net.www.protocol.file.FileURLConnection.getContentType()
>>>> needs to be fixed to use guessContentTypeFromStream() in addition to guessContentTypeFromName().
>>>>
>>>> The attached archive contains the reproducer with some test files it uses.
>>>> Here's the reproducer code:
>>>>
>>>> public class Test {
>>>>     static void printContentType(String fileName) throws java.io.IOException {
>>>>         System.out.println(fileName + ": " + new java.net.URL("file:" + fileName).openConnection().getContentType());
>>>>     }
>>>>     public static void main(String argv[]) {
>>>>         try {
>>>>             printContentType("test.rtf");
>>>>             printContentType("Test.java");
>>>>             printContentType("test.doc");
>>>>             printContentType("test.htx");
>>>>         } catch (Exception e) {
>>>>             e.printStackTrace(System.out);
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> Output on RI:
>>>>
>>>> test.rtf: application/rtf
>>>> Test.java: text/plain
>>>> test.doc: content/unknown
>>>> test.htx: text/html
>>>>
>>>> Output on Harmony:
>>>>
>>>> test.rtf: text/rtf
>>>> Test.java: content/unknown
>>>> test.doc: application/msword
>>>> test.htx: content/unknown
>>>>
>>>> This issue is a blocker for HARMONY-4696, as on the RI JEditorPane.getContentType()
>>>> should be based on URLConnection.getContentType(), which currently works incorrectly.
>>>>
>>>>
>>>>         
>
>   
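
Tim's proposal in the thread (rtf as text/rtf, java as text/plain, doc as application/msword) amounts to editing an extension-to-type table. A minimal sketch of that kind of lookup follows, assuming a simple extension=type properties layout (the actual format of content-types.properties at the linked revision may differ) and a hypothetical ExtensionTypes class. The "content/unknown" fallback matches the value both implementations print in the bug report.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

/** Hypothetical extension-to-MIME-type table; not Harmony's actual loader. */
public class ExtensionTypes {

    // Hypothetical "extension=type" entries following Tim's proposal.
    static final String TABLE =
            "rtf=text/rtf\n"
            + "java=text/plain\n"
            + "doc=application/msword\n";

    private final Properties types = new Properties();

    public ExtensionTypes(String table) {
        try {
            types.load(new StringReader(table));
        } catch (IOException e) {
            // A StringReader does not fail in practice; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the MIME type for the file name's extension, or "content/unknown". */
    public String typeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // Ignore dots that belong to a directory name rather than to the file name.
        int slash = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        if (dot <= slash || dot == fileName.length() - 1) {
            return "content/unknown";
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return types.getProperty(ext, "content/unknown");
    }

    public static void main(String[] args) {
        ExtensionTypes map = new ExtensionTypes(TABLE);
        for (String name : new String[] {"test.rtf", "Test.java", "test.doc", "test.htx"}) {
            System.out.println(name + ": " + map.typeFor(name));
        }
    }
}
```

With these entries, main prints "test.rtf: text/rtf", "Test.java: text/plain", "test.doc: application/msword" and "test.htx: content/unknown". The last line is exactly the case item 4 of HARMONY-4699 is about: no extension rule applies, so only a look at the file's content could yield text/html.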

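Item 4 of HARMONY-4699 suggests consulting guessContentTypeFromStream() in addition to guessContentTypeFromName(). A minimal sketch of that lookup order, using the public static methods on java.net.URLConnection, is shown below. The FileTypeGuesser class and contentTypeOf method are hypothetical, this is not the actual FileURLConnection code, and trying the name first and the content second is an assumption: the thread does not say which order the RI uses.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical helper combining name-based and content-based guessing. */
public class FileTypeGuesser {

    /** Guesses from the file name, then from the content, else "content/unknown". */
    public static String contentTypeOf(File file) throws IOException {
        String type = URLConnection.guessContentTypeFromName(file.getName());
        if (type != null) {
            return type;
        }
        // guessContentTypeFromStream needs mark/reset, which BufferedInputStream provides.
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            type = URLConnection.guessContentTypeFromStream(in);
        }
        return type != null ? type : "content/unknown";
    }

    public static void main(String[] args) throws IOException {
        // An unrecognized extension with HTML content, like test.htx in the reproducer.
        File f = File.createTempFile("test", ".htx");
        f.deleteOnExit();
        Files.write(f.toPath(), "<html><body>hello</body></html>".getBytes(StandardCharsets.US_ASCII));
        System.out.println(f.getName() + ": " + contentTypeOf(f));
    }
}
```

The name-first order has a trade-off: a file named test.rtf that actually contains HTML would still be reported with an RTF type. Content-first sniffing avoids that but reads every file, and its quality is bounded by how many signatures the stream guesser knows, which is Tim's point about the 'magic' file.
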
-- 
Oliver Deakin

