commons-user mailing list archives

From Joakim Knudsen <joakim.gr...@gmail.com>
Subject Re: Creating EXIF tags (TiffOutputField) the right way
Date Tue, 31 May 2016 17:21:02 GMT
Btw, ENCODING_UTF16 is just a String = "UTF-16LE" (Little Endian)
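
For anyone who wants to try the byte layout without Sanselan on the classpath, here is a minimal, JDK-only sketch of the same idea: the 8-byte "UNICODE\0" marker followed by the text in UTF-16, with the endianness matched to the file. The class name UserCommentBytes, the encode helper, and the use of java.nio.ByteOrder are my own illustrative choices, not part of the Sanselan API. StandardCharsets needs Java 7 or later.

```java
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Sanselan: builds the raw UserComment payload.
final class UserCommentBytes {
    // EXIF character-code header: the ASCII letters "UNICODE" plus a NUL byte.
    private static final byte[] UNICODE_MARKER =
            { 0x55, 0x4E, 0x49, 0x43, 0x4F, 0x44, 0x45, 0x00 };

    // Assumption: the caller knows the byte order of the target file.
    static byte[] encode(String text, ByteOrder fileByteOrder) {
        // UTF_16LE / UTF_16BE encode without a byte-order mark.
        Charset charset = fileByteOrder == ByteOrder.BIG_ENDIAN
                ? StandardCharsets.UTF_16BE
                : StandardCharsets.UTF_16LE;
        byte[] comment = text.getBytes(charset);

        byte[] payload = new byte[UNICODE_MARKER.length + comment.length];
        System.arraycopy(UNICODE_MARKER, 0, payload, 0, UNICODE_MARKER.length);
        System.arraycopy(comment, 0, payload, UNICODE_MARKER.length, comment.length);
        return payload;
    }

    public static void main(String[] args) {
        // "\u00E6\u00F8\u00E5" is the text used in this thread, written as escapes
        // so the example does not depend on the source file encoding.
        byte[] payload = encode("\u00E6\u00F8\u00E5", ByteOrder.LITTLE_ENDIAN);
        System.out.println("payload length: " + payload.length);
    }
}
```

The resulting array is what would be passed as the bytes (and its length as the count) to the TiffOutputField constructor shown in the quoted message below.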

On 31 May 2016 at 19:20, Joakim Knudsen <joakim.grahl@gmail.com> wrote:

> Following a post on the User-Commons-Apache log (from 2012), I ended up
> with the following code which seems to work.
> It writes proper Unicode, which I can read back successfully using
> ExifTool. I also see the comment nicely in Windows Explorer, and under
> File > Properties.
> Note I changed the field type from ASCII to FIELD_TYPE_UNDEFINED,
> otherwise (with ASCII) it did not work. At least Windows couldn't make
> sense of the EXIF data.
>
> // http://osdir.com/ml/user-commons-apache/2012-03/msg00046.html
> byte[] unicodeMarker = new byte[]{ 0x55, 0x4E, 0x49, 0x43, 0x4F, 0x44,
>         0x45, 0x00 };
> byte[] comment = textToSet.getBytes(ENCODING_UTF16); // OR UTF-16BE if the file is big-endian!
> byte[] bytesComment = new byte[unicodeMarker.length + comment.length];
> System.arraycopy(unicodeMarker, 0, bytesComment, 0, unicodeMarker.length);
> System.arraycopy(comment, 0, bytesComment, unicodeMarker.length, comment.length);
>
> TiffOutputField exif_comment = new TiffOutputField(TiffConstants.EXIF_TAG_USER_COMMENT,
>         TiffFieldTypeConstants.FIELD_TYPE_UNDEFINED, bytesComment.length, bytesComment);
>
>
> I can now write UserComment: "æøå" without problems :)
>
>
>
> - Joakim
>
>
> On 31 May 2016 at 17:39, Benedikt Ritter <britter@apache.org> wrote:
>
>> Hello Joakim,
>>
>> Joakim Knudsen <joakim.grahl@gmail.com> wrote on Sat, 28 May 2016 at
>> 21:10:
>>
>> > Hi Benedikt, and thanks for replying!
>> >
>> > So, if FieldType is unused, maybe the alternative, simpler constructor is
>> > more appropriate/correct to use?
>> >
>> > // try using the approach given in the example (modified from the GPS tag):
>> > TiffOutputField exif_comment = TiffOutputField.create(
>> >         TiffConstants.EXIF_TAG_USER_COMMENT,
>> >         outputSet.byteOrder, textToSet);
>> >
>> > However, now Sanselan throws an ImageWriteException:
>> > org.apache.sanselan.ImageWriteException: Tag has unexpected data type.
>> >
>> > So are you 100% sure field type should not be set (to ASCII)?
>> >
>>
>> No, I'm just saying that it uses a hard coded encoding anyway :-)
>>
>>
>> >
>> > Next, you're saying the string to set (textToSet) is converted internally
>> > to a byte array, using US-ASCII encoding.
>> > If I try writing "æøåæøå" to a file, I get "쎦쎸쎥쎦쎸쎥" when I copy the
>> > JPEG out and check Properties in Windows Explorer.
>> > If I write only ASCII characters, e.g. "Test", then that comes through just
>> > fine.
>> >
>> > In summary, here is the code that works for me (except non-ASCII
>> > characters):
>> >
>> >
>> > // http://mail-archives.apache.org/mod_mbox/commons-user/201203.mbox/%3CCAJm2B-mYCXYKuyu=Hs8UAZCpw-B=KWuZ4gsZFUOBvZWUN0LUfA@mail.gmail.com%3E
>> > byte b[] = ExifTagConstants.EXIF_TAG_USER_COMMENT.encodeValue(
>> >         TiffFieldTypeConstants.FIELD_TYPE_ASCII,
>> >         textToSet, outputSet.byteOrder);
>> >
>> > // constructor arguments: tag, taginfo, fieldtype, count, bytes
>> > TiffOutputField exif_comment = new TiffOutputField(
>> >         TiffConstants.EXIF_TAG_USER_COMMENT.tag,
>> >         TiffConstants.EXIF_TAG_USER_COMMENT,
>> >         TiffFieldTypeConstants.FIELD_TYPE_UNDEFINED,
>> >         b.length, b);
>> >
>>
>> The provided links indicate to me that it is possible to write non-ASCII
>> characters. Are you sure your code looks like what Damjan suggested?
>>
>> Benedikt
>>
>>
>> >
>> >
>> >
>> > Joakim
>> >
>> >
>> >
>> > On 22 May 2016 at 15:29, Benedikt Ritter <britter@apache.org> wrote:
>> >
>> > > Hello Joakim
>> > >
>> > > Joakim Knudsen <joakim.grahl@gmail.com> wrote on Sat, 21 May 2016 at
>> > > 19:29:
>> > >
>> > > > Hi List!
>> > > >
>> > > > I'm working on an Android app, where I want to read and write "EXIF tags"
>> > > > to JPEG files on the device. Sanselan 0.97 seems to work perfectly,
>> > > > although it's a bit complicated to work with EXIF tags/directories.
>> > > >
>> > > > The specific tags I'm interested in are EXIF_TAG_USER_COMMENT and
>> > > > EXIF_TAG_IMAGE_DESCRIPTION.
>> > > > According to the documentation I could find, UserComment is of field type
>> > > > "undefined", whereas ImageDescription is of field type ASCII.
>> > > >
>> > > > http://www.awaresystems.be/imaging/tiff/tifftags/privateifd/exif/usercomment.html
>> > > > http://www.awaresystems.be/imaging/tiff/tifftags/imagedescription.html
>> > > >
>> > > > What's the proper way of creating those tags, wrt. charset etc? I want as
>> > > > wide as possible character support (æøå etc).
>> > > >
>> > > > I find different discussions online, with different advice. Seems two
>> > > > constructors are going around, where the simpler one does not deal with
>> > > > charset/encoding at all. This one uses the .create method:
>> > > >
>> > > > String textToSet = "Some Text æøå";
>> > > >
>> > > > TiffOutputField exif_comment = TiffOutputField.create(
>> > > >                 TiffConstants.EXIF_TAG_USER_COMMENT,
>> > > >                 outputSet.byteOrder, textToSet);
>> > > >
>> > > >
>> > > > while this one uses the standard constructor:
>> > > >
>> > > > byte b[] = ExifTagConstants.EXIF_TAG_USER_COMMENT.encodeValue(
>> > > >         TiffFieldTypeConstants.FIELD_TYPE_ASCII,
>> > > >         textToSet, outputSet.byteOrder
>> > > > );
>> > > >
>> > > > // constructor arguments: tag, taginfo, fieldtype, count, bytes
>> > > > TiffOutputField exif_comment2 = new TiffOutputField(
>> > > >         TiffConstants.EXIF_TAG_USER_COMMENT.tag,
>> > > >         TiffConstants.EXIF_TAG_USER_COMMENT,
>> > > >         TiffFieldTypeConstants.FIELD_TYPE_UNDEFINED,
>> > > >         b.length, b);
>> > > >
>> > > > In this last one, the string to set has been converted to a byte array
>> > > > first. But can/should I set the encoding anywhere?
>> > > >
>> > > > Is the field type even ASCII? This information seems to indicate it's
>> > > > not ASCII...
>> > > >
>> > > > http://www.awaresystems.be/imaging/tiff/tifftags/privateifd/exif/usercomment.html
>> > > >
>> > > >
>> > > > Need some help here, as you can see, to get this right. The second
>> > > > approach above does seem to work in my app, but I'd like to be sure
>> > > > I'm not somehow messing up the JPEGs on the device.
>> > > >
>> > >
>> > > I've looked at the code of
>> > > org.apache.commons.imaging.formats.tiff.taginfos.TagInfoGpsText
>> > > (ExifTagConstants.EXIF_TAG_USER_COMMENT is an instance of TagInfoGpsText).
>> > > Here are my observations:
>> > >
>> > > - The FieldType parameter, which you have set to
>> > >   TiffFieldTypeConstants.FIELD_TYPE_ASCII, is never used in the
>> > >   implementation of encodeValue(FieldType, Object, ByteOrder)
>> > > - When converting the input String to a byte array,
>> > >   String.getBytes(String charsetName) is used
>> > > - For charsetName, "US-ASCII" is always used (it cannot be configured by
>> > >   the user)
>> > >
>> > > So my guess is that the code will not handle characters not in the
>> > > US-ASCII charset correctly.
>> > >
>> > > Benedikt
>> > >
>> > >
>> > > >
>> > > >
>> > > >
>> > > > Joakim
>> > > >
>> > >
>> >
>>
>
>
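
A note for readers of the archived thread: the garbled Explorer text quoted above appears to be consistent with the UTF-8 bytes of "æøå" being displayed as big-endian UTF-16, and a strict US-ASCII encoding would instead replace each of those characters with "?". The JDK-only sketch below illustrates both effects. The class and method names are made up for this illustration, and it makes no claim about where in Sanselan or Windows the mix-up actually happens.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only exercises JDK charset behaviour.
final class EncodingMismatchDemo {
    // Encode as UTF-8, then (wrongly) decode the same bytes as UTF-16BE.
    static String utf8ReadAsUtf16be(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_16BE);
    }

    // Encode as US-ASCII and decode again; unmappable characters become '?'.
    static String asciiRoundTrip(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String text = "\u00E6\u00F8\u00E5"; // the thread's sample text, as escapes

        // Print code points in hex so the result does not depend on the console encoding.
        StringBuilder hex = new StringBuilder();
        for (char c : utf8ReadAsUtf16be(text).toCharArray()) {
            hex.append(String.format("U+%04X ", (int) c));
        }
        System.out.println(hex.toString().trim()); // U+C3A6 U+C3B8 U+C3A5

        System.out.println(asciiRoundTrip(text));   // ???
        System.out.println(asciiRoundTrip("Test")); // Test
    }
}
```

Either way the conclusion of the thread stands: the text has to be written as UTF-16 behind the "UNICODE\0" marker for non-ASCII characters to survive.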
