commons-user mailing list archives

From Benedikt Ritter <brit...@apache.org>
Subject Re: Creating EXIF tags (TiffOutputField) the right way
Date Wed, 01 Jun 2016 12:55:13 GMT
Hello Joakim,

Glad you found out what to do. This would make for a good addition to the
user guide. Would you like to contribute your findings?

Benedikt

On Tue, 31 May 2016 at 19:21, Joakim Knudsen <joakim.grahl@gmail.com> wrote:

> Btw, ENCODING_UTF16 is just a String = "UTF-16LE" (Little Endian)
>
> On 31 May 2016 at 19:20, Joakim Knudsen <joakim.grahl@gmail.com> wrote:
>
> > Following a post on the User-Commons-Apache log (from 2012), I ended up
> > with the following code which seems to work.
> > It writes proper Unicode, which I can read back successfully using
> > ExifTool. I also see the comment nicely in Windows Explorer, and under
> > File > Properties.
> > Note I changed the field type from ASCII to FIELD_TYPE_UNDEFINED,
> > otherwise (with ASCII) it did not work. At least Windows couldn't make
> > sense of the EXIF data.
> >
> > // http://osdir.com/ml/user-commons-apache/2012-03/msg00046.html
> > byte[] unicodeMarker = new byte[]{ 0x55, 0x4E, 0x49, 0x43, 0x4F, 0x44,
> >         0x45, 0x00 };
> > // OR UTF-16BE if the file is big-endian!
> > byte[] comment = textToSet.getBytes(ENCODING_UTF16);
> > byte[] bytesComment = new byte[unicodeMarker.length + comment.length];
> > System.arraycopy(unicodeMarker, 0, bytesComment, 0,
> >         unicodeMarker.length);
> > System.arraycopy(comment, 0, bytesComment, unicodeMarker.length,
> >         comment.length);
> >
> > TiffOutputField exif_comment = new TiffOutputField(
> >         TiffConstants.EXIF_TAG_USER_COMMENT,
> >         TiffFieldTypeConstants.FIELD_TYPE_UNDEFINED,
> >         bytesComment.length, bytesComment);
> >
> >
> > I can now write UserComment: "æøå" without problems :)
> >
> >
> >
> > - Joakim
> >
> >
> > On 31 May 2016 at 17:39, Benedikt Ritter <britter@apache.org> wrote:
> >
> >> Hello Joakim,
> >>
> >> On Sat, 28 May 2016 at 21:10, Joakim Knudsen <joakim.grahl@gmail.com> wrote:
> >>
> >> > Hi Benedikt, and thanks for replying!
> >> >
> >> > So, if FieldType is unused, maybe the alternative, simpler constructor
> >> > is more appropriate/correct to use?
> >> >
> >> > // try using the approach given in the example (modified from the GPS tag):
> >> > TiffOutputField exif_comment = TiffOutputField.create(
> >> >         TiffConstants.EXIF_TAG_USER_COMMENT,
> >> >         outputSet.byteOrder, textToSet);
> >> >
> >> > However, now Sanselan throws an ImageWriteException:
> >> > org.apache.sanselan.ImageWriteException: Tag has unexpected data type.
> >> >
> >> > So are you 100% sure field type should not be set (to ASCII)?
> >> >
> >>
> >> No, I'm just saying that it uses a hard coded encoding anyway :-)
> >>
> >>
> >> >
> >> > Next, you're saying the string to set (textToSet) is converted
> >> > internally to a byte array, using US-ASCII encoding.
> >> > If I try writing "æøåæøå" to a file, I get "쎦쎸쎥쎦쎸쎥" when I copy the
> >> > JPEG out and check Properties in Windows Explorer.
> >> > If I write only ASCII characters, e.g. "Test", then that comes through
> >> > just fine.
> >> >
> >> > In summary, here is the code that works for me (except non-ASCII
> >> > characters):
> >> >
> >> >
> >> > // http://mail-archives.apache.org/mod_mbox/commons-user/201203.mbox/%3CCAJm2B-mYCXYKuyu=Hs8UAZCpw-B=KWuZ4gsZFUOBvZWUN0LUfA@mail.gmail.com%3E
> >> > byte b[] = ExifTagConstants.EXIF_TAG_USER_COMMENT.encodeValue(
> >> >         TiffFieldTypeConstants.FIELD_TYPE_ASCII,
> >> >         textToSet, outputSet.byteOrder);
> >> >
> >> > // constructor arguments: taginfo tag fieldtype count bytes
> >> > TiffOutputField exif_comment = new
> >> > TiffOutputField(TiffConstants.EXIF_TAG_USER_COMMENT.tag,
> >> >         TiffConstants.EXIF_TAG_USER_COMMENT,
> >> > TiffFieldTypeConstants.FIELD_TYPE_UNDEFINED,
> >> >         b.length, b);
> >> >
> >>
> >> The provided links indicate to me that it is possible to write
> >> non-ASCII characters. Are you sure your code looks like what Damjan
> >> suggested?
> >>
> >> Benedikt
> >>
> >>
> >> >
> >> >
> >> >
> >> > Joakim
> >> >
> >> >
> >> >
> >> > On 22 May 2016 at 15:29, Benedikt Ritter <britter@apache.org> wrote:
> >> >
> >> > > Hello Joakim
> >> > >
> >> > > On Sat, 21 May 2016 at 19:29, Joakim Knudsen <joakim.grahl@gmail.com> wrote:
> >> > >
> >> > > > Hi List!
> >> > > >
> >> > > > I'm working on an Android app, where I want to read and write "EXIF
> >> > > > tags" to JPEG files on the device. Sanselan 0.97 seems to work
> >> > > > perfectly, although it's a bit complicated to work with EXIF
> >> > > > tags/directories.
> >> > > >
> >> > > > The specific tags I'm interested in are EXIF_TAG_USER_COMMENT and
> >> > > > EXIF_TAG_IMAGE_DESCRIPTION.
> >> > > > According to the documentation I could find, UserComment is of field
> >> > > > type "undefined", whereas ImageDescription is of field type ASCII.
> >> > > >
> >> > > > http://www.awaresystems.be/imaging/tiff/tifftags/privateifd/exif/usercomment.html
> >> > > > http://www.awaresystems.be/imaging/tiff/tifftags/imagedescription.html
> >> > > >
> >> > > > What's the proper way of creating those tags, wrt. charset etc? I
> >> > > > want as wide as possible character support (æøå etc).
> >> > > >
> >> > > > I find different discussions online, with different advice. Seems
> >> > > > two constructors are going around, where the simpler one does not
> >> > > > deal with charset/encoding at all. This one uses the .create method:
> >> > > >
> >> > > > String textToSet = "Some Text æøå";
> >> > > >
> >> > > > TiffOutputField exif_comment = TiffOutputField.create(
> >> > > >                 TiffConstants.EXIF_TAG_USER_COMMENT,
> >> > > >                 outputSet.byteOrder, textToSet);
> >> > > >
> >> > > >
> >> > > > while this one uses the standard constructor:
> >> > > >
> >> > > > byte b[] = ExifTagConstants.EXIF_TAG_USER_COMMENT.encodeValue(
> >> > > >         TiffFieldTypeConstants.FIELD_TYPE_ASCII,
> >> > > >         textToSet, outputSet.byteOrder
> >> > > > );
> >> > > >
> >> > > > // constructor arguments: taginfo tag fieldtype count bytes
> >> > > > TiffOutputField exif_comment2 = new
> >> > > > TiffOutputField(TiffConstants.EXIF_TAG_USER_COMMENT.tag,
> >> > > >         TiffConstants.EXIF_TAG_USER_COMMENT,
> >> > > > TiffFieldTypeConstants.FIELD_TYPE_UNDEFINED,
> >> > > >         b.length, b);
> >> > > >
> >> > > > In this last one, the string to set has been converted to a byte
> >> > > > array first. But can/should I set the encoding anywhere?
> >> > > >
> >> > > > Is the field type even ASCII? This information seems to indicate
> >> > > > it's not ASCII...
> >> > > >
> >> > > > http://www.awaresystems.be/imaging/tiff/tifftags/privateifd/exif/usercomment.html
> >> > > >
> >> > > >
> >> > > > Need some help here, as you can see, to get this right. The second
> >> > > > approach above does seem to work in my app, but I'd like to be
> >> > > > sure I'm not somehow messing up the JPEGs on the device.
> >> > > >
> >> > >
> >> > > I've looked at the code of
> >> > > org.apache.commons.imaging.formats.tiff.taginfos.TagInfoGpsText
> >> > > (ExifTagConstants.EXIF_TAG_USER_COMMENT is an instance of
> >> > > TagInfoGpsText). Here are my observations:
> >> > >
> >> > > - The FieldType parameter, which you have set to
> >> > >   TiffFieldTypeConstants.FIELD_TYPE_ASCII, is never used in the
> >> > >   implementation of encodeValue(FieldType, Object, ByteOrder)
> >> > > - When converting the input String to a byte array,
> >> > >   String.getBytes(String charsetName) is used
> >> > > - For charsetName, "US-ASCII" is always used (it cannot be configured
> >> > >   by the user)
> >> > >
> >> > > So my guess is that the code will not handle characters outside the
> >> > > US-ASCII charset correctly.
> >> > >
> >> > > Benedikt
> >> > >
> >> > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > Joakim
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

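For readers who want to try the byte layout discussed in this thread without
Sanselan on the classpath, here is a minimal sketch in plain Java. It only
builds and parses the UserComment payload (the 8-byte "UNICODE\0" marker
followed by UTF-16 text); it does not create a TiffOutputField. The class name
UserCommentPayload and its methods are hypothetical, made up for illustration,
and picking UTF-16LE or UTF-16BE from the file's byte order follows the
comment in the code quoted above rather than anything verified here.

```java
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Sanselan or Commons Imaging.
final class UserCommentPayload {
    // Character-code prefix for Unicode text: ASCII "UNICODE" plus one NUL byte.
    private static final byte[] UNICODE_MARKER =
            { 0x55, 0x4E, 0x49, 0x43, 0x4F, 0x44, 0x45, 0x00 };

    // Assumption taken from the thread: UTF-16 variant matches the file's byte order.
    private static Charset charsetFor(ByteOrder order) {
        return order == ByteOrder.BIG_ENDIAN
                ? StandardCharsets.UTF_16BE
                : StandardCharsets.UTF_16LE;
    }

    // Returns marker + encoded text, ready to be used as the field's raw bytes.
    static byte[] encode(String text, ByteOrder order) {
        byte[] body = text.getBytes(charsetFor(order));
        byte[] payload = Arrays.copyOf(UNICODE_MARKER,
                UNICODE_MARKER.length + body.length);
        System.arraycopy(body, 0, payload, UNICODE_MARKER.length, body.length);
        return payload;
    }

    // Reverses encode(); rejects payloads that do not start with the marker.
    static String decode(byte[] payload, ByteOrder order) {
        int n = UNICODE_MARKER.length;
        if (payload.length < n
                || !Arrays.equals(Arrays.copyOf(payload, n), UNICODE_MARKER)) {
            throw new IllegalArgumentException("missing UNICODE marker");
        }
        return new String(payload, n, payload.length - n, charsetFor(order));
    }

    public static void main(String[] args) {
        String text = "\u00e6\u00f8\u00e5"; // the three letters used in the thread
        byte[] payload = encode(text, ByteOrder.LITTLE_ENDIAN);
        System.out.println("payload length: " + payload.length);
        System.out.println("round trip ok: "
                + decode(payload, ByteOrder.LITTLE_ENDIAN).equals(text));
        // US-ASCII cannot represent these letters; getBytes substitutes '?'.
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        System.out.println("as US-ASCII: "
                + new String(ascii, StandardCharsets.US_ASCII));
    }
}
```

The resulting array would be passed as the bytes argument, with its length as
the count, to the FIELD_TYPE_UNDEFINED constructor shown in the quoted code.
The last lines of main also show why a hard-coded US-ASCII conversion cannot
carry these characters.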