Character conversion does not fail when characters cannot be converted

Fabian.Guenther
Posts: 3
Joined: Thu, 2024-03-07, 11:10

Character conversion does not fail when characters cannot be converted

#1 Post by Fabian.Guenther »

I have an issue when trying to convert between character sets with characters that have no representation in the target character set.

This happens in DCMTK 3.6.8 with the new oficonv conversion. It worked as expected in DCMTK 3.6.7 with libiconv.

It affects usage through the OFCharacterEncoding class (we call it through DcmSpecificCharacterSet, using ISO_IR 192 and ISO_IR 101).

Expected:
Given an OFCharacterEncoding object with a selected encoding from UTF-8 to ISO-8859-2,
when calling encoding.convertString(...) with UTF-8 characters that have no representation in ISO-8859-2,
then the conversion should abort and return an error code.

Actual:
The conversion succeeds and returns a string of seemingly random characters that do not represent the input string at all.

Calling OFCharacterEncoding::supportsConversionFlags(OFCharacterEncoding::ConversionFlags::AbortTranscodingOnIllegalSequence) returns true and explicitly setting encoding.setConversionFlags(OFCharacterEncoding::ConversionFlags::AbortTranscodingOnIllegalSequence) does not change the observed behaviour.

As an example, the UTF-8 input Αλέξης
in hex: ce 91 ce bb ce ad ce be ce b7 cf 82
returns no error when converted to ISO-8859-2 and gives an output of ˝´/4˝´/4˝´/4
in hex: bd b4 2f 34 bd b4 2f 34 bd b4 2f 34
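
To make it easier to see why this input cannot be represented, here is a minimal standalone sketch (not DCMTK code) that decodes the example bytes into Unicode code points and checks them against the highest code point that ISO-8859-2 maps, which is U+02DD at byte 0xBD. The names decodeUtf8, kLatin2Max, definitelyNotLatin2 and allOutsideLatin2 are made up for this illustration, and the decoder does not validate malformed UTF-8 beyond asserting that a sequence is not truncated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 into code points.
// Sketch only: accepts 1- to 4-byte sequences without checking that
// continuation bytes are well-formed.
std::vector<std::uint32_t> decodeUtf8(const std::string& s) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i++]);
        std::uint32_t cp;
        std::size_t extra;
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        assert(i + extra <= s.size());  // sequence must not be truncated
        for (std::size_t k = 0; k < extra; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}

// Highest code point present in ISO-8859-2 (U+02DD, double acute accent).
const std::uint32_t kLatin2Max = 0x02DD;

// True if the code point is certainly not representable in ISO-8859-2.
// A false result does NOT prove that the character is representable.
bool definitelyNotLatin2(std::uint32_t cp) { return cp > kLatin2Max; }

// True if every character of the UTF-8 input lies above the ISO-8859-2 range.
bool allOutsideLatin2(const std::string& utf8) {
    for (std::uint32_t cp : decodeUtf8(utf8))
        if (!definitelyNotLatin2(cp)) return false;
    return true;
}
```

For the example input, decodeUtf8 yields U+0391, U+03BB, U+03AD, U+03BE, U+03B7 and U+03C2, all in the Greek block, so allOutsideLatin2 returns true and a correct conversion to ISO-8859-2 has no valid output byte for any of the six characters.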

This is not limited to conversion to Latin2; it also occurs with Latin3 and Latin4. Oddly, converting unrepresentable characters to Latin1 fails as expected: converting the same example input to Latin1 correctly yields an empty output string and a "Cannot convert character encoding" error code.

Re: Character conversion does not fail when characters cannot be converted

#2 Post by Fabian.Guenther »

Here is a sample code snippet:

// Select a conversion from UTF-8 to ISO-8859-2 (Latin2)
OFCharacterEncoding encoding;
OFCondition res = encoding.selectEncoding(OFString("UTF-8"), OFString("ISO-8859-2"));
// Greek input "Αλέξης", which has no representation in ISO-8859-2
OFString fromString("\xce\x91\xce\xbb\xce\xad\xce\xbe\xce\xb7\xcf\x82");
OFString toString("");
res = encoding.convertString(fromString, toString);
// The conversion is expected to fail and leave the output empty
EXPECT_TRUE(toString.empty());

With DCMTK 3.6.8, this test passes when libiconv is used for character conversion (by setting DCMTK_WITH_ICONV=ON and DCMTK_ENABLE_CHARSET_CONVERSION="libiconv") and fails when oficonv is used (default settings, with DCMTK_ENABLE_BUILTIN_OFICONV_DATA=ON).
The OS is Windows 10 x86_64, using the MSVC compiler.

Marco Eichelberg
OFFIS DICOM Team
Posts: 1445
Joined: Tue, 2004-11-02, 17:22
Location: Oldenburg, Germany

Re: Character conversion does not fail when characters cannot be converted

#3 Post by Marco Eichelberg »

I can confirm this behaviour and have added an issue to our issue tracker: https://support.dcmtk.org/redmine/issues/1113

Apparently, this is caused by incorrect translation tables, in this case oficonv/datasrc/csmapper/ISO-8859/UCS%ISO-8859-2.src.
This is remarkable, because these tables come from the latest FreeBSD source, without any modification. Nevertheless, the table clearly contains nonsense.

Re: Character conversion does not fail when characters cannot be converted

#4 Post by Fabian.Guenther »

Thank you for creating a ticket for this. It is surprising that a faulty character set translation table comes from FreeBSD.

Do you (or OFFIS) plan to report this to the FreeBSD Foundation, or are you simply going to fix the tables on your end?

Re: Character conversion does not fail when characters cannot be converted

#5 Post by Marco Eichelberg »

I plan to report this to FreeBSD once I have analyzed the issue in more detail.

Re: Character conversion does not fail when characters cannot be converted

#6 Post by Marco Eichelberg »

Today I committed a fix for this issue to our testing branch. It should appear in the public git repository in a couple of days.
I have also reported the issue to FreeBSD, see https://bugs.freebsd.org/bugzilla/show_ ... ?id=278229.
