I have an issue when trying to convert between character sets with characters that have no representation in the target character set.
This happens in DCMTK 3.6.8 when using the new oficonv conversion; it worked as expected in DCMTK 3.6.7 with libiconv.
It affects usage through the OFCharacterEncoding class (we are calling it through DcmSpecificCharacterSet using ISO_IR 192 and ISO_IR 101).
Expected:
Given an OFCharacterEncoding object named encoding, with the selected conversion from UTF-8 to ISO-8859-2.
When calling encoding.convertString(...) with UTF-8 characters that have no representation in ISO-8859-2
then the conversion should abort and return an error code
but instead the conversion succeeds and returns a string of seemingly random characters that bear no relation to the input string.
Calling OFCharacterEncoding::supportsConversionFlags(OFCharacterEncoding::ConversionFlags::AbortTranscodingOnIllegalSequence) returns true and explicitly setting encoding.setConversionFlags(OFCharacterEncoding::ConversionFlags::AbortTranscodingOnIllegalSequence) does not change the observed behaviour.
As an example input, the UTF-8 input of Αλέξης
in hex: ce 91 ce bb ce ad ce be ce b7 cf 82
returns no error when converting to ISO-8859-2 and gives an output of ˝´/4˝´/4˝´/4
in hex: bd b4 2f 34 bd b4 2f 34 bd b4 2f 34
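To make the example input easier to verify, here is a minimal, DCMTK-independent sketch that decodes the UTF-8 bytes above into Unicode code points and checks that they all fall into the Greek and Coptic block (U+0370 to U+03FF), which ISO-8859-2 does not cover. The function names decodeUtf8, isInGreekBlock and allInGreekBlock are made up for this illustration and are not part of DCMTK; the decoder is deliberately simplified and does not validate malformed input.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode a UTF-8 byte string into Unicode code points.
// Simplified sketch: handles 1- to 4-byte sequences and stops at a
// truncated sequence, but does not reject other malformed input.
std::vector<char32_t> decodeUtf8(const std::string& s)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 1;
        char32_t cp = lead;
        if (lead >= 0xF0)      { len = 4; cp = lead & 0x07; }
        else if (lead >= 0xE0) { len = 3; cp = lead & 0x0F; }
        else if (lead >= 0xC0) { len = 2; cp = lead & 0x1F; }
        if (i + len > s.size())
            break; // truncated sequence at the end of the input
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(s[i + k]);
            cp = (cp << 6) | (cont & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// True if the code point lies in the Greek and Coptic block.
bool isInGreekBlock(char32_t cp)
{
    return cp >= 0x0370 && cp <= 0x03FF;
}

// True if every code point lies in the Greek and Coptic block.
bool allInGreekBlock(const std::vector<char32_t>& cps)
{
    for (char32_t cp : cps) {
        if (!isInGreekBlock(cp))
            return false;
    }
    return true;
}
```

Calling decodeUtf8 on the 12 input bytes yields six code points, U+0391 U+03BB U+03AD U+03BE U+03B7 U+03C2, so a correct converter has nothing to map any of them to in ISO-8859-2.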
This is not limited to conversion to Latin2; it also occurs with Latin3 and Latin4. Oddly enough, converting unrepresentable characters to Latin1 fails as expected: converting the same input as in the example to Latin1 correctly yields an empty output string and a "Cannot convert character encoding" error code.
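OFCharacterEncoding encoding;
// select the conversion from UTF-8 to ISO-8859-2 (Latin2)
OFCondition res = encoding.selectEncoding(OFString("UTF-8"), OFString("ISO-8859-2"));
// Greek input as UTF-8 bytes; none of these characters exist in ISO-8859-2
OFString fromString("\xce\x91\xce\xbb\xce\xad\xce\xbe\xce\xb7\xcf\x82");
OFString toString("");
res = encoding.convertString(fromString, toString);
// the conversion is expected to fail and leave the output empty
EXPECT_TRUE(res.bad());
EXPECT_TRUE(toString.empty());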
This test passes with DCMTK 3.6.8 when libiconv is used for character set conversion (by setting DCMTK_WITH_ICONV=ON and DCMTK_ENABLE_CHARSET_CONVERSION="libiconv") and fails when oficonv is used (default settings, with DCMTK_ENABLE_BUILTIN_OFICONV_DATA=ON).
The OS is Windows 10 x86_64, built with the MSVC compiler.
Apparently, this is caused by incorrect translation tables, in this case oficonv/datasrc/csmapper/ISO-8859/UCS%ISO-8859-2.src.
This is remarkable, because these tables come from the latest FreeBSD source, without any modification. Nevertheless, the table clearly contains nonsense.
I committed a fix for this issue to our testing branch today. It should appear in the public git repository in a couple of days.
I have also reported the issue to FreeBSD, see https://bugs.freebsd.org/bugzilla/show_ ... ?id=278229.