Data corruption in convertToUTF8() with oficonv

All other questions regarding DCMTK

Moderator: Moderator Team

Post Reply
Message
Author
cschreib
Posts: 1
Joined: Tue, 2024-11-19, 10:19
Location: UK
Contact:

Data corruption in convertToUTF8() with oficonv

#1 Post by cschreib »

Hello,

We have been using DCMTK 3.6.7 in our product, and recently attempted to upgrade to 3.6.8. When doing so, we noticed some of our tests failing. I managed to trace the problem back to the new "oficonv" charset conversion implementation, when calling

Code: Select all

convertToUTF8()
on a DICOM data set that is already encoded in UTF8, and which contains long text tags (> 1024 characters). I attach below the minimal code that reproduces the problem. The first 1024 characters are correct, but afterwards the string is corrupted (the first 1024 characters are repeated again and again; the other characters are lost).

I noticed the issue no longer occurs in the "master" branch, and narrowed down the fix to this particular commit: https://github.com/DCMTK/dcmtk/commit/8 ... 535c745b79
It appears the "passthrough" mode was causing the problem, and it is now disabled by default. But my understanding is that this is just hiding the problem: the passthrough implementation (or the code using it) still has a bug. This can be demonstrated by running the example below and defining "DCMTK_ENABLE_ICONV_PASSTHROUGH" when compiling DCMTK.

Code: Select all

#include <dcmtk/dcmdata/dctk.h>
#include <iostream>

int main() {
    std::string lorem_ipsum = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do "
                              "eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut "
                              "enim ad minim veniam, quis nostrud exercitation ullamco laboris "
                              "nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in "
                              "reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla "
                              "pariatur. Excepteur sint occaecat cupidatat non proident, sunt in "
                              "culpa qui officia deserunt mollit anim id est laborum.";

    std::string long_string = lorem_ipsum;
    while (long_string.size() < 4000) {
        long_string += " " + lorem_ipsum;
    }

    DcmDataset data;
    OFCondition result;

    result = data.chooseRepresentation(EXS_LittleEndianExplicit, nullptr);
    if (!result.good()) { return 2; }

    result = data.putAndInsertString(DCM_SpecificCharacterSet, "ISO_IR 192");
    if (!result.good()) { return 2; }

    result = data.putAndInsertString(DCM_DetectorDescription, long_string.data(), long_string.size(), true);
    if (!result.good()) { return 2; }

    result = data.convertToUTF8();
    if (!result.good()) { return 2; }

    const char* value = nullptr;
    result = data.findAndGetString(DCM_DetectorDescription, value);
    if (!result.good()) { return 2; }

    std::string new_string = value;

    std::cout << "Results match: " << (long_string == new_string ? "yes": "no") << std::endl;
    std::cout << "Original size:  " << long_string.size() << std::endl;
    std::cout << "Converted size: " << new_string.size() << std::endl;
    std::cout << "Original:  ..." << long_string.substr(1000, 48) << std::endl;
    std::cout << "Converted: ..." << new_string.substr(1000, 48) << std::endl;

    return long_string == new_string ? 1 : 0;
}
With passthrough, the output is:

Code: Select all

Results match: no
Original size:  4013
Converted size: 4013
Original:  ...e magna aliqua. Ut enim ad minim veniam, quis no
Converted: ...e magna aliqua. Ut enim Lorem ipsum dolor sit am
Without passthrough, the output is:

Code: Select all

Results match: yes
Original size:  4013
Converted size: 4013
Original:  ...e magna aliqua. Ut enim ad minim veniam, quis no
Converted: ...e magna aliqua. Ut enim ad minim veniam, quis no

Marco Eichelberg
OFFIS DICOM Team
OFFIS DICOM Team
Posts: 1512
Joined: Tue, 2004-11-02, 17:22
Location: Oldenburg, Germany
Contact:

Re: Data corruption in convertToUTF8() with oficonv

#2 Post by Marco Eichelberg »

Thank you for the bug report and test program. I have been able to reproduce the bug and have added it to our issue tracker at https://support.dcmtk.org/redmine/issues/1143.

Marco Eichelberg
OFFIS DICOM Team
OFFIS DICOM Team
Posts: 1512
Joined: Tue, 2004-11-02, 17:22
Location: Oldenburg, Germany
Contact:

Re: Data corruption in convertToUTF8() with oficonv

#3 Post by Marco Eichelberg »

For your information, I have fixed this issue today and committed the fix as commit #ae6352657. This will now run though our nightly builds and should become visible in the public git if no follow-up issues occur.
The fix turned out to be fairly simple:

Code: Select all

diff --git a/oficonv/libsrc/citrus_iconv_none.c b/oficonv/libsrc/citrus_iconv_none.c
index ee1050d4e..6974ce5d1 100644
--- a/oficonv/libsrc/citrus_iconv_none.c
+++ b/oficonv/libsrc/citrus_iconv_none.c
@@ -120,9 +120,9 @@ _citrus_iconv_none_iconv_convert(struct _citrus_iconv * ci ,
         len = *outbytes;
     }
     memcpy(*out, *in, len);
-    in += len;
+    *in += len;
     *inbytes -= len;
-    out += len;
+    *out += len;
     *outbytes -= len;
     *invalids = 0;
     if (e2big)

Post Reply

Who is online

Users browsing this forum: Ahrefs [Bot] and 1 guest