convertToUTF8() Cannot select source character set

All other questions regarding DCMTK


JHarker
Posts: 3
Joined: Wed, 2023-05-17, 07:44

convertToUTF8() Cannot select source character set

#1 Post by JHarker »

I'm seeing the error message "Cannot select source character set: SpecificCharacterSet (0008,0005) value 'ISO 2022 IR 138' not supported" when calling convertToUTF8() on some images transmitted over the network from a hospital PACS.
I looked through the Unicode conversion topic at https://forum.dcmtk.org/viewtopic.php?t=4769; however, I'm not sure whether any of the information there applies to ISO 2022 IR 138.

I have the DCMTK libraries compiled with iconv (not ICU), and so far every other character set has worked.

How can I get ISO 2022 IR 138 converted to UTF-8?

J. Riesmeier
DCMTK Developer
Posts: 2501
Joined: Tue, 2011-05-03, 14:38
Location: Oldenburg, Germany

Re: convertToUTF8() Cannot select source character set

#2 Post by J. Riesmeier »

Does the Specific Character Set (0008,0005) attribute of the dataset contain only one value, namely "ISO 2022 IR 138"?

JHarker
Posts: 3
Joined: Wed, 2023-05-17, 07:44

Re: convertToUTF8() Cannot select source character set

#3 Post by JHarker »

It seems like it. Right now I only have access to the log files, not the DICOM data, but the error message quoted in my first post is exactly what is being logged.

If it is, unfortunately, I can't do anything about it as that is a hospital PACS system transmitting the data, nothing I can touch. I need to be able to handle the data as is from the PACS.

I did see someone in the other thread mention that the DICOM standard requires ISO 2022 to have more than one value, but I was unable to find such a requirement in the latest DICOM standard here: https://dicom.nema.org/medical/dicom/cu ... tput/html/

I did see this in PS3.3: "If the Attribute Specific Character Set (0008,0005) has more than one value and value 1 is empty, it is assumed that value 1 is ISO 2022 IR 6".
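For reference, a multi-valued Specific Character Set is stored as a single string with backslash (\) delimiters, so "only one value" simply means there is no backslash in it. The following is a minimal standard C++ sketch, not DCMTK code (splitCharsetValues() and normalizeCharsetValues() are hypothetical names), that splits such a string and applies the PS3.3 rule quoted above. It assumes trailing padding spaces have already been removed.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a DCMTK function): split a raw Specific Character
// Set (0008,0005) string into its values; DICOM uses '\' as the delimiter.
// Assumes trailing padding spaces were already stripped.
std::vector<std::string> splitCharsetValues(const std::string& raw) {
    std::vector<std::string> values;
    std::string current;
    for (char c : raw) {
        if (c == '\\') {
            values.push_back(current);  // end of one value
            current.clear();
        } else {
            current += c;
        }
    }
    values.push_back(current);  // last (or only) value
    return values;
}

// Apply the PS3.3 rule: if there is more than one value and value 1 is
// empty, value 1 is assumed to be "ISO 2022 IR 6".
std::vector<std::string> normalizeCharsetValues(const std::string& raw) {
    std::vector<std::string> values = splitCharsetValues(raw);
    if (values.size() > 1 && values[0].empty()) {
        values[0] = "ISO 2022 IR 6";
    }
    return values;
}
```

With these helpers, normalizeCharsetValues("ISO 2022 IR 138") yields a single value, whereas the C++ literal "\\ISO 2022 IR 138" (i.e. \ISO 2022 IR 138) yields two values, "ISO 2022 IR 6" and "ISO 2022 IR 138".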

J. Riesmeier
DCMTK Developer
Posts: 2501
Joined: Tue, 2011-05-03, 14:38
Location: Oldenburg, Germany

Re: convertToUTF8() Cannot select source character set

#4 Post by J. Riesmeier »

DICOM PS3.3 Section C.12.1.1.2 says: "If the Attribute Specific Character Set (0008,0005) is not present or has only a single value, Code Extension techniques are not used." "Defined Terms for Single-Byte Character Sets without Code Extensions" can be found in Table C.12-2.

A single value with "ISO 2022 IR 138" is, therefore, invalid. You should inform the manufacturer of the device that the system creates invalid DICOM datasets.
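The validity rule above can be expressed as a small check. This is a hedged sketch in standard C++ (isInvalidSingleIso2022Value() is a hypothetical name, not a DCMTK function) that treats any lone value starting with "ISO 2022 " as invalid, which is our reading of C.12.1.1.2; it expects the values of (0008,0005) to be split already.

```cpp
#include <string>
#include <vector>

// Hypothetical check (not part of DCMTK): per PS3.3 C.12.1.1.2, a single
// value means Code Extension techniques are not used, so a lone
// "ISO 2022 ..." Defined Term (which implies code extensions) is invalid.
// Takes the already-split values of Specific Character Set (0008,0005).
bool isInvalidSingleIso2022Value(const std::vector<std::string>& values) {
    const std::string prefix = "ISO 2022 ";
    return values.size() == 1 &&
           values[0].compare(0, prefix.size(), prefix) == 0;
}
```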
JHarker wrote: "If it is, unfortunately, I can't do anything about it as that is a hospital PACS system transmitting the data, nothing I can touch. I need to be able to handle the data as is from the PACS."
Then, you need to implement a workaround in your software that handles this invalid value for Specific Character Set (0008,0005). One of the questions is, of course, whether the creator of the dataset wanted to use "ISO_IR 138" or "\ISO 2022 IR 138". This is something you should find out first.
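The two readings mentioned here can be generated mechanically from the invalid value. The sketch below is illustrative only (candidateCharsetValues() is a hypothetical helper): it assumes the value has the form "ISO 2022 IR nnn" and maps it to "\ISO 2022 IR nnn" and "ISO_IR nnn". The second mapping is only meaningful for single-byte character sets that have an ISO_IR Defined Term in Table C.12-2, such as IR 138.

```cpp
#include <string>
#include <vector>

// Hypothetical application-level helper (not DCMTK): for an invalid single
// value such as "ISO 2022 IR 138", return both readings as raw strings:
//   1. "\ISO 2022 IR 138": value 1 empty, code extensions (ESC sequences)
//   2. "ISO_IR 138": same character set, no code extensions
// Returns an empty list if the value does not start with "ISO 2022 IR ".
std::vector<std::string> candidateCharsetValues(const std::string& invalidValue) {
    const std::string prefix = "ISO 2022 IR ";
    if (invalidValue.compare(0, prefix.size(), prefix) != 0) {
        return {};
    }
    const std::string irNumber = invalidValue.substr(prefix.size());
    return {
        "\\" + invalidValue,   // multi-valued reading
        "ISO_IR " + irNumber   // single-valued reading (single-byte sets only)
    };
}
```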
Last edited by J. Riesmeier on Mon, 2023-05-29, 13:19, edited 1 time in total.

Marco Eichelberg
OFFIS DICOM Team
Posts: 1437
Joined: Tue, 2004-11-02, 17:22
Location: Oldenburg, Germany

Re: convertToUTF8() Cannot select source character set

#5 Post by Marco Eichelberg »

If you can get access to an anonymized sample file, we'd be interested in receiving a copy at bugs/at/dcmtk/dot/org. I have never seen Hebrew used in an ISO 2022 multi-character-set setting, so even if, as Jörg points out, the file violates the DICOM standard, I'd still be interested in checking whether we could make it work with reasonable implementation effort.

J. Riesmeier
DCMTK Developer
Posts: 2501
Joined: Tue, 2011-05-03, 14:38
Location: Oldenburg, Germany

Re: convertToUTF8() Cannot select source character set

#6 Post by J. Riesmeier »

In my opinion, workarounds for invalid values of the Specific Character Set (0008,0005) attribute should be implemented at the application level, not in a general-purpose DICOM toolkit like DCMTK. After all, how should the toolkit know which value the creator actually intended?

JHarker
Posts: 3
Joined: Wed, 2023-05-17, 07:44

Re: convertToUTF8() Cannot select source character set

#7 Post by JHarker »

Marco, if I can get a sample of the DICOM file, I will share it (or an anonymized version of it).

J. Riesmeier, thanks for pointing out the DICOM section. I missed that line!
In your view, what would an application-level workaround look like? Setting (0008,0005) to an acceptable value like "\ISO 2022 IR 138" before calling convertToUTF8() and checking for a good status return?
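The approach described here (try a value, check the status, fall back to the next one) might be sketched as follows. To keep the example self-contained it does not call DCMTK; tryConvert is a hypothetical callback that, in a real application, could set (0008,0005) on a copy of the dataset (e.g. with putAndInsertString(DCM_SpecificCharacterSet, ...)), call convertToUTF8(), and return whether the status is good. Please check those details against the DCMTK documentation. The sketch requires C++17 for std::optional.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical application-level workaround (not DCMTK): try each candidate
// value for Specific Character Set (0008,0005) in order and return the first
// one for which the conversion callback reports success.
std::optional<std::string> firstWorkingCharset(
    const std::vector<std::string>& candidates,
    const std::function<bool(const std::string&)>& tryConvert) {
    for (const std::string& candidate : candidates) {
        if (tryConvert(candidate)) {
            return candidate;  // conversion succeeded with this value
        }
    }
    return std::nullopt;  // nothing worked; treat the dataset as unconvertible
}
```

Working on a copy of the dataset keeps the original intact in case a failed attempt leaves it partially modified. Note that a good status only shows that the bytes could be converted with the chosen character set, not that this was the reading the creator intended.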

J. Riesmeier
DCMTK Developer
Posts: 2501
Joined: Tue, 2011-05-03, 14:38
Location: Oldenburg, Germany

Re: convertToUTF8() Cannot select source character set

#8 Post by J. Riesmeier »

You could first try with "\ISO 2022 IR 138" and see whether it works. It's also possible that the creator actually used "ISO_IR 138" (no code extension techniques, i.e. no ESC sequences).
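One way to find out which reading was intended, following the hint about ESC sequences, is to look at the raw (unconverted) bytes of a text attribute such as Patient's Name. The sketch below is a hypothetical heuristic, not a DCMTK feature: it looks for the escape sequence ESC 02/13 04/08 (bytes 0x1B 0x2D 0x48), which to our understanding is the one PS3.3 Table C.12-3 lists for ISO 2022 IR 138, and it assumes the raw bytes are available as a std::string.

```cpp
#include <string>

// Hypothetical heuristic (not part of DCMTK): with Code Extension techniques,
// the Hebrew set is designated by ESC 02/13 04/08 ("ESC - H"). If the raw
// bytes of a text value contain that sequence, "\ISO 2022 IR 138" is the more
// plausible reading; otherwise assume "ISO_IR 138" (no ESC sequences).
std::string guessIntendedCharset(const std::string& rawValueBytes) {
    const bool hasHebrewDesignation =
        rawValueBytes.find("\x1B\x2D\x48") != std::string::npos;
    return hasHebrewDesignation ? "\\ISO 2022 IR 138" : "ISO_IR 138";
}
```

The absence of the escape sequence in one attribute is not proof; checking several text attributes, or a few sample files, gives a more reliable picture.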
