character encoding

All other questions regarding DCMTK

Moderator: Moderator Team

Shaeto
Posts: 147
Joined: Tue, 2009-01-20, 17:50
Location: CA, USA

character encoding

#1 Post by Shaeto »

Not sure if I understand DICOM 2016 correctly, but see http://dicom.nema.org/medical/dicom/201 ... D.6.2.html

Does this mean that (0008,0005) can contain "ISO 2022 IR 100" with VM 1?

In that case the DcmSpecificCharacterSet class should allow "ISO 2022 IR 100" with VM 1, but it actually does not (see DcmSpecificCharacterSet::selectCharacterSet).

J. Riesmeier
DCMTK Developer
Posts: 2501
Joined: Tue, 2011-05-03, 14:38
Location: Oldenburg, Germany

Re: character encoding

#2 Post by J. Riesmeier »

No, the DICOM standard is very clear regarding this: http://dicom.nema.org/medical/dicom/cur ... C.12.1.1.2

So, your assumption is incorrect, and the reference to DICOM PS3.2 (Conformance) does not help either; details on Specific Character Sets can be found in PS3.3 and PS3.5. By the way, "DICOM 2016d" is not the current edition of the DICOM standard; it's "DICOM 2017b" :-)

Re: character encoding

#3 Post by Shaeto »

Well... you confirmed my worst assumption: modality software developers do not read the standard :( This year we started to receive studies with (0008,0005) VM = 1 and ISO 2022 IR 87 and other extensions. I was hoping that the standard could explain this.

Re: character encoding

#4 Post by Shaeto »

BTW, is it possible to fix the DICOM -> libicu character set mapping?

Here are some good references: http://demo.icu-project.org/icu-bin/convexp and https://en.wikipedia.org/wiki/ISO/IEC_2022

I suggest using:

japanese:
ISO_IR 13 -> ISO-2022-JP (and ISO_IR 14 -> ISO-2022-JP ?)
ISO 2022 IR 87 -> ISO-2022-JP
ISO 2022 IR 159 -> ISO-2022-JP-2

korean:
ISO 2022 IR 149 -> ISO-2022-KR

chinese:
ISO 2022 IR 58 -> ISO-2022-CN ?

For example, at the moment DCMTK with libicu generates errors when converting DICOM files that use ISO 2022 IR 87 or ISO 2022 IR 159.
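
To make the proposed mapping concrete, here is a minimal sketch of such a lookup table. It is written in Python and uses Python's built-in codec names (iso2022_jp, iso2022_jp_2, iso2022_kr) as stand-ins for the ICU converter names. That substitution, the PROPOSED_MAPPING table and the decode_value helper are all assumptions made for illustration; none of this is DCMTK or ICU code. Only the ISO 2022 IR 87 entry is exercised here, the other entries are as untested as the proposal itself.

```python
# Proposed DICOM defined term -> converter name, following the list above.
# ASSUMPTION: the values are Python codec names standing in for ICU converters.
PROPOSED_MAPPING = {
    "ISO 2022 IR 87": "iso2022_jp",
    "ISO 2022 IR 159": "iso2022_jp_2",
    "ISO 2022 IR 149": "iso2022_kr",
}


def decode_value(raw: bytes, defined_term: str) -> str:
    """Decode raw element bytes with the converter proposed for defined_term.

    Hypothetical helper: the escape sequences are left in the input so that
    the ISO 2022 converter can interpret them itself.
    """
    codec = PROPOSED_MAPPING[defined_term]
    return raw.decode(codec)


# A sample person name value: ASCII text, then JIS X 0208 text switched in
# with ESC $ B and switched back to ASCII with ESC ( B.
sample = b"Yamada^Tarou=\x1b$B;3ED\x1b(B^\x1b$BB@O:\x1b(B"
decoded = decode_value(sample, "ISO 2022 IR 87")
```

With this table a single converter sees the whole value, escape sequences included, which is the behaviour the proposal relies on.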

Jan Schlamelcher
OFFIS DICOM Team
Posts: 318
Joined: Mon, 2014-03-03, 09:51
Location: Oldenburg, Germany

Re: character encoding

#5 Post by Jan Schlamelcher »

As far as I understand it, ISO 2022 JP is also a set containing multiple character sets that can be switched via escape sequences. The ICU handles these escape sequences internally, whereas libiconv doesn't. This is why the existing code in DCMTK, which was originally written for libiconv, parses these escape sequences itself; as a result, the ICU never sees them and cannot choose the correct character set. The only way to fix this would be to disable parsing of the escape sequences when the ICU is used and then map all character sets along the lines of your proposal.
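
The "parse the escape sequences yourself" approach described here can be sketched roughly as follows. This is a Python illustration, not DCMTK's actual code: the two-entry DECODERS table, the function names, and the trick of decoding JIS X 0208 bytes through euc_jp are assumptions chosen to keep the example short. The point is that each segment is handed to a per-character-set decoder with the escape sequences already stripped, so a converter that expects to interpret them itself never gets to see them.

```python
import re


def _decode_ascii(segment: bytes) -> str:
    return segment.decode("ascii")


def _decode_jis_x_0208(segment: bytes) -> str:
    # JIS X 0208 is a 94x94 set; setting the high bit of every byte yields
    # the same characters in EUC-JP form, which Python can decode directly.
    return bytes(b | 0x80 for b in segment).decode("euc_jp")


# ASSUMPTION: a deliberately tiny table (escape sequence -> segment decoder).
DECODERS = {
    b"\x1b(B": _decode_ascii,       # ESC ( B  selects ASCII
    b"\x1b$B": _decode_jis_x_0208,  # ESC $ B  selects JIS X 0208
}

ESCAPE_RE = re.compile(b"|".join(re.escape(seq) for seq in DECODERS))


def decode_with_own_escape_parsing(raw: bytes) -> str:
    """Split raw at known escape sequences and decode each segment separately."""
    pieces = []
    decoder = _decode_ascii  # character set assumed active at the start
    position = 0
    for match in ESCAPE_RE.finditer(raw):
        # Decode everything up to the escape sequence with the current decoder,
        # then switch decoders and skip over the escape sequence itself.
        pieces.append(decoder(raw[position:match.start()]))
        decoder = DECODERS[match.group()]
        position = match.end()
    pieces.append(decoder(raw[position:]))
    return "".join(pieces)
```

For an input such as b"Yamada^Tarou=\x1b$B;3ED\x1b(B" the result matches what a single ISO-2022-JP style converter produces from the unmodified bytes, but the per-segment decoders only ever receive escape-free data.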

Re: character encoding

#6 Post by J. Riesmeier »

Disabling the parsing of the escape sequences for ICU would not help since it would not support an arbitrary mixture of different character encodings (when using DICOM specific character sets with code extensions, i.e. ISO 2022). Also, there are some rules in DICOM that cause an implicit switching of the character encoding. If this is not supported by ICU, this library should not be used.

To the original poster: Why don't you use libiconv?

Re: character encoding

#7 Post by Jan Schlamelcher »

J. Riesmeier wrote: Disabling the parsing of the escape sequences for ICU would not help since it would not support an arbitrary mixture of different character encodings (when using DICOM specific character sets with code extensions, i.e. ISO 2022).
Can you explain that a bit more? As I understand it, the ICU handles the parsing of those escape sequences, so it would work! Where do you see the problem?
J. Riesmeier wrote: Also, there are some rules in DICOM that cause an implicit switching of the character encoding. If this is not supported by ICU, this library should not be used.
Or could one simply feed a string with the respective explicit escape sequence to the ICU to switch it?

Re: character encoding

#8 Post by Shaeto »

Modern Linux distros do not provide the classical libiconv; iconv is available only as part of glibc, and as you know the glibc variant is not as powerful as the "standard" library.

So... I thought libicu would be more convenient, as it is supported by both my distro and DCMTK.

BTW, as I understand it, libicu supports the "DICOM" escape code switches, as they are part of ISO 2022.

Re: character encoding

#9 Post by Jan Schlamelcher »

Shaeto wrote: as i understand libicu supports "dicom" esc code switches as it part of ISO 2022
That's what I meant, so we "just" have to get the existing code to not remove them before passing the strings to the ICU...

Re: character encoding

#10 Post by J. Riesmeier »

Jan Schlamelcher wrote: Can you explain that a bit more? As I understand the ICU handles the parsing of those escape sequences, so it would work! Where do you see the problem?
Does ICU support arbitrary ISO 2022 character sets without creating/opening the individual converters? E.g., is switching between ISO Latin-1 and any other character set defined by the DICOM standard supported by a single ICU converter?

Re: character encoding

#11 Post by Jan Schlamelcher »

As far as I understand it the ICU does NOT allow selecting individual ISO 2022 character sets because one can only select ISO 2022 for which the ICU provides a state machine to switch between the different converters automatically/internally. Since Latin 1 is listed under ISO 2022 JP, I would expect the ICU to also handle switching between it and the others if ISO 2022 JP was selected, but we would most certainly have to test it.

Re: character encoding

#12 Post by J. Riesmeier »

OK, if that really works, the ICU support should be incorporated differently (compared to libiconv). I would propose to separate DCMTK's ICU integration from the existing one (which was originally introduced for libiconv only).

Re: character encoding

#13 Post by Jan Schlamelcher »

Well, of course the ICU needs a different implementation for the portion of the functionality that differs. But the API (DcmSpecificCharacterSet etc.) should be unified; otherwise one would have to touch every portion of the code that uses it...
