character encoding

Shaeto · #1 Post by **Shaeto** » Fri, 2017-05-26, 19:04

I'm not sure whether I understand DICOM 2016 correctly, but see http://dicom.nema.org/medical/dicom/201 ... D.6.2.html

Does this mean that (0008,0005) can contain "ISO 2022 IR 100" with VM 1?

In that case the DcmSpecificCharacterSet class should allow "ISO 2022 IR 100" with VM 1, but it actually does not (see DcmSpecificCharacterSet::selectCharacterSet).

J. Riesmeier · #2 Post by **J. Riesmeier** » Sun, 2017-05-28, 15:00

No, the DICOM standard is very clear regarding this: http://dicom.nema.org/medical/dicom/cur ... C.12.1.1.2

So, your assumption is incorrect, and the reference to DICOM PS3.2 (Conformance) does not help either; details on Specific Character Sets can be found in PS3.3 and PS3.5. By the way, "DICOM 2016d" is not the current edition of the DICOM standard; it's "DICOM 2017b".
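
The rule referred to above can be sketched as a small check. This is a minimal Python sketch, not DCMTK code: the name `check_specific_character_set` is hypothetical, the term lists are transcribed from PS3.3 C.12.1.1.2 and may be incomplete for newer editions of the standard, and the single-value rule follows the reading given in this reply.

```python
# Defined terms for (0008,0005), transcribed from PS3.3 C.12.1.1.2.
# Assumption: these lists may be incomplete for newer editions.
WITHOUT_CODE_EXTENSIONS = {
    "ISO_IR 100", "ISO_IR 101", "ISO_IR 109", "ISO_IR 110", "ISO_IR 144",
    "ISO_IR 127", "ISO_IR 126", "ISO_IR 138", "ISO_IR 148", "ISO_IR 13",
    "ISO_IR 166", "ISO_IR 192", "GB18030", "GBK",
}
WITH_CODE_EXTENSIONS = {
    "ISO 2022 IR 6", "ISO 2022 IR 100", "ISO 2022 IR 101", "ISO 2022 IR 109",
    "ISO 2022 IR 110", "ISO 2022 IR 144", "ISO 2022 IR 127", "ISO 2022 IR 126",
    "ISO 2022 IR 138", "ISO 2022 IR 148", "ISO 2022 IR 13", "ISO 2022 IR 166",
    "ISO 2022 IR 87", "ISO 2022 IR 159", "ISO 2022 IR 149", "ISO 2022 IR 58",
}


def check_specific_character_set(raw):
    """Return a list of problems found in a raw (0008,0005) value string."""
    values = [v.strip() for v in raw.split("\\")]
    problems = []
    if len(values) == 1:
        value = values[0]
        if value in WITH_CODE_EXTENSIONS:
            # The case from this thread: a code-extension term with VM 1.
            problems.append(f"'{value}' is a code-extension term but VM is 1")
        elif value and value not in WITHOUT_CODE_EXTENSIONS:
            problems.append(f"'{value}' is not a known defined term")
    else:
        for index, value in enumerate(values):
            if index == 0 and value == "":
                continue  # empty first value: default repertoire is used
            if value not in WITH_CODE_EXTENSIONS:
                problems.append(
                    f"value {index + 1} ('{value}') is not a code-extension term"
                )
    return problems
```

With this sketch, `check_specific_character_set("ISO 2022 IR 100")` reports a problem, while `"ISO_IR 100"` and `"\\ISO 2022 IR 87"` pass.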

Shaeto · #3 Post by **Shaeto** » Tue, 2017-05-30, 22:13

Well... you confirmed my worst assumption: modality software developers do not read the standard.

This year we started to receive studies with (0008,0005) at VM 1 containing ISO 2022 IR 87 and other code extensions. I was hoping that the standard could explain this.

Shaeto · #4 Post by **Shaeto** » Wed, 2017-05-31, 01:15

By the way, is it possible to fix the DICOM -> libicu character set mapping?

Here are some good documents: http://demo.icu-project.org/icu-bin/convexp and https://en.wikipedia.org/wiki/ISO/IEC_2022

I suggest using:

japanese:
ISO_IR 13 -> ISO-2022-JP (and ISO_IR 14 -> ISO-2022-JP ?)
ISO 2022 IR 87 -> ISO-2022-JP
ISO 2022 IR 159 -> ISO-2022-JP-2

korean:
ISO 2022 IR 149 -> ISO-2022-KR

chinese:
ISO 2022 IR 58 -> ISO-2022-CN ?

For example, at the moment DCMTK with libicu generates errors when converting DICOM files that use ISO 2022 IR 87 or ISO 2022 IR 159.
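
The idea behind this proposal, handing the whole string including its escape sequences to one stateful converter, can be tried out in isolation. The following is a minimal Python sketch under stated assumptions: Python's built-in codecs stand in for ICU converters (the two implementations may not behave identically), `PROPOSED_CODECS` and `decode_with_stateful_codec` are hypothetical names, and only the two Japanese multi-byte entries are covered. The sample bytes are the ideographic name from the Japanese example in PS3.5 Annex H.

```python
# Hypothetical lookup table: DICOM defined term -> Python codec name.
# Python's codecs are used as a stand-in for ICU converters (an assumption).
PROPOSED_CODECS = {
    "ISO 2022 IR 87": "iso2022_jp",
    "ISO 2022 IR 159": "iso2022_jp_2",
}


def decode_with_stateful_codec(term, raw):
    """Decode raw bytes, escape sequences included, with one stateful codec."""
    return raw.decode(PROPOSED_CODECS[term])


# ESC $ B switches to JIS X 0208, ESC ( B switches back to ASCII.
sample = b"\x1b$B;3ED\x1b(B^\x1b$BB@O:\x1b(B"
name = decode_with_stateful_codec("ISO 2022 IR 87", sample)
```

Here the codec tracks the escape sequences itself, so the caller does not split the input. Whether the same holds for the single-byte and Korean or Chinese terms in the list is not shown by this sketch.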

Jan Schlamelcher · #5 Post by **Jan Schlamelcher** » Wed, 2017-05-31, 13:17

As far as I understand it, ISO 2022 JP is also a set containing multiple character sets that can be switched via escape sequences. The ICU handles these escape sequences internally, whereas libiconv doesn't. This is why the existing code in DCMTK, which was originally written for libiconv, parses these escape sequences itself; as a result, the ICU never sees them and cannot choose the correct character set. The only way to fix this would be to disable the parsing of escape sequences when the ICU is used and then map all character sets along the lines of your proposal.
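
The libiconv-style approach described here, where the caller finds the escape sequences and hands each segment to a separate converter, can be sketched as follows. This is a minimal Python sketch and not DCMTK's actual code: `ESCAPES` and `split_on_escapes` are hypothetical names, the escape sequences are those listed for the defined terms in PS3.3 C.12.1.1.2, and, as a simplification, the sketch tracks a single active character set instead of separate G0 and G1 states.

```python
# Escape sequences for some defined terms (PS3.3 C.12.1.1.2).
ESCAPES = {
    b"\x1b(B": "ISO 2022 IR 6",     # ASCII in G0
    b"\x1b-A": "ISO 2022 IR 100",   # Latin-1 in G1
    b"\x1b(J": "ISO 2022 IR 13",    # JIS X 0201 romaji in G0
    b"\x1b)I": "ISO 2022 IR 13",    # JIS X 0201 katakana in G1
    b"\x1b$B": "ISO 2022 IR 87",    # JIS X 0208
    b"\x1b$(D": "ISO 2022 IR 159",  # JIS X 0212
    b"\x1b$)C": "ISO 2022 IR 149",  # KS X 1001
    b"\x1b$)A": "ISO 2022 IR 58",   # GB 2312
}


def split_on_escapes(data, initial="ISO 2022 IR 6"):
    """Split data into (defined term, payload) segments, dropping the escapes."""
    segments = []
    current = initial
    start = 0
    i = 0
    while i < len(data):
        if data[i] == 0x1B:
            for seq, term in ESCAPES.items():
                if data.startswith(seq, i):
                    if i > start:
                        segments.append((current, data[start:i]))
                    current = term
                    i += len(seq)
                    start = i
                    break
            else:
                raise ValueError(f"unknown escape sequence at offset {i}")
        else:
            i += 1
    if start < len(data):
        segments.append((current, data[start:]))
    return segments
```

Each payload no longer contains an escape sequence, which is why a stateful converter that receives these segments one at a time has nothing to switch on.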

J. Riesmeier · #6 Post by **J. Riesmeier** » Wed, 2017-05-31, 20:51

Disabling the parsing of the escape sequences for ICU would not help since it would not support an arbitrary mixture of different character encodings (when using DICOM specific character sets with code extensions, i.e. ISO 2022). Also, there are some rules in DICOM that cause an implicit switching of the character encoding. If this is not supported by ICU, this library should not be used.

To the original poster: Why don't you use libiconv?

Jan Schlamelcher · #7 Post by **Jan Schlamelcher** » Thu, 2017-06-01, 14:39

J. Riesmeier wrote: Disabling the parsing of the escape sequences for ICU would not help since it would not support an arbitrary mixture of different character encodings (when using DICOM specific character sets with code extensions, i.e. ISO 2022).

Can you explain that a bit more? As I understand it, the ICU handles the parsing of those escape sequences, so it would work! Where do you see the problem?

J. Riesmeier wrote: Also, there are some rules in DICOM that cause an implicit switching of the character encoding. If this is not supported by ICU, this library should not be used.

Or one could simply feed a string with the respective explicit escape sequence to the ICU to switch it?

Shaeto · #8 Post by **Shaeto** » Thu, 2017-06-01, 14:42

Modern Linux distros do not provide the classical libiconv; iconv is available only as part of glibc, and the glibc variant is not as powerful as the "standard" library.

So I thought libicu would be more convenient, as it is supported by both my distro and DCMTK.

By the way, as I understand it, libicu supports the "DICOM" escape code switches, as they are part of ISO 2022.

Jan Schlamelcher · #9 Post by **Jan Schlamelcher** » Thu, 2017-06-01, 15:01

Shaeto wrote: as i understand libicu supports "dicom" esc code switches as it part of ISO 2022

That's what I meant, so we "just" have to get the existing code to not remove them before passing the strings to the ICU...

J. Riesmeier · #10 Post by **J. Riesmeier** » Fri, 2017-06-02, 09:22

Jan Schlamelcher wrote: Can you explain that a bit more? As I understand the ICU handles the parsing of those escape sequences, so it would work! Were do you see the problem?

Does ICU support arbitrary ISO 2022 character sets without creating/opening the individual converters? E.g., is switching between ISO Latin-1 and any other character set that is defined by the DICOM standard supported, I mean by a single ICU converter?

Jan Schlamelcher · #11 Post by **Jan Schlamelcher** » Fri, 2017-06-02, 11:54

As far as I understand it, the ICU does NOT allow selecting individual ISO 2022 character sets: one can only select an ISO 2022 variant, for which the ICU provides a state machine that switches between the different converters automatically and internally. Since Latin 1 is listed under ISO 2022 JP, I would expect the ICU to also handle switching between it and the others if ISO 2022 JP was selected, but we would most certainly have to test it.

J. Riesmeier · #12 Post by **J. Riesmeier** » Fri, 2017-06-02, 12:22

OK, if that really works, the ICU support should be incorporated differently (compared to libiconv). I would propose to separate DCMTK's ICU integration from the existing one (which was originally introduced for libiconv only).

Jan Schlamelcher · #13 Post by **Jan Schlamelcher** » Tue, 2017-06-06, 09:33

Well, of course the ICU needs a different implementation for the portion of the functionality that differs. But the API (DcmSpecificCharacterSet etc.) should be unified; otherwise one would have to touch every portion of the code that uses it...
