character encoding
I'm not sure I understand DICOM 2016 correctly, but see http://dicom.nema.org/medical/dicom/201 ... D.6.2.html
Does this mean that (0008,0005) can contain "ISO 2022 IR 100" with VM 1?
In that case the DcmSpecificCharacterSet class should allow "ISO 2022 IR 100" with VM 1, but it actually does not (see DcmSpecificCharacterSet::selectCharacterSet).
-
- DCMTK Developer
- Posts: 2504
- Joined: Tue, 2011-05-03, 14:38
- Location: Oldenburg, Germany
- Contact:
Re: character encoding
No, the DICOM standard is very clear regarding this: http://dicom.nema.org/medical/dicom/cur ... C.12.1.1.2
So your assumption is incorrect, and the reference to DICOM PS3.2 (Conformance) does not help either; details on Specific Character Sets can be found in PS3.3 and PS3.5. By the way, "DICOM 2016d" is not the current edition of the DICOM standard; it's "DICOM 2017b".
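To make the rule discussed here concrete, the following is a minimal sketch, not DCMTK's actual code. It assumes the reading given in this reply: Defined Terms with the "ISO 2022" prefix signal code extensions and belong in a multi-valued (0008,0005), while a single value uses a term without that prefix. The names `splitValues`, `classifySpecificCharacterSet` and `CharsetUsage` are hypothetical, and padding spaces are not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical classification of a Specific Character Set (0008,0005) value.
enum class CharsetUsage { DefaultRepertoire, SingleValue, CodeExtensions, NonConformant };

// Split a multi-valued attribute string at the DICOM value delimiter '\'.
std::vector<std::string> splitValues(const std::string& attr) {
    std::vector<std::string> values;
    if (attr.empty()) return values;  // attribute absent or empty: no values
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = attr.find('\\', start);
        if (pos == std::string::npos) {
            values.push_back(attr.substr(start));
            break;
        }
        values.push_back(attr.substr(start, pos - start));
        start = pos + 1;
    }
    return values;
}

// True if the Defined Term starts with "ISO 2022" (8 characters).
static bool hasIso2022Prefix(const std::string& term) {
    return term.compare(0, 8, "ISO 2022") == 0;
}

CharsetUsage classifySpecificCharacterSet(const std::string& attr) {
    const std::vector<std::string> values = splitValues(attr);
    if (values.empty()) return CharsetUsage::DefaultRepertoire;
    if (values.size() == 1) {
        // Assumption from the reply above: a single value must not use
        // a code-extension term such as "ISO 2022 IR 100".
        return hasIso2022Prefix(values[0]) ? CharsetUsage::NonConformant
                                           : CharsetUsage::SingleValue;
    }
    // More than one value: every non-empty value is expected to carry the prefix.
    for (const std::string& v : values) {
        if (!v.empty() && !hasIso2022Prefix(v)) return CharsetUsage::NonConformant;
    }
    return CharsetUsage::CodeExtensions;
}
```

With this sketch, "ISO_IR 100" is classified as a single value, "\ISO 2022 IR 87" (empty first value, VM 2) as code extensions, and "ISO 2022 IR 100" alone as non-conformant, which matches the behaviour the original poster observed.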
Re: character encoding
Well... you confirmed my worst assumption: modality software developers do not read the standard. This year we started to receive studies with (0008,0005) VM = 1 and ISO 2022 IR 87 and other extensions; I was hoping that the standard could explain this.
Re: character encoding
By the way, is it possible to fix the DICOM -> libicu character set mapping?
Here are some good documents: http://demo.icu-project.org/icu-bin/convexp and https://en.wikipedia.org/wiki/ISO/IEC_2022
I suggest using:
japanese:
ISO_IR 13 -> ISO-2022-JP (and ISO_IR 14 -> ISO-2022-JP ?)
ISO 2022 IR 87 -> ISO-2022-JP
ISO 2022 IR 159 -> ISO-2022-JP-2
korean:
ISO 2022 IR 149 -> ISO-2022-KR
chinese:
ISO 2022 IR 58 -> ISO-2022-CN ?
For example, at the moment DCMTK with libicu generates errors when converting DICOM files that use ISO 2022 IR 87 or ISO 2022 IR 159.
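The proposed mapping can be written down as a small lookup table. This is only a sketch of the suggestion in this post, not DCMTK's mapping; the function name `proposedIcuConverter` is hypothetical, and whether ICU converts DICOM data correctly with these converter names has not been verified here. The two entries the post marks with "?" are flagged as uncertain in the comments.

```cpp
#include <map>
#include <string>

// Returns the ICU converter name proposed in the post for a DICOM Defined Term,
// or an empty string if the post proposes nothing for that term.
std::string proposedIcuConverter(const std::string& dicomTerm) {
    static const std::map<std::string, std::string> table = {
        // Japanese
        {"ISO_IR 13",       "ISO-2022-JP"},
        {"ISO_IR 14",       "ISO-2022-JP"},    // marked "?" in the post
        {"ISO 2022 IR 87",  "ISO-2022-JP"},
        {"ISO 2022 IR 159", "ISO-2022-JP-2"},
        // Korean
        {"ISO 2022 IR 149", "ISO-2022-KR"},
        // Chinese
        {"ISO 2022 IR 58",  "ISO-2022-CN"},    // marked "?" in the post
    };
    const auto it = table.find(dicomTerm);
    return it == table.end() ? std::string() : it->second;
}
```

A caller would fall back to the existing per-character-set handling whenever the function returns an empty string, for example for "ISO_IR 100".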
-
- OFFIS DICOM Team
- Posts: 318
- Joined: Mon, 2014-03-03, 09:51
- Location: Oldenburg, Germany
Re: character encoding
As far as I understand it, ISO 2022 JP is also a set containing multiple character sets that can be switched via escape sequences. The ICU handles these escape sequences internally, whereas libiconv doesn't. This is why the existing code in DCMTK, which was originally written for libiconv, parses these escape sequences itself; as a result, the ICU never sees them and cannot choose the correct character set. The only way to fix this would be to disable parsing of the escape sequences when the ICU is used and then set all character sets along the lines of your proposal.
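To illustrate what "parsing the escape sequences itself" means, here is a rough sketch, not DCMTK's implementation. It splits a raw value at known escape sequences and labels each run with the Defined Term that the escape sequence belongs to, so that each run could be handed to a separate converter without the escape bytes. The names `Segment`, `EscapeEntry` and `splitAtEscapes` are hypothetical. As simplifying assumptions, only a few escape sequences are listed, G0 and G1 designations are not tracked separately, and the implicit resets at delimiters are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// One run of bytes together with the Defined Term selected by the last escape sequence.
struct Segment {
    std::string term;
    std::string bytes;  // raw bytes, escape sequence stripped
};

struct EscapeEntry {
    const char* seq;   // escape sequence as it appears in the data
    const char* term;  // DICOM Defined Term it belongs to
};

// Small, incomplete table of escape sequences (illustration only).
static const EscapeEntry kEscapes[] = {
    {"\x1b$(D", "ISO 2022 IR 159"},
    {"\x1b$)C", "ISO 2022 IR 149"},
    {"\x1b$B",  "ISO 2022 IR 87"},
    {"\x1b(B",  "ISO 2022 IR 6"},
    {"\x1b(J",  "ISO 2022 IR 13"},   // G0 part (romaji)
    {"\x1b)I",  "ISO 2022 IR 13"},   // G1 part (katakana)
    {"\x1b-A",  "ISO 2022 IR 100"},
};

std::vector<Segment> splitAtEscapes(const std::string& value, const std::string& defaultTerm) {
    assert(!defaultTerm.empty());
    std::vector<Segment> out;
    Segment cur{defaultTerm, ""};
    std::size_t i = 0;
    while (i < value.size()) {
        bool matched = false;
        if (value[i] == '\x1b') {
            for (const EscapeEntry& e : kEscapes) {
                const std::size_t n = std::strlen(e.seq);
                if (value.compare(i, n, e.seq) == 0) {
                    // Close the current run and start a new one in the new character set.
                    if (!cur.bytes.empty()) out.push_back(cur);
                    cur = Segment{e.term, ""};
                    i += n;
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) cur.bytes += value[i++];  // unknown escapes are copied through
    }
    if (!cur.bytes.empty()) out.push_back(cur);
    return out;
}
```

Because the escape bytes are removed before conversion, a converter such as ICU's ISO 2022 state machine would never see them, which is the mismatch described in this post.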
-
- DCMTK Developer
- Posts: 2504
- Joined: Tue, 2011-05-03, 14:38
- Location: Oldenburg, Germany
- Contact:
Re: character encoding
Disabling the parsing of the escape sequences for ICU would not help since it would not support an arbitrary mixture of different character encodings (when using DICOM specific character sets with code extensions, i.e. ISO 2022). Also, there are some rules in DICOM that cause an implicit switching of the character encoding. If this is not supported by ICU, this library should not be used.
To the original poster: Why don't you use libiconv?
-
- OFFIS DICOM Team
- Posts: 318
- Joined: Mon, 2014-03-03, 09:51
- Location: Oldenburg, Germany
Re: character encoding
J. Riesmeier wrote: Disabling the parsing of the escape sequences for ICU would not help since it would not support an arbitrary mixture of different character encodings (when using DICOM specific character sets with code extensions, i.e. ISO 2022).
Can you explain that a bit more? As I understand it, the ICU handles the parsing of those escape sequences, so it would work! Where do you see the problem?
J. Riesmeier wrote: Also, there are some rules in DICOM that cause an implicit switching of the character encoding. If this is not supported by ICU, this library should not be used.
Or one could simply feed a string with the respective explicit escape sequence to the ICU to switch it?
Re: character encoding
Modern Linux distros do not provide the classic libiconv; it is available only as part of glibc, and the glibc variant is not as powerful as the "standard" library.
So I thought libicu would be more convenient, as it is supported by both my distro and DCMTK.
By the way, as I understand it, libicu supports the "DICOM" escape code switches, since they are part of ISO 2022.
-
- OFFIS DICOM Team
- Posts: 318
- Joined: Mon, 2014-03-03, 09:51
- Location: Oldenburg, Germany
Re: character encoding
Shaeto wrote: as i understand libicu supports "dicom" esc code switches as it part of ISO 2022
That's what I meant, so we "just" have to get the existing code to not remove them before passing the strings to the ICU...
-
- DCMTK Developer
- Posts: 2504
- Joined: Tue, 2011-05-03, 14:38
- Location: Oldenburg, Germany
- Contact:
Re: character encoding
"Can you explain that a bit more? As I understand it, the ICU handles the parsing of those escape sequences, so it would work! Where do you see the problem?"
Does ICU support arbitrary ISO 2022 character sets without creating/opening the individual converters? E.g., is switching between ISO Latin-1 and any other character set that is defined by the DICOM standard supported by a single ICU converter?
-
- OFFIS DICOM Team
- Posts: 318
- Joined: Mon, 2014-03-03, 09:51
- Location: Oldenburg, Germany
Re: character encoding
As far as I understand it, the ICU does NOT allow selecting individual ISO 2022 character sets: one can only select an ISO 2022 variant, for which the ICU provides a state machine that switches between the different converters automatically/internally. Since Latin 1 is listed under ISO 2022 JP, I would expect the ICU to also handle switching between it and the others if ISO 2022 JP was selected, but we would certainly have to test it.
-
- DCMTK Developer
- Posts: 2504
- Joined: Tue, 2011-05-03, 14:38
- Location: Oldenburg, Germany
- Contact:
Re: character encoding
OK, if that really works, the ICU support should be incorporated differently (compared to libiconv). I would propose to separate DCMTK's ICU integration from the existing one (which was originally introduced for libiconv only).
-
- OFFIS DICOM Team
- Posts: 318
- Joined: Mon, 2014-03-03, 09:51
- Location: Oldenburg, Germany
Re: character encoding
Well, of course the ICU needs a different implementation for the portion of the functionality that differs. But the API (DcmSpecificCharacterSet etc.) should be unified; otherwise one would have to touch every portion of the code that uses it...