DICOM @ OFFIS

Discussion Forum for OFFIS DICOM Tools - For registration, send email with desired user name to the OFFIS DICOM team

All times are UTC + 1 hour




Posted: Mon, 2014-02-24, 08:41

Joined: Fri, 2014-02-21, 11:06
Posts: 3
Hi,

I made a small toolchain with dcm2xml and xml2dcm. I have a data set containing the study description "Wirbelsäule" (English: spine). dcm2xml is able to generate a proper xml file but xml2dcm after that crashes. In detail:

Code:
xml2dcm.exe --log-level debug a.xml a2.dcm
D: $dcmtk: xml2dcm v3.6.0 2011-01-06 $
D:
I: reading XML input file: a.xml
--- libxml parsing ------
a.xml:34: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xE4 0x75 0x6C 0x65
<element tag="0008,1030" vr="LO" vm="1" len="30" name="StudyDescription">Wirbels


I can search for umlauts and replace them as a workaround, but it may be a general problem. Converting the original a.dcm to a.xml I also receive warning by dcm2xml:
Code:
W: (0008,0005) Specific Character Set 'ISO 2022 IR 100' not supported


Another problem I have (I was not able to find a suitable solution here): some DICOM entries have a trailing 0x0A (LF). This is converted to
Code:
&amp;#10;
by dcm2xml and not converted back by xml2dcm. This also happens on CR LF line breaks (0x0D 0x0A).

Thank you for help!


Posted: Mon, 2014-02-24, 09:21
DCMTK Developer

Joined: Tue, 2011-05-03, 14:38
Posts: 1834
Location: Oldenburg, Germany
Quote:
dcm2xml is able to generate a proper xml file but xml2dcm after that crashes.

Do you really mean "crashes" or does the tool just exit/terminate with an error?

Regarding your "umlaut" problem: This is probably caused by a wrong character set encoding. Are you sure that the XML encoding of the "ä" is correct?

Quote:
Converting the original a.dcm to a.xml I also receive warning by dcm2xml:
Code:
W: (0008,0005) Specific Character Set 'ISO 2022 IR 100' not supported

Right, this encoding (ISO 2022 switching of multiple character sets) is not supported by dcm2xml (as you can read in the documentation).
However, you might want to try the latest snapshot of this tool with option +U8 (--convert-to-utf8)...
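For context, 'ISO 2022 IR 100' is the DICOM defined term for Latin alphabet No. 1 with ISO 2022 code extensions, while 'ISO_IR 100' names the same repertoire without code extensions. As a rough sketch outside DCMTK, a raw element value could be decoded to a Unicode string like this in Python. The mapping table and the function name are illustrative assumptions, only a single-valued Specific Character Set is handled, and it is assumed that the value contains no escape sequences:

```python
# Illustrative subset only; this is not DCMTK code and not a complete table.
DICOM_TO_PYTHON_CODEC = {
    "ISO_IR 100": "iso-8859-1",
    "ISO 2022 IR 100": "iso-8859-1",  # assumes no escape sequences in the value
    "ISO_IR 192": "utf-8",
}


def decode_value(raw: bytes, specific_character_set: str) -> str:
    """Decode a raw DICOM string value (hypothetical helper)."""
    codec = DICOM_TO_PYTHON_CODEC[specific_character_set]
    return raw.decode(codec)


# "Wirbelsäule" as stored in a Latin-1 encoded data set (0xE4 is the umlaut)
text = decode_value(b"Wirbels\xe4ule", "ISO 2022 IR 100")

# The same text as it would have to appear in an XML file declared as UTF-8
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)
```

The umlaut occupies one byte (0xE4) in the data set but two bytes (0xC3 0xA4) in UTF-8, which is the kind of conversion an option like +U8 is meant to take care of.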

Quote:
Another problem I have (I was not able to find a suitable solution here): some DICOM entries have a trailing 0x0A (LF). This is converted to
Code:
&amp;#10;

by dcm2xml and not converted back by xml2dcm. This also happens on CR LF line breaks (0x0D 0x0A).

I have to check this. Actually, this conversion should be done by the underlying XML library (libxml2).


Posted: Mon, 2014-02-24, 09:30
DCMTK Developer

Joined: Tue, 2011-05-03, 14:38
Posts: 1834
Location: Oldenburg, Germany
I checked the newline issue with the latest snapshot: LF and CR are correctly converted to "&#10;" and "&#13;", and back to LF and CR.
Your sequence "&amp;#10;" is incorrect by the way, it should be "&#10;". As far as I can see, dcm2xml 3.6.0 also generates the correct output...
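The difference between the two sequences can be reproduced with any conforming XML parser. The following minimal sketch uses Python's standard xml.etree.ElementTree module rather than libxml2 (an assumption made only for illustration; the element name is arbitrary):

```python
import xml.etree.ElementTree as ET

# Correct: numeric character references for CR (decimal 13) and LF (decimal 10)
good = ET.fromstring("<element>line&#13;&#10;</element>")

# Incorrect: the ampersand itself is escaped, so the parser returns the
# literal characters "&#10;" instead of a line feed
bad = ET.fromstring("<element>line&amp;#10;</element>")

print(repr(good.text))
print(repr(bad.text))
```

In the first case the parser hands back real CR and LF characters; in the second case the text still contains an ampersand followed by "#10;", which is what a double-escaped file looks like after parsing.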


Posted: Mon, 2014-02-24, 10:23

Joined: Fri, 2014-02-21, 11:06
Posts: 3
Thank you for the fast reply!

Quote:
Do you really mean "crashes" or does the tool just exit/terminate with an error?

Sorry for this inaccurate description. It terminates with an error:
Code:
xml2dcm.exe a.xml a2.dcm
E: could not parse document: a.xml

(All files are in the same folder, and converting other XML files works fine.)
The output of xml2dcm.exe with debug logging is shown in my first posting.

Quote:
Are you sure that the XML encoding of the "ä" is correct?

What is the correct encoding? If I open the XML file with my notepad2 (mod) with ANSI 1252 encoding I can see an "ä". The file was not modified in any way.

Quote:
However, you might want to try the latest snapshot of this tool with option +U8 (--convert-to-utf8)...

I will test it.

Quote:
Your sequence "&amp;#10;" is incorrect by the way, it should be "&#10;".

Ok... I found it. It is not a problem of dcmtk. I checked the generated vanilla XML file and everything is fine, but when my own code parses the XML file, the extra "amp;" appears. RapidXML seems to add it while parsing the XML. Sorry for the false alarm!


Posted: Mon, 2014-02-24, 13:23
DCMTK Developer

Joined: Tue, 2011-05-03, 14:38
Posts: 1834
Location: Oldenburg, Germany
Quote:
What is the correct encoding? If I open the XML file with my notepad2 (mod) with ANSI 1252 encoding I can see an "ä". The file was not modified in any way.

ANSI 1252 is a Windows-specific character set (similar but not identical to ISO 8859-1 as far as I know).
The prolog of an XML document usually specifies the encoding, e.g. "ISO-8859-1" or "UTF-8".

The log output in your above posting indicates that the XML encoding is declared as UTF-8, which contradicts your statement that it is ANSI 1252 (Windows).
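To illustrate the mismatch, here is a minimal Python sketch (not part of DCMTK; the helper name `diagnose` is made up for this example). It takes the four bytes from the libxml error message and checks whether they are valid in UTF-8 and in ISO 8859-1:

```python
# Bytes reported by libxml in the error message: 0xE4 0x75 0x6C 0x65
reported = bytes([0xE4, 0x75, 0x6C, 0x65])


def diagnose(data: bytes) -> dict:
    """Try to decode data as UTF-8 and as ISO 8859-1 (hypothetical helper)."""
    result = {}
    for encoding in ("utf-8", "iso-8859-1"):
        try:
            result[encoding] = data.decode(encoding)
        except UnicodeDecodeError:
            result[encoding] = None  # not a valid byte sequence in this encoding
    return result


outcome = diagnose(reported)
print(outcome)

# For comparison: the same characters encoded as real UTF-8
utf8_hex = "\u00e4ule".encode("utf-8").hex(" ")
print(utf8_hex)
```

The bytes decode cleanly as ISO 8859-1 but not as UTF-8, so under this reading either the XML declaration would have to name the single-byte encoding actually used, or the file content would have to be converted to UTF-8.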

