FINDSCU encoding of result files doesn't match


CRIno · #1 Post by **CRIno** » Mon, 2020-09-07, 12:08

Hi,

When using findscu from DCMTK v3.6.5, the resulting XML files have the header 'encoding="UTF-8"', but the files themselves are being written with ANSI encoding.

Any ideas on that one?

Kind regards

J. Riesmeier · #2 Post by **J. Riesmeier** » Mon, 2020-09-07, 14:39

What is the value of Specific Character Set (0008,0005)? What do you mean by "the file itself is being created with ANSI encoding"?

CRIno · #3 Post by **CRIno** » Thu, 2020-09-10, 10:03

Opening the result file with Notepad shows "ANSI" in the status bar.
Opening the file with other editors that can switch encodings (e.g. Notepad++) messes up characters like äöü when UTF-8 is selected, so the file content is clearly not UTF-8 encoded. Switching to ANSI displays the special characters correctly.
Opening the file with .NET System.Xml.XmlDocument throws an error saying that the encoding specified in the header does not match the actual file encoding.
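
The mismatch can also be confirmed without relying on an editor, by checking whether the raw bytes of the file decode as UTF-8. This is only a minimal Python sketch; the function name `is_valid_utf8` is made up for illustration and the sample bytes are just the umlauts mentioned above.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same three characters, once encoded as Latin-1 and once as UTF-8.
latin1_bytes = "äöü".encode("latin-1")  # one byte per character
utf8_bytes = "äöü".encode("utf-8")      # two bytes per character

print(is_valid_utf8(latin1_bytes))
print(is_valid_utf8(utf8_bytes))
```

For a real result file, the bytes could be read with `open("Output.xml", "rb").read()` and passed to the same function. Note that a file containing only ASCII characters passes this check under either encoding, so the test is only meaningful when non-ASCII characters are present.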

The input.xml (which is converted with xml2dcm before being sent through findscu) uses "iso-8859-1" in the XML header and writes the corresponding "ISO_IR 100" to (0008,0005).


<?xml version="1.0" encoding="iso-8859-1"?>
<file-format>
  <meta-header xfer="1.2.840.10008.1.2.1" name="Little Endian Explicit">
    <element tag="0002,0010" vr="UI" len="19" name="TransferSyntaxUID">1.2.840.10008.1.2.1</element>
    <element tag="0002,0012" vr="UI" len="26" name="ImplementationClassUID">...</element>
    <element tag="0002,0013" vr="SH" len="11" name="ImplementationVersionName">...</element>
  </meta-header>
  <data-set xfer="1.2.840.10008.1.2.1" name="Little Endian Explicit">
    <element tag="0008,0005" vr="CS" len="10" name="SpecificCharacterSet">ISO_IR 100</element>
    <sequence tag="0040,0100" vr="SQ" name="ScheduledProcedureStepSequence">
      <item>
        <element tag="0040,0001" vr="AE" len="7">PDC-SCU</element>
        <element tag="0008,0060" vr="CS" len="3">ECG</element>
        <element tag="0040,0010" vr="SH" len="0"></element>
      </item>
    </sequence>
    <element tag="0010,0010" vr="PN" len="0"></element>
    <element tag="0010,0020" vr="LO" len="9">000000003</element>
    <element tag="0010,0030" vr="DA" len="0"></element>
    <element tag="0010,0040" vr="CS" len="0"></element>
    <element tag="0008,0050" vr="SH" len="0"></element>
  </data-set>
</file-format>
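
The pairing of "iso-8859-1" in the XML header with "ISO_IR 100" in (0008,0005) is one instance of a general correspondence between DICOM defined terms and character encodings. The sketch below lists a few of these pairings as Python codec names. It is deliberately incomplete, and the names `DICOM_TO_PYTHON_CODEC` and `python_codec_for` are hypothetical helpers, not part of DCMTK.

```python
# A small, incomplete mapping from DICOM Specific Character Set defined terms
# to Python codec names. Only single-byte sets and UTF-8 are covered here.
DICOM_TO_PYTHON_CODEC = {
    "ISO_IR 6": "ascii",         # default repertoire
    "ISO_IR 100": "iso-8859-1",  # Latin alphabet No. 1
    "ISO_IR 101": "iso-8859-2",  # Latin alphabet No. 2
    "ISO_IR 192": "utf-8",       # Unicode in UTF-8
}


def python_codec_for(defined_term: str) -> str:
    """Map a (0008,0005) value to a Python codec name.

    An empty value is treated as the default repertoire (ASCII).
    Unknown terms raise KeyError, since this table is not exhaustive.
    """
    term = defined_term.strip()
    if not term:
        return "ascii"
    return DICOM_TO_PYTHON_CODEC[term]


codec = python_codec_for("ISO_IR 100")
print(codec)
print(b"\xe4\xf6\xfc".decode(codec))
```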

The findscu result looks like this. The header declares UTF-8 as the encoding, but non-ASCII characters do not show up correctly (see the patient name).


<?xml version="1.0" encoding="UTF-8"?>
<responses type="C-FIND">
<data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit">
<element tag="0008,0050" vr="SH" vm="1" len="6" name="AccessionNumber">acc no</element>
<element tag="0010,0010" vr="PN" vm="1" len="18" name="PatientName">Sch䦥r</element>
<element tag="0010,0020" vr="LO" vm="1" len="10" name="PatientID">000000003</element>
<element tag="0010,0030" vr="DA" vm="1" len="8" name="PatientBirthDate">19000101</element>
<element tag="0010,0040" vr="CS" vm="1" len="2" name="PatientSex">M</element>
<sequence tag="0040,0100" vr="SQ" card="1" name="ScheduledProcedureStepSequence">
<item card="2">
<element tag="0040,0002" vr="DA" vm="1" len="8" name="ScheduledProcedureStepStartDate">20200910</element>
<element tag="0040,0003" vr="TM" vm="1" len="6" name="ScheduledProcedureStepStartTime">101140</element>
</item>
</sequence>
</data-set>
</responses>

The findscu call is:


findscu.exe -Xx -to 30 -ta 30 -td 30 -Xs Output.xml -od <target path> <server> -aet PDC-SCU -aec DVTK_MW_SCP Input.dcm -ll debug

J. Riesmeier · #4 Post by **J. Riesmeier** » Thu, 2020-09-10, 10:33

To me it seems that the SCP is sending a C-FIND Response dataset with non-ASCII characters without specifying the character set that was used in (0008,0005) Specific Character Set. This is not valid according to the DICOM standard. As a result, findscu assumes ASCII encoding and therefore does not perform any character set conversion. The "UTF-8" in the XML header is a result of the -Xs (--extract-xml-single) option you've used.
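
Whether a response actually carries (0008,0005) can be checked directly in the XML output. The following Python sketch scans the raw bytes with a regular expression instead of using an XML parser, because the file may not be valid in the encoding it declares. The function name `specific_character_set` is an illustrative assumption, and the pattern only targets the element layout shown in this thread.

```python
import re
from typing import Optional

# Matches an <element tag="0008,0005" ...>value</element> entry as written
# in the XML listings in this thread.
_CHARSET_RE = re.compile(rb'<element\s+tag="0008,0005"[^>]*>([^<]*)</element>')


def specific_character_set(xml_bytes: bytes) -> Optional[str]:
    """Return the first (0008,0005) value found, or None if absent or empty."""
    match = _CHARSET_RE.search(xml_bytes)
    if match is None:
        return None
    value = match.group(1).decode("ascii", errors="replace").strip()
    return value or None


request = (b'<element tag="0008,0005" vr="CS" len="10" '
           b'name="SpecificCharacterSet">ISO_IR 100</element>')
response = (b'<element tag="0008,0050" vr="SH" vm="1" len="6" '
            b'name="AccessionNumber">acc no</element>')

print(specific_character_set(request))
print(specific_character_set(response))
```

A result of `None` for a response that contains non-ASCII bytes corresponds to the situation described in this post.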

CRIno · #5 Post by **CRIno** » Thu, 2020-09-10, 10:46

The SCP is DVTk RIS Emulator 5.0.0

So the SCP should include (0008,0005) in the response (which it apparently does not), and then findscu would encode the output XML to match its header?

J. Riesmeier · #6 Post by **J. Riesmeier** » Thu, 2020-09-10, 11:01

If the SCP correctly specified the character set used, e.g. (0008,0005) Specific Character Set = "ISO_IR 100", then findscu with option -Xs (and compiled with support for character set conversion) would convert the Latin-1 (ISO 8859-1) encoding to UTF-8. The encoding in the XML header is always "UTF-8" when using option -Xs (see the manpage of this tool for details).
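
To illustrate what such a conversion amounts to at the byte level, here is a minimal Python sketch that transcodes Latin-1 bytes to UTF-8. It does not show how findscu implements the conversion. It could serve as a post-processing step for a result file only under the assumption that the SCP is known to send ISO 8859-1. The function name `latin1_to_utf8` is hypothetical.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes from ISO 8859-1 (Latin-1) to UTF-8.

    Assumes the input really is Latin-1. Every byte value is valid
    Latin-1, so this never raises, even on wrongly labelled input.
    """
    return data.decode("latin-1").encode("utf-8")


raw = b"\xe4\xf6\xfc"          # "äöü" as single Latin-1 bytes
converted = latin1_to_utf8(raw)

print(converted.decode("utf-8"))
print(len(raw), len(converted))
```

Pure ASCII content passes through unchanged, so the XML markup itself is not affected. Keep in mind that "ANSI" in Windows editors usually means Windows-1252, which differs from ISO 8859-1 in the byte range 0x80 to 0x9F.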

CRIno · #7 Post by **CRIno** » Thu, 2020-09-10, 11:37

Ok, Thank you very much!
