fatal error in vrscan::yy_get_next_buffer (dcmdata)

Yves Neumann
Posts: 30
Joined: Fri, 2005-12-02, 17:06
Location: Germany

#1 Post by Yves Neumann »

Hello OFFIS-Team,

I am seeing strange behavior when converting an SR to XML. The SR contains a UT attribute (see the mark in the dump below). The 'vrscan::yylex()' call in 'DcmElement::scanValue()' results in a fatal error within vrscan::yy_get_next_buffer() and therefore in a call to 'exit(1)'.

Running dsr2xml shows the following output:

C:\dcmtk-3.5.5_20100608\dcmsr\apps\Debug>dsr2xml.exe -d SR000001
D: $dcmtk: dsr2xml v3.5.5 2010-06-08 $
D:
D: DcmItem::checkTransferSyntax() TransferSyntax="Little Endian Explicit"
W: InstanceNumber (0020,0013) empty in SR document (type 1)
fatal error - scanner input buffer overflow

I will send a copy of one of the SRs to your support email. I'd be thankful if one of you could have a look at it. The functionality is meant to be used in an application, but the exit(1) cannot be caught and causes the app to terminate, which is not acceptable. dsr2xml is just used to reproduce the behavior. My code uses the same snapshot that dsr2xml is built from.

Thanks in advance, Yves

//-----------------------------------------------------------

A dump of an SR causing the problem:

# Dicom-File-Format

# Dicom-Meta-Information-Header
# Used TransferSyntax: Little Endian Explicit
(0002,0000) UL 222 # 4, 1 FileMetaInformationGroupLen
(0002,0001) OB 00\01 # 2, 1 FileMetaInformationVersion
(0002,0002) UI =BasicTextSR # 30, 1 MediaStorageSOPClassUID
(0002,0003) UI [1.2.392.200036.9116.7.8.6.30526642.2.0.86121226962446] # 54, 1 MediaStorage
(0002,0010) UI =LittleEndianExplicit # 20, 1 TransferSyntaxUID
(0002,0012) UI [1.2.392.200036.9116.7.8.10.46.6.1.1.1] # 38, 1 ImplementationClassUID
(0002,0013) SH [TM_APLIO_1.0] # 12, 1 ImplementationVersionName
(0002,0016) AE [RMEDIA] # 6, 1 SourceApplicationEntityTitl

# Dicom-Data-Set
# Used TransferSyntax: Little Endian Explicit
(0008,0005) CS [ISO_IR 100] # 10, 1 SpecificCharacterSet
(0008,0012) DA [20100426] # 8, 1 InstanceCreationDate
(0008,0013) TM [090244.375000] # 14, 1 InstanceCreationTime
(0008,0016) UI =BasicTextSR # 30, 1 SOPClassUID
(0008,0018) UI [1.2.392.200036.9116.7.8.6.30526642.2.0.86121226962446] # 54, 1 SOPInstanceU
(0008,0020) DA [20100426] # 8, 1 StudyDate
(0008,0022) DA [20100426] # 8, 1 AcquisitionDate
(0008,0023) DA [20100426] # 8, 1 ContentDate
(0008,0030) TM [084702.687000] # 14, 1 StudyTime
(0008,0032) TM [090244.375000] # 14, 1 AcquisitionTime
(0008,0033) TM [090244.375000] # 14, 1 ContentTime
(0008,0050) SH (no value available) # 0, 0 AccessionNumber
(0008,0060) CS [SR] # 2, 1 Modality
(0008,0070) LO [TOSHIBA_MEC] # 12, 1 Manufacturer
(0008,0080) LO [TOSHIBA] # 8, 1 InstitutionName
(0008,0090) PN [REFPHYSICIAN] # 12, 1 ReferringPhysiciansName
(0008,1030) LO [OB] # 2, 1 StudyDescription
(0008,1040) LO [DEPARTMENT] # 10, 1 InstitutionalDepartmentName
(0008,1080) LO (no value available) # 0, 0 AdmittingDiagnosesDescripti
(0008,1090) LO [Xario] # 6, 1 ManufacturersModelName
(0008,1111) SQ (Sequence with explicit length #=0) # 0, 1 ReferencedPerformedProcedur
(fffe,e0dd) na (SequenceDelimitationItem for re-encod.) # 0, 0 SequenceDelimitationItem
(0010,0010) PN [JUST A NAME] # 10, 1 PatientsName
(0010,0020) LO [1234567] # 8, 1 PatientID
(0010,0030) DA [19000101] # 8, 1 PatientsBirthDate
(0010,0040) CS [F] # 2, 1 PatientsSex
(0010,1020) DS [0] # 2, 1 PatientsSize
(0010,1030) DS [0] # 2, 1 PatientsWeight
(0010,1040) LO (no value available) # 0, 0 PatientsAddress
(0010,4000) LT [Insurance=
] # 12, 1 PatientComments
(0018,1000) LO [30526642] # 8, 1 DeviceSerialNumber
(0018,1020) LO [V10.00] # 6, 1 SoftwareVersions
(0020,000d) UI [1.2.392.200036.9116.7.8.6.30526642.5.0.83970623281682] # 54, 1 StudyInstanc
(0020,000e) UI [1.2.392.200036.9116.7.8.6.30526642.5.0.83970957052814] # 54, 1 SeriesInstan
(0020,0010) SH [1] # 2, 1 StudyID
(0020,0011) IS [1] # 2, 1 SeriesNumber
(0020,0013) IS (no value available) # 0, 0 InstanceNumber
(0029,0010) LO [TOSHIBA MDW NON-IMAGE] # 22, 1 PrivateCreator
(0029,0011) LO [PMTF INFORMATION DATA] # 22, 1 PrivateCreator
(0029,0012) LO [TOSHIBA COMAPL HEADER] # 22, 1 PrivateCreator
(0029,1008) CS [TSB_BASIC_SR] # 12, 1 Unknown Tag & Data
(0029,1009) LO [1.00] # 4, 1 Unknown Tag & Data
(0029,1020) OB 08\00\60\00\43\53\02\00\55\53\08\00\50\10\50\4e\0e\00\50\45\52\46... # 48, 1
(0029,1131) LO [5.0.907296] # 10, 1 Unknown Tag & Data
(0029,1132) UL 107702 # 4, 1 Unknown Tag & Data
(0029,1133) UL 0 # 4, 1 Unknown Tag & Data
(0029,1134) CS [DB TO DICOM] # 12, 1 Unknown Tag & Data
(0029,1220) OB 4d\00\45\00\44\00\43\00\4f\00\4d\00\20\00\48\00\49\00\53\00\54\00... # 132, 1
(0032,4000) LT (no value available) # 0, 0 RETIRED_StudyComments
(0040,a040) CS [CONTAINER] # 10, 1 ValueType
(0040,a043) SQ (Sequence with explicit length #=1) # 64, 1 ConceptNameCodeSequence
(fffe,e000) na (Item with explicit length #=3) # 56, 1 Item
(0008,0100) SH [V5000001] # 8, 1 CodeValue
(0008,0102) SH [TSBUS] # 6, 1 CodingSchemeDesignator
(0008,0104) LO [APLIO_BASIC_REPORT] # 18, 1 CodeMeaning
(fffe,e00d) na (ItemDelimitationItem for re-encoding) # 0, 0 ItemDelimitationItem
(fffe,e0dd) na (SequenceDelimitationItem for re-encod.) # 0, 0 SequenceDelimitationItem
(0040,a050) CS [SEPARATE] # 8, 1 ContinuityOfContent
(0040,a372) SQ (Sequence with explicit length #=0) # 0, 1 PerformedProcedureCodeSeque
(fffe,e0dd) na (SequenceDelimitationItem for re-encod.) # 0, 0 SequenceDelimitationItem
(0040,a491) CS [COMPLETE] # 8, 1 CompletionFlag
(0040,a493) CS [UNVERIFIED] # 10, 1 VerificationFlag
(0040,a730) SQ (Sequence with explicit length #=1) # 107556, 1 ContentSequence
(fffe,e000) na (Item with explicit length #=4) # 107548, 1 Item
(0040,a010) CS [CONTAINS] # 8, 1 RelationshipType
(0040,a040) CS [TEXT] # 4, 1 ValueType
(0040,a043) SQ (Sequence with explicit length #=1) # 64, 1 ConceptNameCodeSequence
(fffe,e000) na (Item with explicit length #=3) # 56, 1 Item
(0008,0100) SH [V5000002] # 8, 1 CodeValue
(0008,0102) SH [TSBUS] # 6, 1 CodingSchemeDesigna
(0008,0104) LO [ORIGINAL_XML_DATA] # 18, 1 CodeMeaning
(fffe,e00d) na (ItemDelimitationItem for re-encoding) # 0, 0 ItemDelimitationItem
(fffe,e0dd) na (SequenceDelimitationItem for re-encod.) # 0, 0 SequenceDelimitationIte
(0040,a160) UT [<?xml version="1.0" encoding="UTF-8"?>
<Report><App Name="OB"><Gr... # 107432, 1 TextValue !!!! fatal error - scanner input buffer overflow --> exit(1);
(fffe,e00d) na (ItemDelimitationItem for re-encoding) # 0, 0 ItemDelimitationItem
(fffe,e0dd) na (SequenceDelimitationItem for re-encod.) # 0, 0 SequenceDelimitationItem

Jörg Riesmeier
ICSMED DICOM Services
Posts: 2217
Joined: Fri, 2004-10-29, 21:38
Location: Oldenburg, Germany

#2 Post by Jörg Riesmeier »

I can confirm this unexpected behavior in the current snapshot - thank you for the report.

The fatal error is caused by the DSRTypes::checkElementValue() call on the very long UT data element. There are several possible ways to solve this issue:

1) Fix the buffer overflow problem in the automatically generated VR scanner code (dcmdata).
2) Do not check the value of very long data elements at all (dcmsr).
3) Introduce a new read flag which allows for disabling the value checking (dcmsr).

We'll see ...
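
Options 2 and 3 above could look roughly like the following. This is only a sketch under stated assumptions: the flag name RF_skipValueCheck, the threshold kMaxCheckedLength and the helper shouldCheckValue() are hypothetical and are not part of the DCMTK API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical read flags; these names do not exist in DCMTK.
enum ReadFlags : unsigned {
    RF_default        = 0,
    RF_skipValueCheck = 1u << 0   // option 3: the caller disables value checking
};

// Hypothetical threshold for option 2; the number is an arbitrary placeholder.
constexpr std::size_t kMaxCheckedLength = 64 * 1024;

// Decide whether a value should be handed to the VR checker at all.
bool shouldCheckValue(const std::string& value, unsigned flags)
{
    if (flags & RF_skipValueCheck)
        return false;                     // option 3: checking switched off
    if (value.size() > kMaxCheckedLength)
        return false;                     // option 2: value too long to check
    return true;
}
```

With this placeholder threshold, the 107432-byte TextValue from the dump would simply not be checked, while short values are still validated.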

Uli Schlachter
DCMTK Developer
Posts: 120
Joined: Thu, 2009-11-26, 08:15

#3 Post by Uli Schlachter »

This is fixed in the latest git version. Instead of flex++ we now use "standard" flex. This fixed the bug automatically, since flex grows its buffer as needed.

http://git.dcmtk.org/web?p=dcmtk.git;a= ... 6c66321eee
http://git.dcmtk.org/web?p=dcmtk.git;a= ... ccf0cd2736

The underlying problem (error handling via exit()) will be fixed shortly. We will be using setjmp() / longjmp() for this.
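
For readers unfamiliar with the setjmp() / longjmp() approach, here is a minimal sketch of the idea. It is not the actual DCMTK change: toyScan(), scannerFatalError() and scanGuarded() are hypothetical stand-ins, and the 16-byte limit only simulates a scanner that runs into a fatal error.

```cpp
#include <csetjmp>
#include <cstddef>
#include <cstdio>

// Jump target used by the error handler (hypothetical; one global for simplicity).
static std::jmp_buf g_scanErrorJump;

// Replacement for a fatal-error handler that would otherwise call exit(1).
[[noreturn]] static void scannerFatalError(const char* msg)
{
    std::fprintf(stderr, "scanner error: %s\n", msg);
    std::longjmp(g_scanErrorJump, 1);   // return to the setjmp() in scanGuarded()
}

// Toy scanner: reports a fatal error for inputs longer than its fixed limit.
static int toyScan(const char* input, std::size_t length)
{
    (void)input;
    if (length > 16)
        scannerFatalError("input buffer overflow");
    return 0;
}

// Returns 0 on success, -1 if the scanner hit a fatal error.
int scanGuarded(const char* input, std::size_t length)
{
    if (setjmp(g_scanErrorJump) != 0)
        return -1;                      // reached via longjmp()
    return toyScan(input, length);
}
```

The caller gets an error code instead of a terminated process. Note that longjmp() skips destructors, so no objects with non-trivial destructors should be alive between the setjmp() and the longjmp().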

Thanks for the report and thanks for flying DCMTK.

Yves Neumann
Posts: 30
Joined: Fri, 2005-12-02, 17:06
Location: Germany

#4 Post by Yves Neumann »

Hello,

I can confirm that the buffer issue seems to be fixed with my samples.

Nevertheless, I still have various SR datasets containing UT attributes that cause the vrscanner to fail.

dsr2html gives the following output:

K:\DICOM\Datasets\SR\SRs>dsr2html.exe -d -Ee 00000002 00000002.htm
D: $dcmtk: dsr2html v3.5.5 2010-10-08 $
D:
D: DcmItem::checkTransferSyntax() TransferSyntax="Little Endian Explicit"
W: ReferencedPerformedProcedureStepSequence (0008,1111) absent in SRDocumentSeriesModule (type 2)
W: PerformedProcedureCodeSequence (0040,a372) absent in SRDocumentGeneralModule (type 2)
W: TextValue (0040,a160) violates VR definition in content item
W: Reading invalid/incomplete content item TEXT "1.3.1"
W: Rendering invalid/incomplete content item TEXT

As far as I can tell, the VR check fails in 'vrscan.cc' around line 93. But I am not sure (I still do not understand the vrscanner completely :( ).

result = yylex(scanner);       // yylex() returns '14' 
if (yylex(scanner))            // yylex() returns '16'
    result = 16 /* UNKNOWN */;
I will send you a sample dataset to reproduce the issue. I know the sample SR contains several encoding issues (mostly missing data), but that is what we have to deal with :cry: - even from big vendors.

As Joerg mentioned in an earlier post, I disabled the value check for VR "UT" in DSRTypes entirely to be able to render such SRs. I am not very happy with that solution, but it works for now.

Uli Schlachter
DCMTK Developer
Posts: 120
Joined: Thu, 2009-11-26, 08:15

#5 Post by Uli Schlachter »

Yves Neumann wrote:As far as I can say the VR check fails in 'vrscan.cc' around line 93. But I am not sure (still do not understand the vrscanner completely :( ).

result = yylex(scanner);       // yylex() returns '14' 
if (yylex(scanner))            // yylex() returns '16'
    result = 16 /* UNKNOWN */;
The second call will scan the rest of the buffer. Normally, the first call should have consumed the whole input and thus the second call will return YY_NULL (which has the value 0). However, when the first call stopped before the end of the string (e.g. because it saw an invalid character), the second call will return 16 (=UNKNOWN).
This little magic just makes sure that invalid inputs are rejected correctly.
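
The two-call pattern described above can be illustrated with a toy lexer. This is a sketch only: ToyScanner, toyLex() and classify() are hypothetical stand-ins for the generated scanner, and TOK_DIGITS is an invented token code. Only the values 0 (YY_NULL) and 16 (UNKNOWN) are taken from this thread.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Token codes: 0 and 16 mirror YY_NULL and UNKNOWN; TOK_DIGITS is made up.
enum Token { TOK_END = 0, TOK_DIGITS = 1, TOK_UNKNOWN = 16 };

// Toy stand-in for the scanner state.
struct ToyScanner {
    std::string input;
    std::size_t pos;
};

// Like yylex(): match the longest run of digits, or one unrecognised character.
int toyLex(ToyScanner& s)
{
    if (s.pos >= s.input.size())
        return TOK_END;                 // nothing left to scan
    const std::size_t start = s.pos;
    while (s.pos < s.input.size() &&
           std::isdigit(static_cast<unsigned char>(s.input[s.pos])))
        ++s.pos;
    if (s.pos > start)
        return TOK_DIGITS;
    ++s.pos;                            // consume one unrecognised character
    return TOK_UNKNOWN;
}

// First call classifies the value; second call checks that nothing is left over.
int classify(const std::string& value)
{
    ToyScanner s{value, 0};
    int result = toyLex(s);
    if (toyLex(s))                      // leftover input => whole value is invalid
        result = TOK_UNKNOWN;
    return result;
}
```

Here classify("12345") yields TOK_DIGITS because the second call hits the end of the input, while classify("123x5") yields TOK_UNKNOWN because the first call stops at the 'x'.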
Yves Neumann wrote: I will send you a sample dataset to reproduce the issue. I know the sample SR contains several encoding issues (mostly missing data) but that is what we have to deal with :cry: - even from big vendors.

As mentioned in one former post by Joerg I disabled the value check for VR "UT" in DSRTypes at all to be able to render such SRs. I am not so happy about that solution but it works for now.
I took a look at the file and I have to say it looks valid. The TextValue that causes the vrscanner to fail contains accents (é), but (0008, 0005) SpecificCharacterSet has the value "ISO_IR 100".
The problem is that the vrscanner ignores the character set and always assumes the default character repertoire, which makes this text invalid.

I'll add an item to our bug list and will see what I can do.

Uli Schlachter
DCMTK Developer
Posts: 120
Joined: Thu, 2009-11-26, 08:15

#6 Post by Uli Schlachter »

Turns out I am wrong (thanks, Jörg). The VR scanner allows a large enough byte range that it doesn't have to look at the SpecificCharacterSet. è and é (0xe8 and 0xe9) are allowed.

The character that causes the problem has the value decimal 156, octal 0234, hex 0x9c (my text editor automatically converted this invalid byte into <9c> for display):
Chère cons<9c>ur, Cher confrère,
If I interpret the ISO-8859-1 table on Wikipedia correctly, then this value is not assigned.

So sorry, but I think the VR scanner actually does the right thing here.
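
To locate such bytes in other datasets, a small helper like the following might be useful. It is a sketch and not DCMTK code: findFirstC1Byte() is a hypothetical name, and it assumes that the whole range 0x80 to 0x9F is the one to look for (ISO/IEC 8859-1 assigns no graphic characters there). The exact byte ranges accepted by the VR scanner are not stated in this thread.

```cpp
#include <cstddef>
#include <string>

// Returns the index of the first byte in the range 0x80..0x9F,
// or std::string::npos if the text contains no such byte.
std::size_t findFirstC1Byte(const std::string& text)
{
    for (std::size_t i = 0; i < text.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (c >= 0x80 && c <= 0x9F)
            return i;
    }
    return std::string::npos;
}
```

For the text quoted above, the helper would point at the 0x9c byte, while è and é (0xe8 and 0xe9) pass. In Windows-1252 the byte 0x9C encodes "œ", so one possible explanation (an assumption, not confirmed here) is that the text was written in that code page and labelled as ISO_IR 100.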

Yves Neumann
Posts: 30
Joined: Fri, 2005-12-02, 17:06
Location: Germany

#7 Post by Yves Neumann »

Thanks for the analysis.

So, knowing that the data contain 'bad' characters, I still have the problem that the whole value is not rendered. When the test is disabled (as I have it for now), the result is OK.

It would be nice to have a flag or option to control whether a failing VR test causes the value to be excluded from rendering. I would rather not change the DCMTK code, since I would have to maintain the changes with every new snapshot that I use.

Jörg Riesmeier
ICSMED DICOM Services
Posts: 2217
Joined: Fri, 2004-10-29, 21:38
Location: Oldenburg, Germany

#8 Post by Jörg Riesmeier »

Ok, we will add this issue to our to-do list (i.e. adding a new flag for the HTML rendering). However, since we have already passed "feature freeze", this addition will not make it into the upcoming 3.6.0 release.

Jörg Riesmeier
ICSMED DICOM Services
Posts: 2217
Joined: Fri, 2004-10-29, 21:38
Location: Oldenburg, Germany

#9 Post by Jörg Riesmeier »

This issue has been solved with the following commit.
