Question about getUint8Array() implementation and VRs in general

All other questions regarding DCMTK

Moderator: Moderator Team

Post Reply
Message
Author
budric
Posts: 29
Joined: Thu, 2007-06-28, 20:48

Question about getUint8Array() implementation and VRs in general

#1 Post by budric »

Hi,
I need to extract a list of binary numbers from a tag that's deprecated in the standard. I'm encountering strange behavior that I was hoping someone can clarify. Here's a code snippet:

Code: Select all

//note I'm not freeing the pointers here because they point to internal DCMTK structures
DcmElement * rawElement = NULL;	 
OFCondition cond = dataset.findAndGetElement(DcmTagKey(GROUP_NUM,ELEMENT_NUM), rawElement );
if (cond.good() && rawData != NULL) 
{
    rawElement ->loadAllDataIntoMemory();
    Uint8 * byteArray = NULL;
    cond = rawElement ->getUint8Array(byteArray);
}
getUint8Array() implementation is:

Code: Select all

OFCondition DcmOtherByteOtherWord::getUint8Array(Uint8 *&byteVals)
{
    errorFlag = EC_Normal;
    if (getTag().getEVR() != EVR_OW && getTag().getEVR() != EVR_lt)
        byteVals = OFstatic_cast(Uint8 *, getValue());
    else
        errorFlag = EC_IllegalCall;
    return errorFlag;
First I want to understand where VR comes from. For some weird reason when I load the DICOM from a file, or Conquest server the VR for that tag is OB. Because of this getUint8Array works. When I load exact same object from another pacs, the VR is OW and that getUint8Array function fails. Where is VR for tag encoded? The local data dictionary? Is it exchanged during association setup? What's a good way to fix this? I can manually call DcmElement::setVR(EVR_OB). However if at any point between loading and my setVR(), DcmElement::getValue() is called and VR is anything but OB, DCMTK may rearrange the bytes according to endianness.

My last set of questions is regarding endianess. Why is DcmElement::getByteOrder() a protected method? I need this information because the data type is one of - Uint16, Sint16, Float32, Float64 - and encoded in different tag. The stored bytes may be big endian while I'm reading it on a little endian machine. Any way to determine byte order or do I have to modify DCMTK source and set that method to be public?

Thanks very much for your time.
I'm sure it's an esoteric question.

Michael Onken
DCMTK Developer
Posts: 2048
Joined: Fri, 2004-11-05, 13:47
Location: Oldenburg, Germany
Contact:

#2 Post by Michael Onken »

Hi Budric,
First I want to understand where VR comes from. For some weird reason when I load the DICOM from a file, or Conquest server the VR for that tag is OB.
Basically, the VR for an attribute is defined in the DICOM standard part 6. However, there are some attributes which may show up with different VRs, notably the attribute Pixel Data which contains the actual binary image data. Most (all?) of the elements that show up in different VRs contain binary bulk data, thus, they are either OB or OW. For Pixel Data this usually depends on the transfer syntax, i.e.: If the image is compressed then you always have OB. If not then you have OW unless Bits Allocated has a value less than or equal to 8.
My last set of questions is regarding endianess. Why is DcmElement::getByteOrder() a protected method? I need this information because the data type is one of - Uint16, Sint16, Float32, Float64 - and encoded in different tag.
The reason is that the methods to access these element's values (DcmItem::findAndGetXXX) already do the byte swapping for you, i.e. you always get the value in local byte ordering so it is not necessary for you to do anything about it.

The same is for writing; you just put values with putAndInsert() functions (for example). If you then decide to write/transfer the dataset in Big Endian Explicit VR transfer syntax (the only big endian one), then DCMTK aligns the bytes as necessary for that purpose.

Best regards,
Michael

budric
Posts: 29
Joined: Thu, 2007-06-28, 20:48

#3 Post by budric »

In my case tag1 is binary data, tag2 is integer telling you data type of the binary data:
0 = unsigned short (US) => swap 2 bytes potentially
2 = floating point single (FL) => swap 4 bytes potentially
3 = floating point double (FD) => swap 8 bytes potentially

However DCMTK will just look up the VR for that binary field (tag1), and if it's OB there's nothing to swap, and if it's OW it will swap 2 bytes if necessary to account for endianness.

So I just have to cross my fingers that the PACS supports Little Endian transfer syntax.

Do you know of a nice way to fix this in code? I.E. get DCMTK NOT to swap the bytes, and I'll do it manually. Obviously I can start hacking at the library code, but I didn't want to do that.

Thanks for your response.

Michael Onken
DCMTK Developer
Posts: 2048
Joined: Fri, 2004-11-05, 13:47
Location: Oldenburg, Germany
Contact:

#4 Post by Michael Onken »

Hi,

sorry not sure whether I totally understand.

You are right, you have to correct endianness if you use only getUint8Array() but this is not what you should do for values of datatypes US, FL and the like.

Instead, use the dedicated functions which are appropriate for the very datatype, e.g. use DcmItem::findAndGetFloat32() for an FL value, or use DcmItem::findAndGetUint16(). Then you get the parsed content of that field,i.e. the value in local endianness.

For binary OW data you can use DcmItem::findAndGetUint16() instead, which also gives you the content (16 Bit values) in local endianness.

Nevertheless, if you want to know the transfer syntax in which the file was read, use DcmDataset's getOriginalXfer() function.

Did I miss your point? :-)

Michael

budric
Posts: 29
Joined: Thu, 2007-06-28, 20:48

#5 Post by budric »

Sort of.

The data is an array of Float32 or Uint16 etc, so I'd use DcmItem::findAndGetFloat32Array(). The implementation in DCMTK is:

Code: Select all

OFCondition DcmItem::findAndGetFloat32Array(const DcmTagKey& tagKey,
                                            const Float32 *&value,
                                            unsigned long *count,
                                            const OFBool searchIntoSub)
{
    DcmElement *elem;
    /* find the element */
    OFCondition status = findAndGetElement(tagKey, elem, searchIntoSub);
    if (status.good())
    {
        /* get the value */
        Float32 *array = NULL;
        status = elem->getFloat32Array(array);
        value = array;
    }
    //snip, rest of code not shown
So basically it just calls findAndGetElement() followed by getFloat32Array(). Note that's what I was doing in the code I posted in my first post. The problem is that DcmElement returned by findAndGetElement() is DcmOtherByteOtherWord. This class does not have a valid getFloat32Array() - the implementation just returns Illegal call condition. The only functions defined and work are: getUint8Array() and getUint16Array(). However, getUint16Array() only works when getEVR() returns EVR_OW, and getUint8Array() works when getEVR() does NOT return EVR_OW.

To make things even worse the VR type is not correct or doesn't apply. When parsed it's always either OB or OW. However the standard defines the data type to depend on yet another tag which is 1 of 4 integers. So the reason I know that I have to call getFloat32Array() at all, is because of a totally different tag whose value is 3 which happens to mean Float32.

As I mentioned this whole module is deprecated - last mention of it is in 2004, but I still have to parse it.

Thanks very much for your time/help. To fix this I will go with preferred Little Endian transfer syntax (let PACS worry about conversion if necessary), and have a look at getOriginalXfer() and do my own conversion if necessary.

Michael Onken
DCMTK Developer
Posts: 2048
Joined: Fri, 2004-11-05, 13:47
Location: Oldenburg, Germany
Contact:

#6 Post by Michael Onken »

Hi,

thanks for clarifying and reminding me on the original problem (which actually helped).

Can you tell me which deprecated tag that actually is? If it was standard and there are files outside like that, we should adapt the toolkit accordingly.

Those polymorph tags which can have different data types are always a pain in the back since they need extra code (not to mention incorrect implementations that write those tags with wrong data types).

Michael

P.S: Fortunately, as a current fall back, Little Endian Explicit is usually pretty good supported by PACS (since they often can do uncompressed and nobody actaully likes Implicit VR too much).

budric
Posts: 29
Joined: Thu, 2007-06-28, 20:48

#7 Post by budric »

This is C.10.2 Curve module. Page 688, Part 3: ftp://medical.nema.org/medical/dicom/20 ... 4_03_2.pdf

The Data Representation Tag is (50xx,0103), it's enumerated type. The Curve Data tag is (50xx,3000) - that's the binary array.

FYI, it took me a while to figure out what 50xx actually mean - I've never seen this in 2009 standard. This is a way to encode multiple elements without putting them under a sequence tag. So curve 1 is 5000, curve 2 is 5002, curve 3 is 5004 etc.

*Edit*
I also just read part 6, and it says
(50xx,3000) Curve Data OW or OB 1
So ... now I'm confused how floats and doubles are supposed to work between Big/Little endian machines. Also how is that "OR" resolved, which one is it in the end, and how is that specified? In the file? Association establishment?

Michael Onken
DCMTK Developer
Posts: 2048
Joined: Fri, 2004-11-05, 13:47
Location: Oldenburg, Germany
Contact:

#8 Post by Michael Onken »

Hi,

In part 5 of the standard it is described for each transfer syntax how to encode Curve Data.

For Little Endian Implicit:
Data Element (50xx,3000) Curve Data has the Value Representation OB with its component points (n-tuples) having the Value Representation specified in Data Value Representation (50xx,0103). The component points shall be encoded in Little Endian.
For Little Endian Explicit:
Data Element (50xx,3000) Curve Data has the Value Representation specified in its Explicit VR Field. See the specification of the Curve Data Module in PS 3.3 for the enumerated list of allowable VRs. The component points shall be encoded in Little Endian.
For Big Endian Explicit
Data Element (50xx,3000) Curve Data has the Value Representation specified in its Explicit VR Field. See the specification of the Curve Data Module in PS 3.3 for the enumerated list of allowable VRs. The component points shall be encoded in Big Endian.
As far as I know, anything other than 8 or 16 Bit value types are very uncommon for Curve Data; you mostly see US. However, I do not understand myself why the other possibilities were not mentioned/corrected in part 6 of the standard, but it was a recognized problem and more or less ignored, also by the editor of the DICOM standard itself, David Clunie.

In DCMTK, we do not look into Curve Value Representation at all. Then you as an API user must interpret the content according to Data Value Representation, so the main work is actually up to you.

Sorry for not-so-good news :wink:

Michael

[Edited 10:57]

Michael Onken
DCMTK Developer
Posts: 2048
Joined: Fri, 2004-11-05, 13:47
Location: Oldenburg, Germany
Contact:

#9 Post by Michael Onken »

P.S:

For Little Endian Implicit we have a routine in DcmItem that checks and updates the VR according to our best knowledge based on the standard; it is called (guess what) checkAndUpdateVR() and is called for each element when reading Implicit. This also handles elements like Pixel Data, Overlay Data, and also, Curve Data.

So far, we are very lazy and always read data internally into OB.
You may want to extend that routine if you like and send us a patch. There may be side effects when you write the data after changing the VR which I did not check.

Michael

budric
Posts: 29
Joined: Thu, 2007-06-28, 20:48

#10 Post by budric »

Hi,
thanks for that quote. Put things into new light. I've been playing around with the following patch to dcitem.cc


Code: Select all

else if ((tag.getGroup() & 0xFF00) == 0x5000 && tag.getElement() == 0x3000 && (tag.getEVR() == EVR_ox))
    {
        /* case 4 (CurveData): see section A.1 in PS 3.5-2004 
		   DCM_RETIRED_CurveData group value is a range 0x5000 - 0x50FF 
		   Note on data type from DICOM spec
			0000H = unsigned short (US)
			0001H = signed short (SS)
			0002H = floating point single (FL)
			0003H = floating point double (FD)
			0004H = signed long (SL)
		*/
		Uint16 dataRep;
        if (item.findAndGetUint16(DCM_RETIRED_DataValueRepresentation, dataRep).good())
        {
			switch (dataRep) 
			{
				case 0x0:
					DCMDATA_DEBUG("Setting undefined VR of " << tag.getTagName() << " " << tag << " to 'US'");
					tag.setVR(EVR_US);
					break;
				case 0x1:
					DCMDATA_DEBUG("Setting undefined VR of " << tag.getTagName() << " " << tag << " to 'SS'");
					tag.setVR(EVR_SS);
					break;
				case 0x2:
					DCMDATA_DEBUG("Setting undefined VR of " << tag.getTagName() << " " << tag << " to 'FL'");
					tag.setVR(EVR_FL);
					break;
				case 0x3:
					DCMDATA_DEBUG("Setting undefined VR of " << tag.getTagName() << " " << tag << " to 'FD'");
					tag.setVR(EVR_FD);
					break;
				case 0x4:
					DCMDATA_DEBUG("Setting undefined VR of " << tag.getTagName() << " " << tag << " to 'SL'");
					tag.setVR(EVR_SL);
					break;
				default:
					DCMDATA_DEBUG("Invalid DCM_RETIRED_DataValueRepresentation value.  Setting " << tag.getTagName() << " " << tag << " to 'OB'");
					tag.setVR(EVR_OB);
			}
		}
		else
		{
			DCMDATA_DEBUG("setting undefined VR of " << tag.getTagName() << " " << tag << " to 'OB'");
			tag.setVR(EVR_OB);
		}
    }
The code replaces an if statment which matches tag == DCM_RETIRED_CurveData. This isn't enough since you can have multiple ones over 0x50xx.

Also ideally I'd really like to remove tag.getEVR() == EVR_ox condition so it fixes the VR for all cases, not just Implicit Little Endian where VR is not specified in the data stream. A PACS I'm using is actually sending that field as OW instead of FD as it should be for that instance. But that wouldn't be to the standard, so I don't know if you'll have that in the trunk.

So far it looks good, findAndGetFloat64Array() works. I'll do some more testing.

I'm not sure what you mean by "write data after changing VR". Write it to disk, or use one of the putFloat64Array methods?

J. Riesmeier
DCMTK Developer
Posts: 2501
Joined: Tue, 2011-05-03, 14:38
Location: Oldenburg, Germany
Contact:

#11 Post by J. Riesmeier »

The code replaces an if statment which matches tag == DCM_RETIRED_CurveData. This isn't enough since you can have multiple ones over 0x50xx.
Thank you for pointing this out. This issue has been fixed now.

Michael Onken
DCMTK Developer
Posts: 2048
Joined: Fri, 2004-11-05, 13:47
Location: Oldenburg, Germany
Contact:

#12 Post by Michael Onken »

Hi,

thanks for the patch which is what I also had in mind for solving this issue. We will check it later and add the code correspondingly within the next days. We will then post an update here.

Michael

Michael Onken
DCMTK Developer
Posts: 2048
Joined: Fri, 2004-11-05, 13:47
Location: Oldenburg, Germany
Contact:

#13 Post by Michael Onken »

Just a short note: By accident I have just read that there is a DICOM Correction Proposal which addresses the VR issue regarding Curve Data(not balloted yet). It will correct it the same way as we interpret it (i.e. VR may be any of those mentioned in part 3).

Michael

budric
Posts: 29
Joined: Thu, 2007-06-28, 20:48

#14 Post by budric »

Thanks. Looking at the dates: 1999 in pdf, and 99, 02 timestamps on the FTP . I'm not holding my breath =).

But at least it confirms our correct understanding. Thanks again for finding those bits in the standard.

Michael Onken
DCMTK Developer
Posts: 2048
Joined: Fri, 2004-11-05, 13:47
Location: Oldenburg, Germany
Contact:

#15 Post by Michael Onken »

Hi,

yes, the CP is very old, I know - interestingly it is still open and was not cancelled. But as you say, this is just a hint of what the "correct" interpretation is, so I found it interesting to share.

Michael

Post Reply

Who is online

Users browsing this forum: Bing [Bot] and 1 guest