Fast parsing of a list of dicom tags and values to load into a SQL database

Jonas Teuwen
Posts: 1
Joined: Sat, 2020-04-11, 16:14

#1 Post by Jonas Teuwen »

Hello all,

I have a very large dataset of DICOM images that I want to be able to query, for example to find the PET scan matching a certain RT treatment scan. To do that, I would like to extract certain DICOM tags (about 30, top level only) and load them into an SQL database. As my understanding of DCMTK does not reach that far yet, I wrote a function like this:

Code:

int ParseDicom(const char *filename, std::map<DcmTagKey, std::string> &dicom_metadata) {
    DcmFileFormat file_format;
    OFCondition status = file_format.loadFile(filename);
    if (status.bad()) {
        return 1;
    }

    DcmDataset *dataset = file_format.getDataset();

    for (const auto &dicom_tag_group : dicom_tags) {
        for (const auto &dicom_tag : dicom_tag_group) {
            OFCondition condition;
            OFString value;
            condition = dataset->findAndGetOFStringArray(dicom_tag, value);
            if (condition.good()) {
                dicom_metadata[dicom_tag] = value.c_str();
            } else {
                dicom_metadata[dicom_tag] = "";
            }
        }
    }
    return 0;
}

Still, the code is rather slow. After profiling, I found that this function takes most of the time, so I am looking to optimise the lookup. I assume that for each tag, the lookup searches through the whole tree again? If I understand this properly, I could instead 1) iterate over the top-level tags once, and 2) check for each one whether it is in my list of wanted tags (perhaps even sorting that list, so I can avoid searching for tags that are missing) and store its value if so, stopping when the last wanted tag has been found or PixelData has been reached. If so, do I need to write something like this (based on code elsewhere on the forum):

Code:

    OFCondition status = file_format.loadFile(filename);
    if (status.bad()) {
        return 1;
    }
    DcmDataset *dataset = file_format.getDataset();

    DcmStack stack;
    DcmObject *dobject = NULL;
    DcmElement *delem = NULL;
    status = dataset->nextObject(stack, OFTrue);
    while (status.good())
    {
        dobject = stack.top();
        delem = (DcmElement *)dobject;
        DcmTagKey current_tag = delem->getTag();
        // Do something useful, check if the tag is in a list of wanted tags

        if(current_tag == DCM_PixelData)
        {
            std::cout << "Reached PixelData. End." << std::endl;
            break;
        }

        status = dataset->nextObject(stack, OFTrue);
    }

Would this be the fastest way to do it, and can I perhaps skip the private tags and avoid visiting any child tags? Thanks in advance!

Best,
Jonas

J. Riesmeier
DCMTK Developer
Posts: 2504
Joined: Tue, 2011-05-03, 14:38
Location: Oldenburg, Germany

Re: Fast parsing of a list of dicom tags and values to load into a SQL database

#2 Post by J. Riesmeier »

Iterating the top-level DICOM elements of the dataset is probably the most promising approach. Since the elements in a DICOM dataset are always ordered according to their tags (group and element number), you can stop the process after the last attribute tag in your list has been found.

By the way, instead of nextObject() you should use nextInContainer() since you are only interested in top-level elements (as far as I understood your use case).
