Stuck storescu's

razvanux
Posts: 28
Joined: Thu, 2005-02-10, 08:04
Location: Cluj-Napoca, Romania

#1 Post by razvanux »

Hello all...

I am using the OFFIS toolkit in my DICOM sender application (based on storescu) and I am seeing a strange problem when sending a large batch of images to a destination PACS system. I can leave a sender running overnight, but after about 6-7 thousand files it simply locks up in the Association Request phase, inside a call to winsock's recv() function.
The images are all MRs, with the association being established once and reused for each image.

This keeps happening every time and only after a really, really long time since the sender was started.

Can anybody give me ANY feedback on this?

Thank you.

#2 Post by razvanux »

Hello all...

Further investigation traced the hang to DcmTCPConnection::read.

While the actual cause could not be determined, I worked around the issue by inserting a data-availability time-out loop just before the call to winsock's recv() function.

The updated code looks like this:

ssize_t DcmTCPConnection::read(void *buf, size_t nbyte)
{
#ifdef HAVE_WINSOCK_H
  // Poll for readable data for up to 60 seconds before calling recv(),
  // so that a silent peer cannot block this call forever.
  const time_t timeout = 60;            // seconds
  const time_t start = time(NULL);
  unsigned long readavail = 0;          // bytes waiting in the socket buffer
  OFBool timedout = OFFalse;

  ioctlsocket(getSocket(), FIONREAD, &readavail);
  while (readavail == 0 && !timedout)
  {
    Sleep(100);                         // milliseconds
    timedout = (time(NULL) - start) > timeout;
    ioctlsocket(getSocket(), FIONREAD, &readavail);
  }

  if (readavail == 0)
  {
    return 0;                           // timed out: report 0 bytes read
  }
  return recv(getSocket(), (char *)buf, (int)nbyte, 0);
#else
  return ::read(getSocket(), (char *)buf, nbyte);
#endif
}

NOTE: Of course, the locally set timeout variable should really be a global variable.

No other timeout I could find in OFFIS did the trick, and this was the only way I could solve the problem. It has been working for several weeks now under intensive use (a data-migration project).

I wonder if this is something that is already addressed in the future release of OFFIS.

There is one other thing I am wondering:
the DcmTCPConnection::networkDataAvailable
method is using select() to determine the availability of... data?

As far as I understand, select() will determine the availability of a network resource as opposed to finding whether there is data in a certain sockets' buffers.

Could anybody clarify this?

Thank you and best regards,
Razvan

Thomas Wilkens
DCMTK Developer
Posts: 117
Joined: Tue, 2004-11-02, 17:21
Location: Oldenburg, Germany

#3 Post by Thomas Wilkens »

Have you considered using for example valgrind (on Linux) to find out if your code has some kind of memory problem? In other words, you might do some memory allocation and you maybe have forgotten to free this memory again after it is used. After running for a long time, the machine might simply run "out of memory" because you have allocated all of your machine's memory without freeing anything. That would explain why this is happening "after a really, really long time since the sender was started".

It would be very interesting if you could check this. If it turns out that such a problem is contained in DCMTK itself, we would be happy to know about it and correct the bug.

#4 Post by razvanux »

Thank you for your reply Thomas.

No, there is no memory leak that is causing this and I am pretty certain about that.
I do have memory leak detectors installed, and a simple additional argument supporting this is that if a leak were the cause, the workaround I've implemented would simply not work.
Also, the machine does not hang, and neither do the senders themselves. They are just stuck in the recv call, waiting for data to be sent to them.

We're speculating that either winsock problems or network problems lead to this. Checking for data availability looked like the reasonable thing to do, and the workaround has proved safe for two weeks now, with several hundred thousand images sent since, without rebooting the machine ;).

Razvan

#5 Post by Thomas Wilkens »

Ok, I just wanted to make sure there is no memory leak in DCMTK's storescu. Thank you for your answer.

Marco Eichelberg
OFFIS DICOM Team
Posts: 1446
Joined: Tue, 2004-11-02, 17:22
Location: Oldenburg, Germany

#6 Post by Marco Eichelberg »

razvanux wrote: "There is one other thing I am wondering: the DcmTCPConnection::networkDataAvailable method is using select() to determine the availability of... data? As far as I understand, select() will determine the availability of a network resource as opposed to finding whether there is data in a certain sockets' buffers."

It is correct that DcmTCPConnection::networkDataAvailable is based on select(). To cite from the select(2) manual page:

"Three independent sets of descriptors are watched. Those listed in readfds will be watched to see if characters become available for reading (more precisely, to see if a read will not block - in particular, a file descriptor is also ready on end-of-file), those in writefds will be watched to see if a write will not block, and those in exceptfds will be watched for exceptions. On exit, the sets are modified in place to indicate which descriptors actually changed status."

This seems to be exactly what we need in this function, doesn't it?

#7 Post by razvanux »

Thank you for pointing out the obvious, Marco. It looks, then, as if something else is really wrong here...

Do I understand correctly that you have never observed such behavior?

...

networkDataAvailable IS called before doing the recv in defragmentTCP, if DUL_NOBLOCK is specified. A quick look shows that blocking I/O is performed on sockets on critical points in the DUL FSM (request assoc, ack assoc).

Hmmm... Could this be IT?
This would mean that the SCP, for some reason (power failure, low disk space, whatever), stops responding during an association negotiation, and OFFIS gets stuck in a recv call, never receiving a response from the SCP.

Would you confirm that I am right in this assumption?

Thank you for your patience.
P.S. This is a Windows client based on OFFIS.

#8 Post by Marco Eichelberg »

We have never observed such behaviour indeed - and to my knowledge nobody has reported this type of behaviour either by e-mail or in this forum before. That of course does not mean that there is nothing wrong with our code, but at least it seems to be a rather rare problem.
For your information, the DUL FSM always operates the socket in blocking mode. In mode DUL_NOBLOCK, additional checks are performed inside the DUL code to make sure that read(), accept() etc. are only called when they should return without blocking. This may not be the best implementation choice, but that's hard-coded very deeply inside the DUL code and has been so since ca. 1992.
Because of this, the toolkit may get stuck in certain very unlikely situations, e.g. if the SCP sends an incomplete association response PDU. The DUL code then "sees" data on the socket and tries to read the PDU, but at some point a call to DcmTCPConnection::read() is issued when the socket has no more data to deliver. That call would indeed block, which may be the problem you experience here and would also explain why your fix works. A more portable way of fixing the problem could be to use select() instead of the ioctlsocket() call, but the result should be similar.
