socket select timeout

All other questions regarding DCMTK

Moderator: Moderator Team

Post Reply
Message
Author
michael12345
Posts: 6
Joined: Tue, 2011-04-12, 13:15

socket select timeout

#1 Post by michael12345 »

We recently hit a timeout in non-blocking mode (store-scp receiving data) where the timeout set clearly hasn't been reached. After investigating the issue comes from a posix system signal interrupting the socket select blocking wait call in DcmTCPConnection::networkDataAvailable(int timeout) in dcmtrans.cc
The issue we have is that the function (networkDataAwailable) treats errors like regular timeouts, making it hard to debug the issue in the first place as it looks like the request timed out.
I think it should at least log an error in such a failing case or ideally retry the select call. Thus the code:

Code: Select all

OFBool DcmTCPConnection::networkDataAvailable(int timeout)
{
  struct timeval t;
  fd_set fdset;
  int nfound;

  FD_ZERO(&fdset);

#ifdef __MINGW32__
  /* on MinGW, FD_SET expects an unsigned first argument */
  FD_SET((unsigned int) getSocket(), &fdset);
#else
  FD_SET(getSocket(), &fdset);
#endif

  t.tv_sec = timeout;
  t.tv_usec = 0;

#ifdef HAVE_INTP_SELECT
  nfound = select(getSocket() + 1, (int *)(&fdset), NULL, NULL, &t);
#else
  nfound = select(getSocket() + 1, &fdset, NULL, NULL, &t);
#endif
  if (nfound <= 0) return OFFalse;
  else
  {
    if (FD_ISSET(getSocket(), &fdset)) return OFTrue;
    else return OFFalse;  /* This should not really happen */
  }
}
could be changed like such:

Code: Select all

OFBool DcmTCPConnection::networkDataAvailable(int timeout)
{
  OFTimer timer;
  int lTimeout = timeout;
  fd_set fdset;
  int nfound;

  FD_ZERO(&fdset);

#ifdef __MINGW32__
  /* on MinGW, FD_SET expects an unsigned first argument */
  FD_SET((unsigned int) getSocket(), &fdset);
#else
  FD_SET(getSocket(), &fdset);
#endif

  while(1) {
      // timeval can be undefined after the call to select http://man7.org/linux/man-pages/man2/select.2.html
      struct timeval t;
      t.tv_sec = lTimeout;
      t.tv_usec = 0;

#ifdef HAVE_INTP_SELECT
    nfound = select(getSocket() + 1, (int *)(&fdset), NULL, NULL, &t);
#else
    nfound = select(getSocket() + 1, &fdset, NULL, NULL, &t);
#endif
      if (nfound < 0) {

        // check for interrupt call
        if (errno == EINTR) {
            int diff = (int)timer.getDiff();
            if (diff < timeout) {
                lTimeout = timeout - diff;
                continue; // retry
            }
        }
        else {
            char buf[256];
            DCMNET_ERROR("socket select returned with error: "
                << OFStandard::strerror(errno, buf, sizeof(buf)) );
            return OFFalse;
        }
      }
      if (nfound == 0) { 
        return OFFalse; // a regular timeout
      } 
      else {
        if (FD_ISSET(getSocket(), &fdset)) return OFTrue;
        else return OFFalse;  /* This should not really happen */
      }
  }
  return OFFalse;
}
Michael

Michael Onken
DCMTK Developer
Posts: 2048
Joined: Fri, 2004-11-05, 13:47
Location: Oldenburg, Germany
Contact:

Re: socket select timeout

#2 Post by Michael Onken »

Hi Michael,

thanks a lot for debugging this.

The patch looks reasonable to me. Without trying the code yet, I wonder whether the timeout must be adapted in the loop, i.e. since "t.tv_sec = lTimeout" always resets it to the original value. So if there are many interrupts, the timeout occurs in lTimeout+interruptsTime seconds and not after lTimeout seconds. Or do I miss something?

Thanks!
Michael

michael12345
Posts: 6
Joined: Tue, 2011-04-12, 13:15

Re: socket select timeout

#3 Post by michael12345 »

I think it is correct as it is. At the start of the function lTimeout is set to the initial timeout. In case of a retry (failure) lTimeout is reduced with the time already spent (tracked using OFTimer):
lTimeout = timeout - diff; // where timeout is the original timeout and diff the result from OFTimer::getDiff()
Then the struct t.tv_sec gets reinitialized with the modified lTimeout.

Regards,

Ps: I did update this part in the post some time ago, maybe you are looking at an older 'commit'?

Michael

Michael Onken
DCMTK Developer
Posts: 2048
Joined: Fri, 2004-11-05, 13:47
Location: Oldenburg, Germany
Contact:

Re: socket select timeout

#4 Post by Michael Onken »

Hi Michael,

thanks for the clarification, you are right! Just overlooked this (I was mainly looking at the diff).

I add this to our bug tracker, and it should go into the next release (don't ask... ;).

Best,
Michael

michael12345
Posts: 6
Joined: Tue, 2011-04-12, 13:15

Re: socket select timeout

#5 Post by michael12345 »

Thanks, will be patient I promise ;-)

michael12345
Posts: 6
Joined: Tue, 2011-04-12, 13:15

Re: socket select timeout

#6 Post by michael12345 »

It seems that this issue was forgotten, can't see the changes in the code nor find it in the bug tracker. Can you follow up on this please?

Michael Onken
DCMTK Developer
Posts: 2048
Joined: Fri, 2004-11-05, 13:47
Location: Oldenburg, Germany
Contact:

Re: socket select timeout

#7 Post by Michael Onken »

Hi,

you are right, this somehow slipped through (including bug entry). I'll get back about the issue before end of the week and let you know; don't hesitate pushing me again in case I don't ;)

Best,
Michael

Michael Onken
DCMTK Developer
Posts: 2048
Joined: Fri, 2004-11-05, 13:47
Location: Oldenburg, Germany
Contact:

Re: socket select timeout

#8 Post by Michael Onken »

Hi,

one more question about the issue since I am not that experienced with POSIX signals: Does it make sense to actually ignore any interruption signals? If the signal was about to stop the process, probably returning from networkDataAvailable() makes sense. So is it meaningful to react differently based on the type of interruption signal?

Thanks,
Michael

michael12345
Posts: 6
Joined: Tue, 2011-04-12, 13:15

Re: socket select timeout

#9 Post by michael12345 »

Hello Michael,

that's a good question and difficult for me to answer as my knowledge about this is also limited. From my understanding handling EINTR depends a bit on the use case which makes it difficult to implement "the right way" in a library that can be used in lots of different scenarii.
It's indeed problematic to completely ignore all interruptions as in the code sample given, it misses the bail out logic in case of signals that should be handled (SIGINT, SIGQUIT etc.) as it could keep a thread from closing etc.
Usually this needs extra treatment like a signal handler raising an exception or setting a global flag. A good description can be found here: http://legacy.python.org/dev/peps/pep-0475/
We finally found out what caused the issue in the first place: a third party library set a signal handler listening for SIGCHLD that randomly let the socket select to interrupt, however instead of modifying the dcmtk source code we probably could have just blocked the signal with a sigmask or disabled the signal handler.
So in the end it's probably best to leave the code as is and rather let the application developer handle it. Thanks for coming back to this.

Michael

Michael Onken
DCMTK Developer
Posts: 2048
Joined: Fri, 2004-11-05, 13:47
Location: Oldenburg, Germany
Contact:

Re: socket select timeout

#10 Post by Michael Onken »

Hi Michael,

thanks for looking deeper into the issue.

I will then keep the code as is. Maybe I add a dedicated error message if an interrupt has occurred, so its easier to what actually happened.

Best,
Michael

Michael Onken
DCMTK Developer
Posts: 2048
Joined: Fri, 2004-11-05, 13:47
Location: Oldenburg, Germany
Contact:

Re: socket select timeout

#11 Post by Michael Onken »

For the record: I added extra logging for the result of the select() call.

This commit #e78776f and commit #fbd39e0 .

Best,
Michael

Marco Eichelberg
OFFIS DICOM Team
OFFIS DICOM Team
Posts: 1437
Joined: Tue, 2004-11-02, 17:22
Location: Oldenburg, Germany
Contact:

Re: socket select timeout

#12 Post by Marco Eichelberg »

Your patch (with some minor modifications) has now been committed as commit 4185380, which should appear in the public git repository in a couple of days.

Post Reply

Who is online

Users browsing this forum: Ahrefs [Bot], Google [Bot] and 1 guest