> I think what you're missing is that all of the overlapped I/O scenarios comple...

gambiting · on Oct 5, 2020

Well, (and I don't mean it sarcastically) clearly you know more about this that me or any of our engineers do, because we couldn't find a way to do this. Yes, I knew that packets coming in trigger a hardware interrupt that wakes up the thread processing them - but as we had multiple other threads actually decoding this data, there isn't a good way to wake those up from the receiving thread. The sleep(1) and check for work was the best solution, as every other method/API we tried had the same problem of ultimately being limited by the system timer interrupt, and we couldn't get around it in any way.

I don't want to say that there definitely isn't a way to do this - but we never found it.

dataflow · on Oct 5, 2020

I see. It's hard to say if I'm missing something but given what I've heard so far I think what you want is I/O completion ports (though there are other ways to solve this). They're basically producer-consumer queues. I highly recommend looking into them if you do I/O in Windows. They're more complicated but they work very well. Here's a socket example: https://www.winsocketdotnetworkprogramming.com/winsock2progr...

vardump · on Oct 5, 2020

I've written Windows device drivers, so I think I could say I'm rather intimately familiar with Windows I/O request (IRP) processing.

Sounds like people in gambiting's company are the same.

I/O completion ports is an API for getting better efficiency between kernel and userland I/O. Batching, better scheduling and avoiding WFMO bottleneck. It's great, but it doesn't really have anything to do with timers.

The bottom line here is that Windows timer behavior has changed. This is terrifying.

dataflow · on Oct 5, 2020

It "has nothing to do with timing" in the sense that it doesn't let you choose how much delay to incur. It most certainly does "have something to do with timing" in that it lets you do I/O immediately, without polling or incurring system timer interrupt delays... which is precisely what his description of the problem required. It's kind of bizarre to see you trying to interject into it to argue with me as if you know his situation better than himself while even contradicting his descriptions in the process. I feel like the discussion already concluded on that topic. I'll leave this as my last comment here.

vardump · on Oct 5, 2020

> without polling or incurring system timer interrupt delays... which is precisely what his description of the problem required

Except sometimes polling is your only option. Your scenario works great, if the hardware supports that use case and you're not using some intermediate software layer that prevents you from taking full advantage of the hardware.

Other scenario is when Windows is being used to control something time critical. I fully agree one shouldn't do that, but sometimes you just have to. The new timer change really hurts here.

In other words, we don't live in the perfect world, but still need to get the software working.

I also feel like we're somewhat talking past each other.

> It's kind of bizarre to see you trying to interject into it to argue with me as if you know his situation better than himself while even contradicting his descriptions in the process.

I do know highly similar cases I've had to deal with. Not audio processing, but similarly tight deadlines. It was very painful to get it working correctly.

vardump · on Oct 5, 2020

Unfortunately, sometimes timing requirements do arise "out of the void" to adhere, because not all hardware is perfect.

Also sometimes you just need to do timing sensitive processing. I guess you could argue one shouldn't use Windows for that, but unfortunately sometimes we developers don't really have a choice. Doing what you're told pays the bills. :-)

dataflow · on Oct 5, 2020

I think you misunderstood. If you want "timing sensitive processing", that's what you can and should expect to use events or multimedia timers for. That way you can actually expect to meet timing guarantees. Using Sleep() and depending on it for correctness in such situations (regardless of its accuracy) is kind of like using your spacebar to heat your laptop... it doesn't seem like the right way to go on its face, even if there was something to guarantee it would work: https://xkcd.com/1172/

vardump · on Oct 5, 2020

> that's what you can and should expect to use events or multimedia timers for

I don't want to use a deprecated API call, timeSetEvent.

> Using Sleep() and depending on it...

Absolutely no one was using Sleep for timing. It just used to be that all Windows timer mechanisms are really the same thing, and Sleep(1) was an easy way to quickly test actual timing precision.

Obviously Windows 10/2004 changed all this.

jlokier · on Oct 5, 2020

> I don't want to use a deprecated API call, timeSetEvent.

That's not what dataflow is saying to use. The phrase is "events or timers", not "timer events".

gambiting describes this scenario:

The hardware (network) does interrupt the CPU and supplies an incoming packet to the receiver thread, which does not use a timer.

Then the receiver thread wants to send it to a worker thread.

It sounds like gambiting's implementation has the worker threads polling a software queue using a sleep-based timer, to pick up work from the receiver thread.

Unless there's something not being said in that picture, the receiver should be sending an event to a worker thread to wake the worker instantaneously, rather than writing to memory and letting the workers get around to polling it in their own sweet, timer-dependent, unnecessarily slow time.

Given that it's described as audio data which must be processed with low latency, it seems strange to insert an avoidable extra 1ms delay into the processing time...

This is kind of the point of events (and in windows, messages; not timers or timer-events). So that one thread can wake another thread immediately.

The whole "if your hardware supports it" reply seems irrelevant to the scenario gambiting has described, as the only relevant hardware sounds like network hardware. Either that's interrupting on packet arrival or it isn't, but either way, a sleep(1) loop in the worker threads won't make packet handling faster, and is almost certainly making the responses slower than they need to be, no matter which version of Windows and which timer rate.(+)

(+) (Except in some exotic multi-core setups, and it doesn't sound like that applies here).

gambiting · on Oct 5, 2020

>>Unless there's something not being said in that picture, the receiver should be sending an event to a worker thread to wake the worker instantaneously

I have kind of answered this before - you can signal on an event from the receiver thread, and it doesn't wake up the waiting threads instantaneously, even if they are waiting on a kernel-level object. The WaitForObject will only return at the next interrupt of a system timer.....so exactly at the same point when a Sleep(1) would have woken up too. There is no benefit to using the event architecture, because for those sub-16ms events it doesn't actually bypass the constraint of the system timer interrupt.

>>So that one thread can wake another thread immediately.

The problem here is that the immediate part isn't actually immediate, that's the crux of the whole issue.

dataflow · on Oct 5, 2020

> you can signal on an event from the receiver thread, and it doesn't wake up the waiting threads instantaneously, even if they are waiting on a kernel-level object.

This isn't normally the case. You should expect this not to be the case because programs would be insanely inefficient if this happened. My suspicion is there was something else going on. For example, maybe all CPUs were busy running threads (possibly with higher priority?) and so there was no CPU a waiting worker could be scheduled on. But it's not normally what's supposed to happen; it's pretty easy to demonstrate threads get notified practically immediately and don't wait for a time slice. Just run this example and you'll see threads getting notified in a few microseconds:

  #include <process.h>
  #include <tchar.h>
  #include <Windows.h>
  
  LARGE_INTEGER prev_time;
  
  unsigned int CALLBACK worker(void *handle)
  {
   LARGE_INTEGER pc, pf;
   QueryPerformanceFrequency(&pf);
   WaitForSingleObject(handle, 5000);
   QueryPerformanceCounter(&pc);
   _tprintf(_T("%lu us\n"), (pc.QuadPart - prev_time.QuadPart) * 1000000LL / pf.QuadPart);
   return 0;
  }
  
  int _tmain(int argc, TCHAR *argv[])
  {
   HANDLE handle = CreateEvent(NULL, FALSE, FALSE, NULL);
   uintptr_t thd = _beginthreadex(NULL, 0, worker, handle, 0, NULL);
   Sleep(100);
   QueryPerformanceCounter(&prev_time);
   SetEvent(handle);
   WaitForSingleObject((HANDLE)thd, INFINITE);
  }

jlokier · on Oct 6, 2020

From my other comment here, I think you have a race condition or something like that in your code, or as the peer comment suggests scheduling contention on the CPU, which is causing this effect in your test.

For a real-world measurement, see:

>> "Windows event object was used but it takes 6-7us from SetEvent() to WaitForSingleObject() return on my machine

The person measured 6-7 microseconds.

The system timer is not interrupting that fast.

vardump · on Oct 7, 2020

SetEvent/WaitForSingleObject is a thread yield from SetEvent thread to the waiting thread. Scheduler does that when SetEvent is called. So microsecond level waiting time is expected.

You don't need to wait for timer tick.

IOW, parent was talking about timer wait, not event wait. Timer events are only processed at timer interrupt ticks.

jlokier · on Oct 7, 2020

> IOW, parent was talking about timer wait, not event wait. Timer events are only processed at timer interrupt ticks.

No they weren't. Read parent again, my emphasis added:

>> you can signal on an event from the receiver thread, and it doesn't wake up the waiting threads instantaneously, even if they are waiting on a kernel-level object.

>> The WaitForObject will only return at the next interrupt of a system timer.....so exactly at the same point when a Sleep(1) would have woken up too. There is no benefit to using the event architecture, because for those sub-16ms events it doesn't actually bypass the constraint of the system timer interrupt.

They described their threads in another comment. There's a receiver thread which receives network packets, and sends them internally to processing threads. The processing threads use Sleep(1) in a polling loop, because they believe the receiving thread cannot send an event to wake a processing thread faster than Sleep(1) waits anyway, and they believe the system would need special interrupt hardware to make that inter-thread wakeup faster.

IOW, they are using a timer polling loop only because they found non-timer events take as long to wake the target thread as having the target thread poll every 16ms using Sleep(1).

But that's not how Windows behaves, if the application is coded correctly and the system is not overloaded, (and assuming there isn't a Windows bug).

My hunch is a race condition in the use of SendEvent/WaitForMultipleObjects in the test which produced that picture, because that's an expected effect if there is one. But it could also be CPU scheduler contention, which would not usually make the wake happen at the next timer tick, but would align it to some future timer tick in future, so the Sleep(1) version would show no apparent disadvantage. If it's CPU contention, there's probably significant latency variance (jitter) and therefore higher worst-case latency due to pre-emption.