
I guess I'm not understanding your use case, if it's not periodic invocation. Why were you calling Sleep(1)? Why was a 16ms sleep problematic in that case?


Because once a thread is done with all available work, it has to sleep/wait; the only other alternative is busy waiting (an awful idea). So if you see that there is no more available work and call Sleep(1), and some more work comes in 2ms later, that work might not be picked up for another 14ms. That is the core of the problem.

Like, you are focusing on the sleep(1) part too much - what I want to do is "wait until there is more work available to process, and then wake up and start processing as soon as physically possible". On Windows the "as soon as physically possible" part is unfortunately 16ms in most cases, and that's just not acceptable for something as precise as audio work.


> what I want to do is "wait until there is more work available to process"

If this is merely to free up CPU when you have no work to do, then the actual timing of 1ms vs 16ms doesn't seem like the issue, right? Shouldn't you be instead waiting on a condition that gets set when work is queued? Like an event object, or an I/O completion port, or something that indicates work is ready? Instead of polling for the condition?

> On Windows the "as soon as physically possible" part is unfortunately 16ms in most cases, and that's just not acceptable for something as precise as audio work.

Aren't you contradicting yourself here? If you're doing audio don't you want periodic multimedia timers? Which are built for this purpose and actually also do deliver the resolution you need?


>>Shouldn't you be instead waiting on a condition that gets set when work is queued? Like an event object, or an I/O completion port, or something that indicates work is ready? Instead of polling for the condition?

Again, as I said multiple times - the WaitForObject won't wake up any sooner than Sleep(1) will. Only a real, proper hardware interrupt could wake up quicker - but that requires special hardware that just wasn't available in that scenario. If it's easier to think about it that way, then please assume I said that WaitForObject "sleeps" way more than necessary due to the low timer resolution on Windows.

>>Aren't you contradicting yourself here? If you're doing audio don't you want periodic multimedia timers?

No, because like I said at the very beginning, I was writing a server application processing audio coming in from the clients. So essentially every time packets came in from client machines they had to be processed and sent back - that's why I had multiple threads "waiting" for work, and they had to pick it up as soon as possible, with no guarantee of how often that work was going to appear. It's not the same as picking up audio from a hardware device on a client machine, which yes, has to be done periodically. It's a simple producer/consumer queue problem, but with it being audio data, 16ms wait times to process packets were not acceptable - pretty much no other work is that time sensitive (except for video, I guess), yet it wasn't the kind of work that has to be done on a specific periodic timer.


I use CreateEvent/SetEvent/WaitForSingleObject in almost exactly the way you describe/desire. Producer/consumer type pattern, all in software. I have one or more threads waiting to process data, and they use WaitForSingleObject to wait for the producer to set the event. I have tested this and found I can effectively send > 100k events/"wakeups" per second through this mechanism. I tested this by having two threads sequentially wake each other up in ping-pong fashion. I use this to process data coming in off SDRs, usually at rates > 30 MS/s. This code runs on tens of thousands of PCs; I would have heard about issues by now if these worked at 1ms or greater resolution.


> the WaitForObject won't wake up any sooner than Sleep(1) will. Only a real proper hardware interrupt could wake up quicker - but that requires special hardware that just wasn't available in that scenario. If it's easier to think about that way, then please assume I said that WaitForObject "sleeps" way more than necessary due to the low timer resolution on Windows.

Above is not correct.

SetEvent/WaitForSingleObject takes a few microseconds to wake the thread. Not milliseconds, microseconds.

WaitForSingleObject doesn't use a timer if you set infinite timeout.

It does not "sleep more than necessary due to the low timer resolution", because it does not use the timer.

Even if you set a timeout, it doesn't use the timer when the wake is triggered by SetEvent. The wake is immediate.

Wake triggered by SetEvent in a different thread doesn't use interrupts or need special hardware. It's purely kernel scheduling inside Windows.

For a real-world measurement, see for example:

>> "Windows event object was used but it takes 6-7us from SetEvent() to WaitForSingleObject() return on my machine"

The person is annoyed that it takes 6-7 microseconds, and they want to get it lower.

Here's a Microsoft example of how to have one thread wake another:

https://docs.microsoft.com/en-us/windows/win32/sync/using-ev...

There are other ways for one thread to quickly wake another thread on Windows, but SetEvent/WaitForSingleObject is the easiest in a general purpose thread.

What I suspect happened in tests where it appeared that "WaitForObject sleeps way more than necessary" is that there was a race condition in the event trigger/wakeup logic between the two threads, causing the worker thread to depend on the timeout behaviour of WaitForSingleObject instead of being woken reliably by SetEvent. That is actually a very common bug in this kind of wakeup logic, and you have to understand race conditions to solve it. When the race condition is solved, SetEvent wakeups become consistently fast.


> with no guarantee how often that work was going to appear.

But you don't need (and should neither need nor want!) such a guarantee to do this the way I'm saying! It sounds like you might just be unfamiliar with overlapped I/O? ReadFile(), WSARecvMsg(), etc. all take OVERLAPPED structures that let you pass a HANDLE to an event object that gets signaled. There's also RegisterWaitForSingleObject, WSAAsyncSelect, WSAEventSelect, CreateIoCompletionPort... you name it. Heck, if you just have a thread select() the old-fashioned way, I think it should still wake up when the data comes, without introducing a delay at all. Nowhere should it matter how fast the data is coming, or to force you to wait more than you need to. Am I missing something?


> Am I missing something?

I think what you're missing is that all of the overlapped I/O completions you describe originate from a hardware interrupt from some device, like a network adapter.

We were talking about timing, not about waiting for I/O completion.


> I think what you're missing is that all of the overlapped I/O scenarios completions you describe all originate from a hardware interrupt from some device, like a network adapter.

The comment literally says "packets came in from client machines". And they wanted to service these immediately, not 16ms later. Which is exactly what I'm describing.

Timing requirements don't arise out of the void. Either they're externally mandated based on the wall clock (which is generally for human-oriented things like multimedia) in which case you should probably use something like a periodic multimedia timer, or they're based on other internal events (I/O, user input, etc.) that you can act on immediately. In neither case does Sleep() seem like the right solution, regardless of its accuracy...

Very few real-world exceptions exist to this. In fact the only example I can think of off the top of my head (barring ones whose solution is busy-looping) is "I need to write a debugger that waits for a memory location to change and lets me know ASAP", in which case, I guess polling memory might be the way to go? But it would seem you would want event-based actions...


Well (and I don't mean it sarcastically), clearly you know more about this than me or any of our engineers do, because we couldn't find a way to do this. Yes, I knew that packets coming in trigger a hardware interrupt that wakes up the thread processing them - but as we had multiple other threads actually decoding this data, there wasn't a good way to wake those up from the receiving thread. The Sleep(1)-and-check-for-work loop was the best solution, as every other method/API we tried had the same problem of ultimately being limited by the system timer interrupt, and we couldn't get around it in any way.

I don't want to say that there definitely isn't a way to do this - but we never found it.


I see. It's hard to say if I'm missing something but given what I've heard so far I think what you want is I/O completion ports (though there are other ways to solve this). They're basically producer-consumer queues. I highly recommend looking into them if you do I/O in Windows. They're more complicated but they work very well. Here's a socket example: https://www.winsocketdotnetworkprogramming.com/winsock2progr...


I've written Windows device drivers, so I think I could say I'm rather intimately familiar with Windows I/O request (IRP) processing.

Sounds like people in gambiting's company are the same.

I/O completion ports are an API for getting better efficiency between kernel and userland I/O: batching, better scheduling, and avoiding the WFMO bottleneck. It's great, but it doesn't really have anything to do with timers.

The bottom line here is that Windows timer behavior has changed. This is terrifying.


It "has nothing to do with timing" in the sense that it doesn't let you choose how much delay to incur. It most certainly does "have something to do with timing" in that it lets you do I/O immediately, without polling or incurring system timer interrupt delays... which is precisely what his description of the problem required. It's kind of bizarre to see you interject to argue with me as if you know his situation better than he does, while even contradicting his descriptions in the process. I feel like the discussion already concluded on that topic. I'll leave this as my last comment here.


> without polling or incurring system timer interrupt delays... which is precisely what his description of the problem required

Except sometimes polling is your only option. Your scenario works great if the hardware supports that use case and you're not using some intermediate software layer that prevents you from taking full advantage of the hardware.

Other scenario is when Windows is being used to control something time critical. I fully agree one shouldn't do that, but sometimes you just have to. The new timer change really hurts here.

In other words, we don't live in the perfect world, but still need to get the software working.

I also feel like we're somewhat talking past each other.

> It's kind of bizarre to see you trying to interject into it to argue with me as if you know his situation better than himself while even contradicting his descriptions in the process.

I do know highly similar cases I've had to deal with. Not audio processing, but similarly tight deadlines. It was very painful to get it working correctly.


Unfortunately, sometimes timing requirements do arise "out of the void", because not all hardware is perfect.

Also sometimes you just need to do timing sensitive processing. I guess you could argue one shouldn't use Windows for that, but unfortunately sometimes we developers don't really have a choice. Doing what you're told pays the bills. :-)


I think you misunderstood. If you want "timing sensitive processing", that's what you can and should expect to use events or multimedia timers for. That way you can actually expect to meet timing guarantees. Using Sleep() and depending on it for correctness in such situations (regardless of its accuracy) is kind of like using your spacebar to heat your laptop... it doesn't seem like the right way to go on its face, even if there was something to guarantee it would work: https://xkcd.com/1172/


> that's what you can and should expect to use events or multimedia timers for

I don't want to use a deprecated API call, timeSetEvent.

> Using Sleep() and depending on it...

Absolutely no one was using Sleep for timing. It just used to be that all Windows timer mechanisms were really the same thing, and Sleep(1) was an easy way to quickly test actual timing precision.

Obviously Windows 10/2004 changed all this.


> I don't want to use a deprecated API call, timeSetEvent.

That's not what dataflow is saying to use. The phrase is "events or timers", not "timer events".

gambiting describes this scenario:

The hardware (network) does interrupt the CPU and supplies an incoming packet to the receiver thread, which does not use a timer.

Then the receiver thread wants to send it to a worker thread.

It sounds like gambiting's implementation has the worker threads polling a software queue using a sleep-based timer, to pick up work from the receiver thread.

Unless there's something not being said in that picture, the receiver should be sending an event to a worker thread to wake the worker instantaneously, rather than writing to memory and letting the workers get around to polling it in their own sweet, timer-dependent, unnecessarily slow time.

Given that it's described as audio data which must be processed with low latency, it seems strange to insert an avoidable extra 1ms delay into the processing time...

This is kind of the point of events (and in windows, messages; not timers or timer-events). So that one thread can wake another thread immediately.

The whole "if your hardware supports it" reply seems irrelevant to the scenario gambiting has described, as the only relevant hardware sounds like network hardware. Either that's interrupting on packet arrival or it isn't, but either way, a sleep(1) loop in the worker threads won't make packet handling faster, and is almost certainly making the responses slower than they need to be, no matter which version of Windows and which timer rate. (+)

(+) (Except in some exotic multi-core setups, and it doesn't sound like that applies here).


>>Unless there's something not being said in that picture, the receiver should be sending an event to a worker thread to wake the worker instantaneously

I have kind of answered this before - you can signal on an event from the receiver thread, and it doesn't wake up the waiting threads instantaneously, even if they are waiting on a kernel-level object. The WaitForObject will only return at the next interrupt of a system timer... so exactly at the same point when a Sleep(1) would have woken up too. There is no benefit to using the event architecture, because for those sub-16ms events it doesn't actually bypass the constraint of the system timer interrupt.

>>So that one thread can wake another thread immediately.

The problem here is that the immediate part isn't actually immediate, that's the crux of the whole issue.


> you can signal on an event from the receiver thread, and it doesn't wake up the waiting threads instantaneously, even if they are waiting on a kernel-level object.

This isn't normally the case. You should expect this not to be the case because programs would be insanely inefficient if this happened. My suspicion is there was something else going on. For example, maybe all CPUs were busy running threads (possibly with higher priority?) and so there was no CPU a waiting worker could be scheduled on. But it's not normally what's supposed to happen; it's pretty easy to demonstrate threads get notified practically immediately and don't wait for a time slice. Just run this example and you'll see threads getting notified in a few microseconds:

  #include <process.h>
  #include <tchar.h>
  #include <Windows.h>
  
  LARGE_INTEGER prev_time;
  
  unsigned int CALLBACK worker(void *handle)
  {
      LARGE_INTEGER pc, pf;
      QueryPerformanceFrequency(&pf);
      WaitForSingleObject(handle, 5000);   /* wakes when main calls SetEvent */
      QueryPerformanceCounter(&pc);
      /* the elapsed value is 64-bit, so use %lld rather than %lu */
      _tprintf(_T("%lld us\n"), (pc.QuadPart - prev_time.QuadPart) * 1000000LL / pf.QuadPart);
      return 0;
  }
  
  int _tmain(int argc, TCHAR *argv[])
  {
      HANDLE handle = CreateEvent(NULL, FALSE, FALSE, NULL);
      uintptr_t thd = _beginthreadex(NULL, 0, worker, handle, 0, NULL);
      Sleep(100);                          /* let the worker reach its wait */
      QueryPerformanceCounter(&prev_time);
      SetEvent(handle);
      WaitForSingleObject((HANDLE)thd, INFINITE);
      CloseHandle((HANDLE)thd);
      CloseHandle(handle);
  }


From my other comment here, I think you have a race condition or something like that in your code, or as the peer comment suggests scheduling contention on the CPU, which is causing this effect in your test.

For a real-world measurement, see:

>> "Windows event object was used but it takes 6-7us from SetEvent() to WaitForSingleObject() return on my machine"

The person measured 6-7 microseconds.

The system timer is not interrupting that fast.


SetEvent/WaitForSingleObject is a thread yield from the SetEvent thread to the waiting thread. The scheduler does that when SetEvent is called, so microsecond-level wake times are expected.

You don't need to wait for a timer tick.

IOW, the parent was talking about a timer wait, not an event wait. Timer events are only processed at timer interrupt ticks.


> IOW, parent was talking about timer wait, not event wait. Timer events are only processed at timer interrupt ticks.

No they weren't. Read parent again, my emphasis added:

>> you can signal on an event from the receiver thread, and it doesn't wake up the waiting threads instantaneously, even if they are waiting on a kernel-level object.

>> The WaitForObject will only return at the next interrupt of a system timer... so exactly at the same point when a Sleep(1) would have woken up too. There is no benefit to using the event architecture, because for those sub-16ms events it doesn't actually bypass the constraint of the system timer interrupt.

They described their threads in another comment. There's a receiver thread which receives network packets, and sends them internally to processing threads. The processing threads use Sleep(1) in a polling loop, because they believe the receiving thread cannot send an event to wake a processing thread faster than Sleep(1) waits anyway, and they believe the system would need special interrupt hardware to make that inter-thread wakeup faster.

IOW, they are using a timer polling loop only because they found non-timer events take as long to wake the target thread as having the target thread poll every 16ms using Sleep(1).

But that's not how Windows behaves, if the application is coded correctly and the system is not overloaded, (and assuming there isn't a Windows bug).

My hunch is a race condition in the use of SetEvent/WaitForSingleObject in the test which produced that picture, because that's an expected effect if there is one. But it could also be CPU scheduler contention, which would not usually make the wake happen at the next timer tick, but would align it to some future timer tick, so the Sleep(1) version would show no apparent disadvantage. If it's CPU contention, there's probably significant latency variance (jitter) and therefore higher worst-case latency due to pre-emption.


Again: Sleep(1) is just a shorthand for discussing this topic. The exact same behavior affects every other Windows timing mechanism as well.



