I am actually pondering turning a tool I made for myself into a service or a standalone app. It records the screen, keystrokes, mouse, keyboard focus location, and additionally traces your gaze if you have suitable hardware (e.g. Tobii). The goal is to make sense of all that data with current deep learning techniques (think Copilot on steroids).
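For a rough idea of what the capture side looks like (not my actual code, just a minimal sketch using pynput; the real tool also records screen video, focus, and gaze alongside this):

```python
import time
from pynput import keyboard, mouse

events = []  # (timestamp, kind, payload)

def on_press(key):
    events.append((time.time(), "key", str(key)))

def on_move(x, y):
    events.append((time.time(), "move", (x, y)))

def on_click(x, y, button, pressed):
    events.append((time.time(), "click", (x, y, str(button), pressed)))

# Listeners run on background threads; record for 10 seconds as a demo.
with keyboard.Listener(on_press=on_press), \
     mouse.Listener(on_move=on_move, on_click=on_click):
    time.sleep(10)

print(f"captured {len(events)} events")
```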
As a service, though, it would be extremely expensive: video adds terabytes of storage every year, and the deep learning on top of it requires even more expensive compute. Probably a few thousand, or even tens of thousands, of dollars a year.
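Rough math behind that storage estimate (my assumptions: ~2 Mbit/s compressed 1080p screen capture, 8 recorded hours a day, 250 days a year):

```python
bitrate_mbit_s = 2          # assumed compressed 1080p screen capture
hours_per_day = 8
days_per_year = 250

bytes_per_year = bitrate_mbit_s / 8 * 1e6 * hours_per_day * 3600 * days_per_year
print(f"{bytes_per_year / 1e12:.1f} TB/year")  # ~1.8 TB/year per user
```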
I have not. From the changelog, the differences between my work and theirs are:
- Mac only vs Windows only
- They have already wired in some AI features like speech recognition (trivial with Whisper these days; I used it to generate synchronized, karaoke-style lyrics for my home music collection in about a week of coding. Unlike video, it does not require much compute. See the Whisper sketch after this list.)
- They have a slick GUI and presumably reliable recording; since I have not decided to productize mine yet, I only have two global hotkeys to start and stop.
- I capture more data: keyboard + focus, gaze traces, and mouse traces, which should allow better behavioral models (they could, and probably should, offer an option to do the same). I rely especially on gaze, as it is a very dense data channel (gaze sketch below).
- I have functionality to replay user actions, both to simply view them and to actually re-execute them; this is where the Copilot-like AI will eventually be connected (replay sketch below).
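The Whisper karaoke trick, as a minimal sketch (assuming the open-source openai-whisper package; word_timestamps gives per-word timing):

```python
import whisper

model = whisper.load_model("small")
result = model.transcribe("song.mp3", word_timestamps=True)

for segment in result["segments"]:
    for word in segment["words"]:
        print(f'{word["start"]:7.2f}s  {word["word"]}')
```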
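Gaze capture with Tobii hardware looks roughly like this (a sketch against the Tobii Pro SDK's tobii_research package, not my production code):

```python
import time
import tobii_research as tr

tracker = tr.find_all_eyetrackers()[0]  # first connected eye tracker

def on_gaze(gaze_data):
    # Normalized (0..1) gaze coordinates on the display, left eye
    print(gaze_data["left_gaze_point_on_display_area"])

tracker.subscribe_to(tr.EYETRACKER_GAZE_DATA, on_gaze, as_dictionary=True)
time.sleep(5)  # collect gaze samples for 5 seconds
tracker.unsubscribe_from(tr.EYETRACKER_GAZE_DATA, on_gaze)
```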
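And the replay side, sketched with pynput's mouse controller (viewing mode instead just draws the trace over the recorded video; the trace data here is made up):

```python
import time
from pynput.mouse import Controller

mouse_ctl = Controller()
# Hypothetical recorded trace: (seconds since start, x, y)
trace = [(0.00, 100, 100), (0.05, 120, 110), (0.10, 140, 125)]

start = time.time()
for t, x, y in trace:
    time.sleep(max(0.0, t - (time.time() - start)))
    mouse_ctl.position = (x, y)  # moves the real cursor
```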
It was funny to see the codename of my project as a label on a control in one of their screenshots.