Hacker News | troysk's comments

I do it using an mmWave sensor, a 60GHz one. I'd like to have more of them, but installation is a pain since they need to be mounted on the ceiling, so a WiFi-based sensor would be awesome!


Do you have a writeup about this somewhere? I'd love to know more.


"Low cost mmWave 60GHz radar sensor for advanced sensing" (2025), 50 comments, https://news.ycombinator.com/item?id=44665982

"Inside a $1 radar motion sensor" (2024), 100 comments, https://news.ycombinator.com/item?id=40834349

"mmWave radar, you won't see it coming" (2022), 180 comments, https://news.ycombinator.com/item?id=30172647

"What Is mmWave Radar?: Everything You Need to Know About FMCW" (2022), 30 comments, https://news.ycombinator.com/item?id=35312351


Unfortunately, no. But fortunately, it isn't something new I have built. I use a Seeed Studio MR60BHA2 sensor, which has an ESP32 that sends the data to Home Assistant through ESPHome. Once it is in Home Assistant, you can build automations and notifications on top of it. I mostly use it for elderly care. I also have a DFRobot C1001 sensor, but I'm waiting for ESPHome to add support for it; it has fall detection as well, which is why I'm planning to swap it in. They are fairly accurate, around 90% in my experience. They work better when mounted on the ceiling, and they have cone-like coverage.
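To illustrate the Home Assistant side (a minimal sketch of my own, not the poster's actual automation), the Python below polls a presence entity over Home Assistant's REST API and sends a notification if nothing has moved for a while; the entity ID and the notify service name are assumptions that will differ per install.

    import time
    import requests

    # Assumed values -- adjust for your own Home Assistant install.
    HA_URL = "http://homeassistant.local:8123"
    TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"
    ENTITY = "binary_sensor.mr60bha2_presence"  # hypothetical entity ID
    HEADERS = {"Authorization": f"Bearer {TOKEN}"}

    def presence_detected() -> bool:
        # Read the current state of the presence sensor via the REST API.
        r = requests.get(f"{HA_URL}/api/states/{ENTITY}", headers=HEADERS, timeout=10)
        r.raise_for_status()
        return r.json()["state"] == "on"

    def notify(message: str) -> None:
        # Call the (assumed) default notify service.
        requests.post(
            f"{HA_URL}/api/services/notify/notify",
            headers=HEADERS,
            json={"message": message},
            timeout=10,
        )

    if __name__ == "__main__":
        # Simple elderly-care check: alert if no presence for 30 minutes.
        last_seen = time.time()
        while True:
            if presence_detected():
                last_seen = time.time()
            elif time.time() - last_seen > 30 * 60:
                notify("No movement detected for 30 minutes.")
                last_seen = time.time()  # avoid repeating the alert every poll
            time.sleep(60)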


You can buy them on AliExpress for $5. YouTube and a cursory Google search will give you plenty of examples and tutorials to choose from.


There are a lot of radar modules out there in the wild.

But not all of them are good for doing stuff like this.

You need full raw I/Q and DAC access to sweep the frequency.


Don't get the $5 ones; they are probably just presence detectors, or at best distance detectors, and likely work at 24GHz. Get the 60GHz ones for breathing, heart rate, posture, etc.


Tell us more about your setup!


Infineon's 60GHz IoT FMCW radar modules have all their datasheets published. That's super rare for Infineon; usually they are the worst NDA hell on earth.

Chinese vendors sell uC+radar-module units on AliExpress for around 20-30€. The Infineon-based boards are super easy to spot by looking at the antenna-on-chip layout.

You can cut off their head (the onboard microcontroller) and attach your favorite uC directly to the SPI bus to talk to the radar. Or use the existing one; it's not overly complicated to reverse engineer the schematic.

Example: MicRadar RA60ATR2
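Purely to illustrate the "talk to it over SPI" idea (not taken from any Infineon datasheet), here is a minimal Python sketch using the spidev module on a Raspberry Pi; the register address and transfer framing are hypothetical placeholders you would replace after reverse engineering the actual board.

    import spidev

    # Open SPI bus 0, chip select 0 (Raspberry Pi wiring assumed).
    spi = spidev.SpiDev()
    spi.open(0, 0)
    spi.max_speed_hz = 1_000_000
    spi.mode = 0

    # Hypothetical register address -- the real map comes from the
    # datasheet or your reverse-engineered schematic.
    CHIP_ID_REG = 0x00

    def read_register(addr: int, length: int = 2) -> list:
        # Many SPI devices clock out a response after a read command plus
        # dummy bytes; this framing is a placeholder, not the real protocol.
        return spi.xfer2([addr] + [0x00] * length)[1:]

    if __name__ == "__main__":
        print("raw bytes from placeholder register:", read_register(CHIP_ID_REG))
        spi.close()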


posted some details at https://news.ycombinator.com/item?id=45150761. happy to answer any additional queries you may have.



I find the web (HTML/CSS) the most open format for sharing. PDFs are hard to consume on smaller devices and much harder for machines to read. I am working on a feature at Jaunt.com to convert PDFs to HTML; it shows up as a reader-mode icon. Please try it out and see if it is good enough. I personally think we need to do a much better job. https://jaunt.com


PDFs can be notoriously difficult to work with on smaller devices


Oncologists making treatment decisions are generally using real computers, not toy mobile devices.


+1! Most LLMs can already output Mathpix Markdown. I prompt them to do so, they give me the markup, and then I use a rendering library to show scalable and selectable equations. No wonder Facebook's Nougat also uses it. Good stuff!


In my experience, this works well but doesn't scale to all kinds of documents. For scientific papers, it can't render formulas; Meta's Nougat is the best model for that. For invoices and records, Donut works better. Both of these models will fail in some cases, so you end up running an LLM to fix the issues. Even then, the LLM won't be able to do tables and charts justice, as the details (bold/italic/other nuances) were lost during the OCR process. I'd consider these "classical" methods too. I have found vision models to be much better, since they work from the original document/image. Clear prompts help, but you still won't get 100% accuracy, as the models tend to wander off on their own paths. I believe that can be fixed with fine-tuning, but no good vision model offers fine-tuning on images; Google Gemini seems to have the feature, but I haven't tried it. Few-shot prompting helps keep the LLM from hallucinating, guards against prompt injection, and helps it adhere to the requested format.
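As an illustration of the vision-model approach (my assumption of a typical call, not the poster's exact setup), here's a minimal Python sketch that sends a page image to a vision-capable OpenAI chat model and asks for Mathpix-Markdown-style output; the model name and prompt wording are placeholders.

    import base64
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def page_to_markdown(image_path: str) -> str:
        # Encode the page image so it can be passed inline as a data URL.
        with open(image_path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode()

        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder; any vision-capable model works
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": (
                                "Transcribe this page to Markdown. "
                                "Use Mathpix-style \\( ... \\) delimiters for formulas, "
                                "Markdown tables for tables, and keep bold/italic styling."
                            ),
                        },
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:image/png;base64,{b64}"},
                        },
                    ],
                }
            ],
        )
        return response.choices[0].message.content

    if __name__ == "__main__":
        print(page_to_markdown("page.png"))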


Maybe a pipeline like the following (a rough skeleton is sketched after the list):

1. Segment document: Identify which part of the document is text, what is an image, what is a formula, what is a table, etc...

2. For text, do OCR + LLM. You can use an LLM to score how plausible the predicted text is, and if it is way off, fall back to a ViT or something for the OCR.

3. For tables, you can get a ViT/CNN to identify the cells to recover positional information, and then OCR + LLM for recovering the contents of cells

4. For formulas (and formulas in tables), just use a ViT/CNN.

5. For images, you can get a captioning ViT/CNN to caption the photo, if that's desired.
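A rough Python skeleton of that dispatch-by-region-type idea might look like the following; detect_regions, ocr_text, and the other helpers are hypothetical stand-ins for whatever segmentation, OCR, table, formula, and captioning models you pick.

    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class Region:
        kind: str     # "text", "table", "formula", or "image"
        image: bytes  # cropped pixels for this region

    # Hypothetical model hooks -- plug in your own segmenter (step 1),
    # OCR engine (step 2), table parser (step 3), formula recognizer
    # (step 4), and captioner (step 5).
    def detect_regions(page: bytes) -> List[Region]: ...
    def ocr_text(img: bytes) -> str: ...
    def parse_table(img: bytes) -> str: ...
    def formula_to_latex(img: bytes) -> str: ...
    def caption_image(img: bytes) -> str: ...

    HANDLERS: Dict[str, Callable[[bytes], str]] = {
        "text": ocr_text,
        "table": parse_table,
        "formula": formula_to_latex,
        "image": caption_image,
    }

    def page_to_markdown(page: bytes) -> str:
        # Step 1 segments the page; each region is then routed to the
        # specialist model for its type and the pieces are stitched back.
        parts = [HANDLERS[r.kind](r.image) for r in detect_regions(page)]
        return "\n\n".join(parts)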


I don't see how an LLM improves tables, where most of the time a cell is a single word or a single value without the continuous context of a sentence.


IMHO, the LLM correction is most relevant/useful in the edge cases rather than the modal ones, so I totally agree.


They take images


How do you segment the document without an LLM?

I prefer to do all of this in one step with an LLM, using a good prompt and a few shots.

With so many passes over images, cost and time will be high, and ViTs are slower.


Segmenting can likely be done at a really small resolution with a CNN, making it really fast.

There are some heuristic ways of doing it, but I doubt you'll be able to distinguish equations from text.


Segmenting at a lower resolution and then mapping the regions back to the higher resolution with scale multipliers doesn't work, as other items bleed in. The FastSAM paper has some interesting ideas on doing this with CNNs, which I guess SAM 2 has superseded. However, the added complexity in the pipeline isn't worth the result, as I find vision LLMs are able to do almost the same task within the same OCR prompt.


Apple APIs such as Live Text, subject identification, and Vision. You can run them on a server, too.


I agree that vision models that actually have access to the image are a more sound approach than using OCR and trying to fix it up. It may be more expensive though, and depending on what you're trying to do, OCR plus cleanup may be good enough.

What I want to do is read handwritten documents from the 18th century, and I feel like the multistep approach hits a hard ceiling there. Transkribus is multistep, but the line detection model is just terrible. Things that should be easy, such as printed schemas, utterly confuse it. You simply need to be smart about context to a much higher degree than you do for OCR of typewritten text.


I also think it's probably more effective. Time and again, hand-crafted tools beat AI at first, but then the models get bigger and AI wins. Think hand-crafted image classification versus end-to-end models, or hand-crafted language translation versus end-to-end models.

In this case, the model can already do the OCR, and it becomes an order of magnitude cheaper every year.



Both OpenAI and Claude vision models are able to do that for me. It is more expensive than Tesseract, which can run on a CPU, but I assume it will become similarly cheap in the near future with open models and as AI becomes ubiquitous.


It's not OSS, but I've had good experiences using Mathpix's API for formula OCR.


Nougat and Donut are OSS. There are no OSS vision models yet, but we will have them soon. The Mathpix API is also not OSS, and I found it expensive compared to vision models.

Mathpix Markdown, however, is awesome, and I ask LLMs to use it to denote formulas, since LaTeX is tricky to render in HTML because things don't always match up. I don't know LaTeX well, so I haven't gone deeper on it.


We've been trying to solve this with https://vlm.run: the idea is to combine the character level accuracy of an OCR pipeline (like Tesseract) with the flexibility of a VLM. OCR pipelines struggle with non-trivial text layouts and don't have any notion of document structure, which means there needs to be another layer on top to actually extract text content to the right place. At the other end of the spectrum, VLMs (like GPT4o) tend to perform poorly on things like dense tables (either hallucinating or giving up entirely) and complex forms, in addition to being much slower/more expensive. Part of the fix is to allow a 'manager' VLM to dispatch to OCR on dense, simple documents, while running charts, graphs etc. through the more expensive VLM pipeline.
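A toy version of that dispatch idea (my own sketch, not vlm.run's actual implementation) could look like this: run Tesseract on pages that look like dense plain text and fall back to a VLM otherwise; the density heuristic and the call_vlm helper are assumptions.

    from PIL import Image
    import pytesseract

    def call_vlm(image_path: str) -> str:
        # Placeholder for a vision-language-model call (e.g. the OpenAI
        # sketch earlier in the thread); assumed, not vlm.run's API.
        raise NotImplementedError

    def extract(image_path: str) -> str:
        img = Image.open(image_path)

        # Cheap pass first: Tesseract with word-level confidences.
        data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
        words = [w for w in data["text"] if w.strip()]
        confs = [float(c) for c in data["conf"] if float(c) >= 0]
        avg_conf = sum(confs) / len(confs) if confs else 0.0

        # Heuristic "manager": many confidently recognized words suggests a
        # dense, simple page, so keep the cheap OCR result; otherwise send
        # the page through the more expensive VLM pipeline.
        if len(words) > 50 and avg_conf > 80:
            return pytesseract.image_to_string(img)
        return call_vlm(image_path)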


Maybe you could also try extracting the text with a PDF text-extraction library and use that as a cross-check. It might help fix numbers, which Tesseract sometimes gets wrong.
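For instance (a minimal sketch, assuming the PDF carries an embedded text layer), pypdf can pull that layer so you can compare it against the OCR output; the similarity threshold is an arbitrary placeholder.

    import difflib
    from pypdf import PdfReader

    def embedded_text(pdf_path: str, page_index: int = 0) -> str:
        # Pull the text layer the PDF already carries (no OCR involved).
        return PdfReader(pdf_path).pages[page_index].extract_text() or ""

    def needs_review(ocr_text: str, pdf_text: str, threshold: float = 0.9) -> bool:
        # Flag pages where OCR and embedded text disagree too much, so a
        # human (or an LLM pass) can reconcile digits and amounts.
        ratio = difflib.SequenceMatcher(None, ocr_text, pdf_text).ratio()
        return ratio < threshold

    if __name__ == "__main__":
        pdf = embedded_text("invoice.pdf")  # hypothetical file
        ocr = "...Tesseract output for the same page..."
        print("needs review:", needs_review(ocr, pdf))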


Jaunt Labs | Full-Time | Full-stack engineer | REMOTE

Jaunt Labs is in the process of disrupting the professional content sharing landscape, focusing on documents, presentations, and more. Currently in our alpha launch phase, our momentum is building rapidly. Plus, our founders come with a track record, having successfully built and sold a startup in a similar domain. We aspire to become the platform which democratises knowledge and discussions around it.

We're actively seeking full-stack engineers to join our distributed team. With Django at our core, we're building scalable apps that can handle the demands of our network. On the frontend, it's all about ES6, no frameworks required.

Here's the structure: while autonomy is key, we touch base a few times weekly, typically around 9AM PST. You'll need at least an hour or two of overlap to join these meetings.

Why consider us? Beyond the exciting work we're doing, you'll be part of a cohesive team, tackling challenges collaboratively and driving real change.

Interested in being part of our journey? Drop us a line about why you would like to join us at troy [at] jauntlabs.com or reach out directly to our founder at jon [at] jauntlabs.com. Let's build something remarkable together.


http://jauntlabs.com/ seems to be down/not loading


Your posts are awesome! They helped me get started on my journey. Thanks!


Thanks, I'm glad you like them!


Bluetooth repeaters - BLE to WiFi so I can use BLE devices in Home Assistant

Button Bot - SwitchBot alternative

WiFi Calling Bell - Relay-controlled calling bell with auto shut-off

Cameras - Uses ESP-EYE and ESP32-CAM; low res, low latency, and does NOT hang

Standing Desk - Turns linear actuators on and off

Water controllers - Relays attached to solenoids to automate my plants drip-watering and turn on sprinklers

PIR Sensors - A bit noisy, still not satisfied with performance

RF Transmitter - To replay RF signals

RF Receiver - To receive RF signals

The BLE repeater has been really useful; it has made many BLE devices available in Home Assistant, which makes automations easy. The nRF Connect app has been really helpful in making this happen.
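As an aside (my own sketch, not part of the original setup), if you want to poke at a BLE device from a laptop before flashing an ESP32 proxy, the Python bleak library can list whatever is advertising nearby; names and addresses will of course vary.

    import asyncio
    from bleak import BleakScanner

    async def main() -> None:
        # Scan for a few seconds and print the BLE devices that are advertising.
        devices = await BleakScanner.discover(timeout=5.0)
        for d in devices:
            print(d.address, d.name or "<unknown>")

    if __name__ == "__main__":
        asyncio.run(main())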


Why use Slack? Slack has rate limits, and you won't get the notification when something real goes down, because there will be hundreds of messages overloading the Slack API.


Slack is convenient just because we use it for some other purposes.

Someone else pointed out that Telegram is also a good idea - that's very easy to do. I've set up Telegram alerts for other purposes.


Rate limits are present in chat apps to prevent the service from being abused. Notification tools don't have that.


Thanks for RailsCasts, Ryan! A few days back I read a thread about how Ruby/Rails is losing junior devs because of the complexity and can-of-worms docs of Rails 7. I believe it is because we are missing high-quality content like RailsCasts today.

