
vishakh82 · 2025-12-16T15:24:19 1765898659

Hi, I’m a member of the Monadic DNA team.

This post describes a local-first approach to personal genomics. Large-scale analysis (processing raw genotype data against the GWAS Catalog, >1M traits) runs entirely on the user’s device.
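A minimal sketch of what "processing raw genotype data against the GWAS Catalog" on-device could look like. It assumes a 23andMe-style raw file (tab-separated rsid, chromosome, position and genotype columns, with `#` comment lines) and a heavily simplified association record. The `Association` shape and the function names are hypothetical. This is not Monadic DNA's code or the GWAS Catalog's real schema, and it ignores strand flips, indels and effect sizes.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Association:
    """Hypothetical, simplified GWAS association: one SNP, one risk allele, one trait."""
    rsid: str
    risk_allele: str
    trait: str


def parse_raw_genotypes(lines: Iterable[str]) -> dict[str, str]:
    """Parse a 23andMe-style raw file into {rsid: genotype}, skipping comments and blanks."""
    genotypes: dict[str, str] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        rsid, genotype = fields[0], fields[3]  # columns: rsid, chromosome, position, genotype
        genotypes[rsid] = genotype
    return genotypes


def match_associations(
    genotypes: dict[str, str], associations: Iterable[Association]
) -> list[tuple[str, str, int]]:
    """Return (trait, rsid, risk-allele copies) for each association the user was genotyped at."""
    hits: list[tuple[str, str, int]] = []
    for assoc in associations:
        genotype = genotypes.get(assoc.rsid)
        if genotype is None or genotype == "--":  # not on the chip, or a no-call
            continue
        hits.append((assoc.trait, assoc.rsid, genotype.count(assoc.risk_allele)))
    return hits
```

Everything here operates on a local file and an in-memory list, so no step needs network access.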

LLMs are used only for exploration and summarization of selected results, and they never see raw genetic data. Users can choose different LLM backends depending on their preferences, including a TEE-based option (nilAI), local models via Ollama, or hosted models via HuggingFace.
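One way to keep raw data away from the LLM is to make the boundary structural: the only object the prompt builder accepts has no field for genotypes or rsIDs. The sketch below assumes that design. `SelectedResult` and `build_prompt` are hypothetical names, not part of Monadic DNA, nilAI, Ollama or HuggingFace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectedResult:
    """A result the user chose to explore; deliberately has no genotype or rsID field."""
    trait: str
    risk_allele_copies: int

    def __post_init__(self) -> None:
        if self.risk_allele_copies not in (0, 1, 2):
            raise ValueError("risk_allele_copies must be 0, 1 or 2")


def build_prompt(results: list[SelectedResult], question: str) -> str:
    """Assemble the only text that would be sent to whichever LLM backend the user picked."""
    lines = [f"- {r.trait} (risk-allele copies: {r.risk_allele_copies})" for r in results]
    return (
        "Summarize these results for a non-specialist.\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}"
    )
```

Because the raw genotype dictionary never reaches `build_prompt`, swapping the backend does not change what data leaves the analysis step.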

Happy to answer technical questions about the pipeline, privacy tradeoffs, or limitations.

vishakh82 · 2025-10-17T14:43:09 1760712189

A privacy-first genomics tool that lets you explore hundreds of thousands of genetic traits from GWAS Catalog data and analyze your genetic information entirely in your browser, with optional secure LLM analysis.

vishakh82 · 2025-10-13T02:43:14 1760323394

I'm building Monadic DNA explorer, a tool to explore thousands of genetic traits from the GWAS Catalog in your browser and plug in your own DNA data from 23andMe, Ancestry, etc. All processing happens locally on your machine, and AI insights are run in a private LLM inside a TEE.

https://explorer.monadicdna.com/

I'll be adding more features in the coming days!

vishakh82 · 2025-09-25T04:09:27 1758773367

We're building this at Monadic DNA.

I recently posted about our first stab at this at https://vishakh.blog/2025/07/08/using-mpc-for-anonymous-and-....

We'll have a waitlist up pretty soon for people to sign up for a batch of private sequencing.

vishakh82 · 2025-09-22T19:34:12 1758569652

It's really not that bad. We're close to using FHE in a production consumer app.

https://vishakh.blog/2025/08/06/lessons-from-using-fhe-to-bu...

adgjlsfhk1 · 2025-09-23T05:00:27 1758603627

if you're talking about doing database queries on a 5mb database, why not just ship the database client side and have them do the computation?

vishakh82 · 2025-09-23T14:44:49 1758638689

You may wish to build a protocol where third parties can asynchronously operate on user data. You may also want to have separation between the end app and the compute layer for legal or practical purposes. Finally, you may not want to store large payloads on client devices.

adgjlsfhk1 · 2025-09-23T19:26:28 1758655588

5mb is hardly a "large payload"

vishakh82 · 2025-09-23T20:19:17 1758658757

I'm giving you general reasons why this is the case. For our own app, we hope to build a protocol where third parties can operate async on user data (with consent).

vishakh82 · 2025-07-09T18:29:30 1752085770

Your links seem to relate to regular plaintext sequencing, storage and analysis.

We are building under full encryption.

vishakh82 · 2025-07-09T18:00:10 1752084010

Unfortunately, credible external audits are really expensive.

Our project is bootstrapped so we won't be able to afford a 100k audit for a while.

vishakh82 · 2025-07-09T14:41:53 1752072113

Not at all. "S3" is only in the loop because that's what labs generally use. In production, when we have ongoing scale, we will not use S3 or anything like it to transfer data between labs and our infra, even if it means using sneakernet!

The whole point of our project is to keep people's data always under encryption so that nobody can sell the data even if they wanted to. Using MPC (and FHE) we ensure that nobody can decrypt your data without your permission.
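The thread doesn't say which MPC scheme is used. As a generic illustration of why no single party can decrypt on its own, here is n-of-n additive secret sharing of a data-encryption key over a prime field: every share is required to recover the key, and any smaller subset is uniformly random. `PRIME`, `split_secret` and `reconstruct` are hypothetical names; production systems typically use threshold schemes such as Shamir's and pair them with proper encryption.

```python
import secrets

# Mersenne prime 2^127 - 1 as the field modulus (an illustrative choice).
PRIME = 2**127 - 1


def split_secret(secret: int, n: int) -> list[int]:
    """Split `secret` into n additive shares that sum to it mod PRIME."""
    if n < 1:
        raise ValueError("need at least one share")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % PRIME)  # last share makes the sum come out right
    return shares


def reconstruct(shares: list[int]) -> int:
    """Recover the secret; only correct when *all* n shares are supplied."""
    return sum(shares) % PRIME
```

If each share sits with a different party, decrypting a user's data requires every party (or the user acting as one of them) to cooperate.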

You can also delete your data any time without needing any third party's permission using the latest versions of the libraries we use.

We are building all this to get away from the closed, exploitative model that 23andMe built. The way we are building our infra, our company could go out of business tomorrow and you'll still be able to use the protocol and have access to your data and insights.

Also, fun fact, genetic data from newborns is retained by the state in many industrialized countries. We need to get that data away from "trust me, bro" infrastructure to securing it using MPC and FHE.

mbeavitt · 2025-07-09T19:47:19 1752090439

Ok but the lab has access to the unencrypted data? You haven't removed the requirement that the user needs to trust the lab with their raw genome markers. This entire operation hinges on the lab's trustworthiness, does it not?

Real_S · 2025-07-09T20:01:28 1752091288

Excellent point.

We are developing a solution that will allow cryptography for DNA molecules, allowing DNA to be secure in any lab. It fits well with Monadic's front end.

https://www.geneinfosec.com/

vishakh82 · 2025-07-09T20:39:57 1752093597

We address this point in the article. On the default path, with existing rules and regulations, some degree of trust in labs will always be required.

At-home sequencing could be a game changer.

The other reply also mentions molecular cryptography, which could provide really strong anonymity and privacy guarantees. We hope to do a PoC along those lines some time in the near future.

AtariATMHacker · 2025-07-09T17:49:51 1752083391

> Also, fun fact, genetic data from newborns is retained by the state in many industrialized countries.

Do you have any sources for this?

vishakh82 · 2025-07-10T16:26:14 1752164774

https://www.cdc.gov/newborn-screening/about/index.html

https://www.genomicsengland.co.uk/initiatives/newborns

vishakh82 · 2025-07-09T14:29:02 1752071342

You're right about "anonymized" and "anonymous". We do point out avenues to reach anonymity.

The legalese covers informed consent (since biological materials are involved), handles liability, and points out that the exercise itself was experimental and an early step towards productionization.

The physical token could be a UX nightmare, and it could get expensive at scale. Using a more developed app which accepts revocable public keys from the user might be more workable.

vishakh82 · 2025-07-09T14:12:50 1752070370

Think monad as in philosophy, less monad as in a programming burrito.

Our intention is to let each user be a self-contained, enclosed (through encryption) unit where they get insights tailored to their unique genome. At the same time, we want to aggregate data (securely and with consent) from all users to power medical and research findings.

It sort of also works in the programming monad sense as the data is always enclosed and encrypted and never "directly" operated on.
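To make the programming half of the analogy concrete, here is a toy Python "monad" whose contents are only transformed through `unit`/`bind`/`map` and read out only with explicit consent. It is purely illustrative: there is no encryption here, Python cannot actually hide `_value`, and `Sealed` and `open_with_consent` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class Sealed(Generic[A]):
    """Toy enclosed value: computations are chained inside, never run on an unwrapped value."""
    _value: A

    @staticmethod
    def unit(value: A) -> Sealed[A]:
        """Wrap a plain value (the monadic 'return')."""
        return Sealed(value)

    def bind(self, f: Callable[[A], Sealed[B]]) -> Sealed[B]:
        """Apply a function that itself returns a sealed result."""
        return f(self._value)

    def map(self, f: Callable[[A], B]) -> Sealed[B]:
        """Apply a plain function and keep the result sealed."""
        return self.bind(lambda v: Sealed.unit(f(v)))

    def open_with_consent(self, consent: bool) -> A:
        """The only exit: the owner must explicitly agree."""
        if not consent:
            raise PermissionError("owner has not consented")
        return self._value
```

In the real system the "seal" would be MPC/FHE rather than a wrapper class, but the shape is the same: analyses are composed on enclosed data, and plaintext comes out only with the owner's permission.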