Yes, exactly: the API is the best way to handle text features, since the actual semantics often matter a lot. Is the API an option for you, or would you need this to run locally?
Less feature engineering is definitely something we are aiming for. The current version is actually based only on statistics; the real-world connections between features are something we're working on right now and hope to show results for soon. That's the next step.
When we released TabPFNv1 over three years ago, I didn't at all expect the hundreds of comments and reposts we would see. Tabular data had been a field getting little love from AI research, but we immediately felt that this was a topic that data scientists, scientists, financial analysts, and enterprise users deeply cared about. Glad it's useful to people!
TabPFN-2.5 in its default mode (a single forward pass) matches AutoGluon 1.4 tuned for four hours. AutoGluon is the strongest AutoML system, stacking models such as XGBoost and CatBoost, and it even includes the previous TabPFNv2.
Prior Labs | Founding Team: Software Engineer, Data Scientist, ML Engineer, Product Manager, Developer Relations | On-Site (Berlin, Freiburg) | Full-time
Prior Labs is building foundation models for structured/tabular data – AI's biggest blind spot. While LLMs handle text/images, tables (numbers, categories, text mix) need native understanding. Our approach, TabPFN (published in Nature, 1M+ downloads, 3k+ GitHub stars), uses transformers pre-trained on synthetic data to achieve state-of-the-art results on small datasets zero-shot.
We're tackling a $100B+ opportunity to transform data science, finance, healthcare, and science. Backed by €9M pre-seed from Balderton, XTX Ventures, and leaders from Hugging Face, DeepMind, etc.
We're hiring a founding team to build this new category:
Software Engineer: Build scalable infra & APIs.
Data Scientist / ML Engineer: Optimize & scale our TFMs.
Product Manager: Define the vision for AI-native tabular tools.
Developer Relations: Grow our community & drive adoption.
Location: On-site in Berlin or Freiburg (we build together).
Offer: Competitive salary + significant equity. Shape the future of AI for structured data from day one.
AI has transformed text, images, and code—but structured data remains overlooked. Prior Labs is an early-stage startup building Foundation Models for tabular data, unlocking a new AI modality with the potential to transform science, finance, healthcare, and data science itself. Our model, published in Nature, is already state-of-the-art for small datasets, and we’re scaling this into a step-change for data science.
We're backed by Balderton, XTX Ventures, and leaders from Hugging Face, Black Forest Labs, DeepMind, DataRobot, and others. Our team includes world-class researchers and engineers from top AI labs, and we're growing fast.
We're hiring founding engineers & builders to help define this new category:
Software Engineer – Build scalable infrastructure and APIs to integrate our models into real-world applications.
Product Manager – Define and execute the vision for AI-native tools for structured data.
Developer Relations – Grow the developer community, drive adoption, and showcase use cases.
ML Engineer – Optimize and scale our foundation models for structured data.
Location: On-site in Berlin or Freiburg (we believe in building together).
Why Join? Competitive salary + equity, shape the future of foundation models for structured data from day one.
Thanks a lot! We currently have an open issue about documenting how to use it with more samples: https://github.com/PriorLabs/TabPFN/issues/129. We'll get to this soon; give it an upvote there if it matters to you.
Yes! This makes sense from a learning perspective: more samples add additional evidence that the datapoint is actually what you observed. Based on one sample, the model stays closer to a mean regression (which in classification translates to more balanced class probabilities).
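A toy Bayesian update makes this intuition concrete. The sketch below uses a normal-normal conjugate model, which is purely illustrative and not TabPFN's actual mechanism: repeating the same observation pulls the posterior mean away from the prior (the "mean regression") and toward the observed value.

```python
# Illustrative normal-normal conjugate update: seeing the same
# observation n times shifts the posterior mean from the prior
# mean toward the data. All numbers are toy values.

def posterior_mean(obs, n, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Posterior mean after observing the value `obs` n times."""
    precision = 1.0 / prior_var + n / noise_var
    return (prior_mean / prior_var + n * obs / noise_var) / precision

obs = 2.0
for n in (1, 5, 50):
    print(f"n={n:2d}  posterior mean = {posterior_mean(obs, n):.3f}")
# With n=1 the estimate sits halfway between prior (0) and data (2);
# as n grows it converges to the observation.
```

The analogous effect in classification is that a single support example leaves class probabilities closer to uniform, while repeated evidence sharpens them.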
Transformers have trouble counting repeated entries (there was a famous ChatGPT failure case: asking it to count the number of 1s and 0s in a string). This model has some tricks to solve this.
Thanks a lot! We don't see clear artifacts from the synthetic data. Part of the "trick" is to keep the capacity of our model low: it has only about 11M parameters. That forces the model to learn an in-context learning algorithm, or in other words to do in-context learning rather than in-weights learning.
Adding real data on top will help, agreed! The synthetic data is very broad: we started with a synthetic-data prior that was just BNN samples of differing sizes, and thus super broad. Our new data samples functions more densely; these are simpler to explain but can still express almost any function (with the constraint that our networks aren't infinitely complex).
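The BNN-style prior described above can be sketched roughly as follows. This is a minimal illustration of the idea (draw a random MLP as a prior sample, then use it to label random inputs), with made-up sizes and activations rather than the actual TabPFN prior:

```python
import numpy as np

# Sketch of a BNN-style synthetic-data prior: sample a random MLP
# (one draw from a prior over functions) and use it to generate a
# labeled dataset. Sizes/activations are illustrative only.
rng = np.random.default_rng(0)

def sample_synthetic_dataset(n_samples=100, n_features=5, hidden=16):
    W1 = rng.normal(size=(n_features, hidden))  # random weights = prior draw
    W2 = rng.normal(size=(hidden, 1))
    X = rng.normal(size=(n_samples, n_features))
    scores = np.tanh(X @ W1) @ W2               # function sampled from the prior
    y = (scores.ravel() > np.median(scores)).astype(int)  # binarize into classes
    return X, y

X, y = sample_synthetic_dataset()
print(X.shape, y.shape, y.mean())  # roughly balanced binary labels
```

Pre-training on millions of such randomly drawn datasets is what lets the model amortize learning across functions, rather than fitting any single dataset in its weights.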