Yes, exactly: the API is the best way to handle text features, since the actual semantics often matter a lot. Is the API an option for you, or would you need this to run locally?
Less feature engineering is definitely something we are aiming for. The current version is actually based only on statistics; the real-world connections between features are something we're working on right now and hope to show results for soon. That's the next step.
When we released TabPFNv1 over three years ago, I didn't at all expect the hundreds of comments and reposts we would see. Tabular data had been a field getting little love from AI research, but we immediately felt that this was a topic that data scientists, scientists, financial analysts, and enterprise users deeply cared about. Glad it's useful to people!
TabPFN-2.5 in its default mode (a single forward pass) matches AutoGluon 1.4 tuned for four hours. AutoGluon is the strongest AutoML system, stacking models such as XGBoost and CatBoost, and it even includes the previous TabPFNv2.
Prior Labs | Founding Team: Software Engineer, Data Scientist, ML Engineer, Product Manager, Developer Relations | On-Site (Berlin, Freiburg) | Full-time
Prior Labs is building foundation models for structured/tabular data – AI's biggest blind spot. While LLMs handle text/images, tables (numbers, categories, text mix) need native understanding. Our approach, TabPFN (published in Nature, 1M+ downloads, 3k+ GitHub stars), uses transformers pre-trained on synthetic data to achieve state-of-the-art results on small datasets zero-shot.
We're tackling a $100B+ opportunity to transform data science, finance, healthcare, and science. Backed by €9M pre-seed from Balderton, XTX Ventures, and leaders from Hugging Face, DeepMind, etc.
We're hiring a founding team to build this new category:
Software Engineer: Build scalable infra & APIs.
Data Scientist / ML Engineer: Optimize & scale our TFMs.
Product Manager: Define the vision for AI-native tabular tools.
Developer Relations: Grow our community & drive adoption.
Location: On-site in Berlin or Freiburg (we build together).
Offer: Competitive salary + significant equity. Shape the future of AI for structured data from day one.
AI has transformed text, images, and code—but structured data remains overlooked. Prior Labs is an early-stage startup building Foundation Models for tabular data, unlocking a new AI modality with the potential to transform science, finance, healthcare, and data science itself. Our model, published in Nature, is already state-of-the-art for small datasets, and we’re scaling this into a step-change for data science.
We're backed by Balderton, XTX Ventures, and leaders from Hugging Face, Black Forest Labs, DeepMind, DataRobot, and others. Our team includes world-class researchers and engineers from top AI labs, and we're growing fast.
We're hiring founding engineers & builders to help define this new category:
Software Engineer – Build scalable infrastructure and APIs to integrate our models into real-world applications.
Product Manager – Define and execute the vision for AI-native tools for structured data.
Developer Relations – Grow the developer community, drive adoption, and showcase use cases.
ML Engineer – Optimize and scale our foundation models for structured data.
Location: On-site in Berlin or Freiburg (we believe in building together).
Why Join? Competitive salary + equity, shape the future of foundation models for structured data from day one.
Thanks a lot! We currently have an open issue about documenting how to use it with more samples: https://github.com/PriorLabs/TabPFN/issues/129. We'll get to this soon; give it an upvote there if it matters to you.
Yes! This makes sense from a learning perspective: more samples add additional evidence that the datapoint is actually what you observed. Based on one sample, the model stays closer to a mean regression (which in classification translates to more balanced class probabilities).
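A toy Bayesian update makes this intuition concrete. The sketch below uses a normal-normal conjugate model, which is purely illustrative and not TabPFN's actual mechanism: repeating the same observation pulls the posterior mean away from the prior (the "mean regression") and toward the observed value.

```python
# Illustrative normal-normal conjugate update: seeing the same
# observation n times shifts the posterior mean from the prior
# mean toward the data. All numbers are toy values.

def posterior_mean(obs, n, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Posterior mean after observing the value `obs` n times."""
    precision = 1.0 / prior_var + n / noise_var
    return (prior_mean / prior_var + n * obs / noise_var) / precision

obs = 2.0
for n in (1, 5, 50):
    print(f"n={n:2d}  posterior mean = {posterior_mean(obs, n):.3f}")
# With n=1 the estimate sits halfway between prior (0) and data (2);
# as n grows it converges to the observation.
```

The analogous effect in classification is that a single support example leaves class probabilities closer to uniform, while repeated evidence sharpens them.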
Transformers have trouble counting repeated entries (there was a famous ChatGPT failure case: asking it to count the number of 1s and 0s in a string). This model has some tricks to solve this.
Thanks a lot! We don't see clear artifacts from the synthetic data. Part of the "trick" is to keep the capacity of our model low: it has only about 11M parameters. That forces the model to learn an in-context learning algorithm, or in other words to do in-context learning rather than in-weights learning.
Adding real data on top will help, agreed! The synthetic data is very broad: we started with a synthetic-data prior that was just BNN samples of differing sizes, and thus super broad. Our new data samples functions more densely; these are simpler to explain but can still express almost any function (with the constraint that our networks aren't infinitely complex).
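The BNN-style prior described above can be sketched roughly as follows. This is a minimal illustration of the idea (draw a random MLP as a prior sample, then use it to label random inputs), with made-up sizes and activations rather than the actual TabPFN prior:

```python
import numpy as np

# Sketch of a BNN-style synthetic-data prior: sample a random MLP
# (one draw from a prior over functions) and use it to generate a
# labeled dataset. Sizes/activations are illustrative only.
rng = np.random.default_rng(0)

def sample_synthetic_dataset(n_samples=100, n_features=5, hidden=16):
    W1 = rng.normal(size=(n_features, hidden))  # random weights = prior draw
    W2 = rng.normal(size=(hidden, 1))
    X = rng.normal(size=(n_samples, n_features))
    scores = np.tanh(X @ W1) @ W2               # function sampled from the prior
    y = (scores.ravel() > np.median(scores)).astype(int)  # binarize into classes
    return X, y

X, y = sample_synthetic_dataset()
print(X.shape, y.shape, y.mean())  # roughly balanced binary labels
```

Pre-training on millions of such randomly drawn datasets is what lets the model amortize learning across functions, rather than fitting any single dataset in its weights.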