More

wsxiaoys · 2025-12-08T14:59:57 1765205997

OP here - this is Part 2 of a series documenting how we built NES (Next Edit Suggestions), our real-time edit model inside the Pochi editor extension.

The real challenge (and what ultimately determines whether NES feels “intent-aware”) was how we manage context in real time while the developer is editing live. For anyone building real-time AI inside editors, IDEs, or interactive tools.

I hope you find this interesting. Happy to answer any questions!

wsxiaoys · 2025-11-20T01:51:09 1763603469

I’ve been experimenting with next-edit prediction for a while and wrote up how we trained the edit model that powers our Tab completion feature. This post is part of a broader series where we share how we built this feature from the low-level modeling right up to the editor extension.

The cool part is we fine-tuned Gemini Flash Lite with LoRA instead of an OSS model, helping us avoid all the infra overhead and giving us faster responses with lower compute cost.

wsxiaoys · 2025-10-12T21:19:41 1760303981

I've spent the last few months working on a custom RL model for coding tasks. The biggest headache has been the lack of good tooling for tuning the autorater's prompt. (That's the judge that gives the training feedback.) The process is like any other quality-focused task—running batch rating jobs and doing SxS evaluations—but the tooling really falls short. I think I'll have to build my own tools once I wrap up the current project

wsxiaoys · 2025-01-13T06:49:37 1736750977

Appreciated! Fixed

wsxiaoys · 2025-01-13T00:49:39 1736729379

> So using 2 NVLinked GPU's with inference is not supported?

To make better use of multiple GPUs, we suggest employing a dedicated backend for serving the model. Please refer to https://tabby.tabbyml.com/docs/references/models-http-api/vl... for an example

SOLAR_FIELDS · 2025-01-13T03:08:50 1736737730

I see. So this is like, I can have tabby be my LLM server with this limitation or I can just turn that feature off and point tabby at my self hosted LLM as any other OpenAI compatible endpoint?

wsxiaoys · 2025-01-13T03:31:30 1736739090

Yes - however, the FIM model requires careful configuration to properly set the prompt template.

wsxiaoys · 2025-01-13T00:40:27 1736728827

Tabby comes with builtin RAG support so you can add this api framework to it.

Example: https://demo.tabbyml.com/search/how-to-configure-sso-in-tabb...

Settings page: https://demo.tabbyml.com/settings/providers/doc

wsxiaoys · 2025-01-12T22:58:29 1736722709

Check https://www.reddit.com/r/LocalLLaMA/s/lznmkWJhAZ to see a local setup with 3090.

mkl · 2025-01-13T00:35:54 1736728554

That thread doesn't seem to mention hardware. It would be really helpful to just put hardware requirements in the GitHub README.

wsxiaoys · 2025-01-12T22:56:27 1736722587

Tabby is engineered for team usage, intended to be deployed on a shared server. However, with robust local computing resources, you can also run Tabby on your individual machine. Check https://www.reddit.com/r/LocalLLaMA/s/lznmkWJhAZ to see a local setup with 3090.

wsxiaoys · 2025-01-12T21:38:19 1736717899

Never imagined our project would make it to the HN front page on Sunday!

Tabby has undergone significant development since its launch two years ago [0]. It is now a comprehensive AI developer platform featuring code completion and a codebase chat, with a team [1] / enterprise focus (SSO, Access Control, User Authentication).

Tabby's adopters [2][3] have discovered that Tabby is the only platform providing a fully self-service onboarding experience as an on-prem offering. It also delivers performance that rivals other options in the market. If you're curious, I encourage you to give it a try!

[0]: https://www.tabbyml.com

[1]: https://demo.tabbyml.com/search/how-to-add-an-embedding-api-...

[2]: https://www.reddit.com/r/LocalLLaMA/s/lznmkWJhAZ

[3]: https://www.linkedin.com/posts/kelvinmu_last-week-i-introduc...

maille · 2025-01-12T22:05:35 1736719535

Do you have a plugin for MSVC?

wsxiaoys · 2025-01-12T23:00:57 1736722857

Not yet, consider subscribe https://github.com/TabbyML/tabby/issues/322 for future updates!

somberi · 2025-01-13T02:10:20 1736734220

https://github.com/codespin-ai/codespin-vscode-extension

tootie · 2025-01-12T22:28:42 1736720922

Is it only compatible with Nvidia and Apple? Will this work with an AMD GPU?

wsxiaoys · 2025-01-13T00:48:14 1736729294

Yes - AMD GPU is supported through vulkan backend:

https://github.com/TabbyML/tabby/releases/tag/v0.23.0

https://tabby.tabbyml.com/blog/2024/05/01/vulkan-support/

wsxiaoys · on June 11, 2024

Fun fact: We've implemented binary embedding search [1] without the need for a specialized vector database. Instead, dimensional tokens like 'embedding_0_0', 'embedding_1_0' are created and being built into the tantivy index [2].

We're satisfied with the quality and performance this approach yields, while still keep Tabby embed everything into a single binary.

[1] My binary vector search is better than your FP32 vectors: https://blog.pgvecto.rs/my-binary-vector-search-is-better-th...

[2] Tantivy: https://github.com/quickwit-oss/tantivy