I think so too. I’m thinking about writing a little server you’d run as a sidecar process. It would tokenize your code and build a local index in a vector DB. Client plugins in IDEs would ask that server for similar code snippets, with a short response deadline, perhaps 1 ms. The plugins could then assemble a prompt to send to a configurable LLM backend, and do editor-specific code insertion or suggestion or whatever. I use emacs, so I know how I want that to work for me, but it’ll surely be different for others.
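To make the shape of that concrete, here’s a minimal sketch of the query path in Python, assuming an HTTP/JSON protocol on localhost and a naive bag-of-token index standing in for a real vector DB; the port, endpoint shape, and scoring are all illustrative, not a settled design:

```python
# Sketch of the sidecar's query path (protocol and names are
# illustrative). An in-memory index maps snippets to bag-of-token
# vectors; a plugin POSTs a snippet and gets back the most similar
# indexed snippets, truncated if the deadline would be missed.
import json
import math
import re
import time
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")  # crude code tokenizer

def tokenize(code: str) -> Counter:
    return Counter(TOKEN_RE.findall(code))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

INDEX: list[tuple[str, Counter]] = []  # (snippet, token vector)

def add_snippet(code: str) -> None:
    INDEX.append((code, tokenize(code)))

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        deadline = time.monotonic() + 0.001  # ~1 ms budget
        body = self.rfile.read(int(self.headers["Content-Length"]))
        query = tokenize(json.loads(body)["code"])
        scored = []
        for snippet, vec in INDEX:
            if time.monotonic() > deadline:
                break  # answer with what we have rather than blow the budget
            scored.append((cosine(query, vec), snippet))
        scored.sort(reverse=True)
        payload = json.dumps({"matches": [s for _, s in scored[:5]]})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload.encode())

if __name__ == "__main__":
    add_snippet("def add(a, b):\n    return a + b")
    HTTPServer(("127.0.0.1", 7777), Handler).serve_forever()
```

The deadline check is the interesting bit: the server returns whatever it has scored so far rather than run over budget, since a plugin would rather get a partial answer in time than a complete one late.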
My open questions though are:
- does the tokenizer for the index need to be the same as the one used by the LLM?
- is running a local LLM feasible, or should I just assume prompts must be sent over the network? This would affect latency and privacy a lot.
- is fine tuning helpful? Or necessary, even, to ensure the LLM only provides source code, and only in the target language?