There's no technical reason you couldn't unbolt the OpenAI interface and bolt in LLaMA. Moreover, once you have this, you only need to load the model into memory once. Emulating different agents would be handled entirely through the context windows that LLMs expect: each agent just gets its own evolving context window. Round-robin the submissions. Repeat.
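Roughly, a minimal sketch of that loop, assuming the llama-cpp-python bindings (the `Llama` class); the model path, agent prompts, and stop tokens here are placeholders, not anything from a real setup:

```python
from llama_cpp import Llama

# Load the model into memory exactly once (path is a placeholder).
llm = Llama(model_path="./llama-model.gguf", n_ctx=4096)

# Each "agent" is nothing more than its own evolving context window.
agents = {
    "alice": "You are Alice, a cautious planner.\n",
    "bob": "You are Bob, a reckless optimist.\n",
}

# Round-robin the submissions: for each agent, generate a continuation
# from its context, then fold the output back into that context. Repeat.
for turn in range(3):
    for name, context in agents.items():
        out = llm(context, max_tokens=64, stop=["\n\n"])
        reply = out["choices"][0]["text"]
        agents[name] = context + reply + "\n"
        print(f"[{name}] {reply.strip()}")
```

The same single in-memory model serves every agent; only the per-agent text grows.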
What's crazy to think about is what new things will become possible as context window sizes creep up.
Well, cost-prohibitive on big models, obviously. But even just things like DreamBooth, training extra associations onto sets of new images, amount to tuning the model as you go.