You don't know what it's geared for until you try. Like I said, GPT-4 could consistently encode and decode even fairly long base64 sequences. I remember once asking it for an SVG image, and it responded with HTML containing an <img> tag whose data URL embedded the image - and it rendered exactly as it should.
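The shape of that response is easy to reconstruct in Python - here a made-up ten-pixel SVG stands in for whatever the model actually drew:

    import base64

    # A made-up minimal SVG, purely to show the shape of the trick;
    # the image GPT-4 actually produced is not reproduced here.
    svg = ('<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
           '<rect width="10" height="10" fill="red"/></svg>')

    # Base64-encode the SVG bytes and wrap them in a data URL - the same
    # structure the <img> tag in the model's response carried.
    payload = base64.b64encode(svg.encode("utf-8")).decode("ascii")
    print(f'<img src="data:image/svg+xml;base64,{payload}">')

The remarkable part is that the model produced the base64 payload itself, token by token, with no tool doing the encoding for it.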
You can argue whether that is a meaningful use of model capacity, and sure, I agree that this is exactly the kind of stuff tool use is for. But nevertheless the bar was set.
Sure you do, the architecture is known. An LLM will never be appropriate for exact input transforms and will never be able to guarantee accurate results - the input pipeline yields abstract ideas as token embedding vectors, not a stream of bytes - but just like a human it might have the skill to limp through the task with some accuracy.
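You can see the pipeline problem directly with OpenAI's tiktoken library (cl100k_base is used here as an illustrative assumption - exact splits vary by model):

    import tiktoken  # pip install tiktoken

    # cl100k_base is the tokenizer family of the GPT-4 era; the point is
    # only that its splits are not aligned to the base64 byte boundaries.
    enc = tiktoken.get_encoding("cl100k_base")

    data = "aGVsbG8gd29ybGQ="  # base64 of "hello world"
    tokens = enc.encode(data)
    print([enc.decode([t]) for t in tokens])
    # The string comes out as a handful of multi-character chunks; the
    # model sees embeddings of those chunks, never the raw byte stream.

So any base64 "skill" is a learned statistical association over those chunks, not an exact transform - which is exactly why it degrades on long inputs.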
While your base64 attempts likely went well, "could consistently encode and decode even fairly long base64 sequences" is still just an anecdote. I had the same model freak out in an empty chat, transcribing the word "hi" into a full YouTube "remember to like and subscribe" epilogue - precision and determinism are the parameters you give up when making such a thing.
(It was around this time that the models learnt to use tools autonomously within a response, such as running small code snippets that would solve the problem perfectly well, but even now it is much more consistent to tell the model to do that explicitly, and for very long outputs the likelihood that it can recite the result back correctly drops.)
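For contrast, the tool path is trivially exact - a minimal sketch of the snippet a model with a Python execution tool might run (the execution tool itself is an assumption here):

    import base64

    # The whole "skill" collapses into two exact library calls once the
    # model is allowed to run code instead of reciting from memory.
    encoded = base64.b64encode(b"any input, however long").decode("ascii")
    assert base64.b64decode(encoded) == b"any input, however long"
    print(encoded)

The round-trip assert always holds here, which is precisely the guarantee the pure token-prediction path can never offer.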