This is pretty awful code. I saw "buffer[size] = 0" and assumed immediately that was a past-the-end write, but they actually allocate size + 1 bytes. Urgh.
Next they introduced a check for strings shorter than 6 bytes because that's the shortest possible valid string. Why not just check for a valid encoding in the first place? There are too many implicit assumptions about the data going on here and not enough actual validation.
This entire module needs scrapping and rewriting with a proper FSM/parser generator.
And why is an MPEG4 metadata decoder directly handling UTF anyway?
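For what it's worth, validating the encoding up front rather than relying on a minimum-length check might look roughly like this. It's only a sketch: it assumes the payload is supposed to be UTF-8 (the thread doesn't say which UTF it is), and `is_valid_utf8` is a made-up name, not anything from libstagefright.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical validator: returns true only if data[0..len) is well-formed UTF-8.
// Assumes UTF-8; a UTF-16 payload would need a different check.
bool is_valid_utf8(const uint8_t* data, size_t len) {
    size_t i = 0;
    while (i < len) {
        uint8_t b = data[i];
        size_t extra;   // number of continuation bytes expected
        uint32_t cp;    // code point being decoded
        uint32_t min;   // smallest code point allowed for this sequence length
        if (b < 0x80)                { extra = 0; cp = b;        min = 0; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        // i < len holds here, so len - i - 1 cannot underflow.
        if (extra > len - i - 1) return false;  // truncated sequence

        for (size_t k = 1; k <= extra; ++k) {
            uint8_t c = data[i + k];
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min) return false;                      // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        i += extra + 1;
    }
    return true;
}
```

The point is that every length assumption is checked against the real buffer size before any byte is read, so there is nothing implicit left for a crafted file to violate.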
Wow. Integer overflows and underflows all over the place. It's almost like no-one's ever even taken a fuzzer to this before, which would surprise me given that Google is usually relatively security-conscious.
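To make the overflow concern concrete: a "size + 1" allocation followed by "buffer[size] = 0" is only safe if size + 1 can't wrap around. Here is a rough sketch of a guarded version. This is not the actual libstagefright code, and `alloc_terminated` is a hypothetical helper invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <memory>
#include <new>

// Hypothetical helper: allocate `size` payload bytes plus one byte for a
// NUL terminator. Returns nullptr if `size + 1` would wrap around to 0
// or if the allocation itself fails.
std::unique_ptr<uint8_t[]> alloc_terminated(size_t size) {
    if (size == std::numeric_limits<size_t>::max()) {
        return nullptr;  // size + 1 would overflow
    }
    // nothrow form so a huge attacker-controlled size yields nullptr
    // instead of an exception.
    std::unique_ptr<uint8_t[]> buffer(new (std::nothrow) uint8_t[size + 1]);
    if (buffer) {
        buffer[size] = 0;  // in bounds: the allocation holds size + 1 bytes
    }
    return buffer;
}
```

Without the first check, a size of SIZE_MAX would request a zero-length array and the terminator write would land far outside it. If the size comes out of the file as a wider type, the narrowing to size_t needs the same kind of check.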
This code was probably written by embedded software engineers like myself. In that world no one knows anything about security, and they certainly haven't heard of a fuzzer. If the code works, it ships.
The other parts of Android written in C++ are also a horror show, though I think they have gotten better recently.
And don't even talk about the proprietary HAL blobs or kernel modules. Unspeakable things happen there. And of course, libstagefright talks directly to these.
http://review.cyanogenmod.org/#/c/103276/
http://review.cyanogenmod.org/#/c/103275/
http://review.cyanogenmod.org/#/c/103274/
http://review.cyanogenmod.org/#/c/103273/
http://review.cyanogenmod.org/#/c/103272/