I wonder if you can speculatively assume there are no line ends in quoted blocks and then try to go fast and then fall back to a slower method if you detect that to be the case.
Sort of like branch prediction where a failed prediction is costly but on average you are right.
How do you detect it without ruining the performance of the fast path? Seems like you need to make some kind of assumption — perhaps an assumption about the number of columns in each row. But then what happens if you need to parse data where some row has a different number of columns? (Which is not that uncommon.)
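A minimal sketch of what this might look like, using Python's stdlib `csv` module as the slow path. The assumption here (my own, not something either commenter specified) is that the caller knows the expected column count up front: the fast path treats every physical line as one record, and any row whose column count doesn't match triggers a full quote-aware re-parse — the "misprediction" cost.

```python
import csv
import io

def parse_speculative(data: str, expected_cols: int):
    # Fast path: speculate that quoted fields never contain newlines,
    # so every physical line is exactly one record.
    rows = []
    for line in data.split("\n"):
        if not line:
            continue
        if '"' in line:
            # Quotes on this line: parse just this line quote-aware
            # (still valid under the no-embedded-newline assumption).
            fields = next(csv.reader([line]))
        else:
            fields = line.split(",")
        if len(fields) != expected_cols:
            # Prediction failed (likely a newline inside a quoted
            # field): fall back to a full quote-aware parse of the
            # whole input. This is the costly misprediction path.
            return list(csv.reader(io.StringIO(data)))
        rows.append(fields)
    return rows
```

Note this is exactly the trap discussed below: the fallback re-reads the entire input from the start, so it needs the whole buffer in memory (or memory-mapped), and a misprediction means two passes over the data.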
The problem with csv is that many people consuming it don't actually choose that format. They have to parse what they are given.
Yeah, you kind of fell right into the trap I was talking about. Your approach either requires being able to memory map the CSV data or storing the CSV data in memory. And to be honest, it's not clear to me that your approach would end up being faster anyway. There's a lot of overhead buried in there, and it sounds like it's at least two passes over the data in common cases. And in cases where there are line breaks in the data, your approach will perform very poorly I think.