htmlspecialchars does not throw E_WARNING on multibyte problems (php.net)
30 points by rustc on Dec 15, 2012 | 29 comments


In fairness, display_errors is a terrible idea -- do you actually want errors thrown by your program to be interspersed with the HTML output of your program?

This sort of stuff should be going to the error log. And it is, so "throws error" and "error reporting" in the title are both inaccurate here. It does throw an error; it merely refuses to send that error to the HTML output of the page when display_errors is on. If normal, sane error reporting to the error log is on, the error does show up there.


Most beginners are not going to be looking at their error log, or even know where to look for it. In a development environment, I think display_errors is a very reasonable behavior, and is in line with PHP's goals: be easy to get up and running, even (especially?) for novice programmers. Even people that have been doing this for a decade may forget to tail their logs, so having execution-time warnings makes a lot of sense, and is really the only practical approach for an interpreted language.

I've made vague suggestions (though I haven't written up a full spec/RFC... one of these days) about improving the overall error handling system in PHP. I think there is a limited set of common behaviors that take a lot of repeated code to get, and they basically amount to "use strict". That way people doing serious projects that need a flexible language can get decent and consistent error handling, and newbies hacking together their first dynamic website can continue to work without hitting a brick wall of a learning curve.

When I was first learning to code, something like display_errors would have been great. As it was I just had a few compiler warnings, but (especially as a newbie) who cares as long as it compiles and runs? Getting random warning text printed out on your web page is a real incentive to fix things, but without being so harsh that you just give up. Of course that doesn't scale to larger projects, but it doesn't need to. In PHP's case, that's what set_error_handler() is for.


display_errors can be convenient on non-production sites. I think a lot of beginners are hacking on shared hosting and transferring their PHP files in via FTP; they don't understand how to access logs, and they do appreciate this feature.

It's just too bad they are hacking on their live site, and/or never turning it off once they no longer need it.

Don't get me wrong, FTP/live editing a site on production / shared hosting are all crappy ways of doing things for serious development but I can see how it might be useful for beginners.

We've all been there, yeah? Even if it was back when PHP 3 and 4 were common.


> In fairness, display_errors is a terrible idea

You're absolutely right! In both development and production, errors should go to syslog. I have all my machines on rsyslog and they transfer all logs to a central server where I can see them very easily and clearly through an internal web app. They even have services nowadays that make this stupid simple.


It only really makes sense when PHP is used in a shell. If nothing else, it's strange that a lot of PHP installations have it turned on by default.


Not really. People that turn off display_errors and set up proper logging typically know what they're doing; it's a pretty sensible default for "my first website" level of projects. Remember, those folks are just uploading PHP files to their shared host - they don't have a local development environment, let alone a custom-configured production setup.

Is it unfortunate that a lot of production sites run with stupid settings because they aren't run by full-time software engineers or sysadmins? Sure. But the accessibility of making changes is great, and most people would consider it a good thing that people are taking an interest in learning how to do this stuff themselves rather than outsourcing all of their technology problems. After all, isn't that how most of us got started?


I was actually wondering not long ago if there might be some kind of cross-site scripting vulnerability implicit in rendering exceptions as html. Sure, you shouldn't have error reporting on to begin with but who even checks the source code of the exception pages themselves? You're paying close attention to the error itself after all...

Turns out yes maybe... https://nealpoole.com/blog/2011/08/cross-site-scripting-via-...


If you find an odd situation in PHP due to backwards compatibility or whatever, I highly recommend you just make a wrapper function to do what you want.
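For the htmlspecialchars() case in the title, such a wrapper might look roughly like the sketch below. It assumes PHP 5.4 or later and UTF-8 input; the name strict_html_escape() is made up for illustration. It relies on the documented behavior that htmlspecialchars() returns an empty string for invalid input unless ENT_IGNORE or ENT_SUBSTITUTE is set.

```php
<?php
// Hypothetical wrapper; strict_html_escape() is an illustrative name.
function strict_html_escape($text)
{
    // Flags are passed explicitly (no ENT_SUBSTITUTE or ENT_IGNORE) so that
    // invalid input yields '' regardless of the PHP version's default flags.
    $escaped = htmlspecialchars($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // A non-empty input that escapes to '' means the bytes were not valid UTF-8.
    if ($text !== '' && $escaped === '') {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $escaped;
}
```

The caller then gets an exception it can't overlook instead of a silently empty string.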

Another example of a common mistake with default functionality:

Did you know

json_decode('null')

and

json_decode('{')

both return the same thing, NULL. One is correct, one isn't. The only way to actually tell the difference? Also call json_last_error() to make sure there wasn't an error parsing the JSON. Pretty inconvenient to do every time (and most people won't). Solution: Wrap it.
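A minimal sketch of that kind of wrapper, assuming you'd rather get an exception than a silent NULL. The name strict_json_decode() is hypothetical, not one of our actual helpers:

```php
<?php
// Hypothetical wrapper; strict_json_decode() is an illustrative name.
function strict_json_decode($json, $assoc = false)
{
    $value = json_decode($json, $assoc);

    // json_last_error() reports on the most recent json_decode() call,
    // which is what separates a literal null from a parse failure.
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new InvalidArgumentException(
            'Invalid JSON (error code ' . json_last_error() . ')'
        );
    }
    return $value;
}
```

With this, strict_json_decode('null') returns NULL and strict_json_decode('{') throws. Newer PHP versions (7.3 and later) also offer a JSON_THROW_ON_ERROR flag that makes json_decode() throw on its own.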

At Thumbtack we have lots of wrapper functions which take care of situations where a built-in function doesn't have an interface we like. Most of them are purely for convenience though, not situations like this where the default output does something potentially misleading.


> Solution: Wrap it.

Conversely, cut your losses and use something saner with better internal and API coherence.


At Thumbtack we have 2 main repositories, our PHP repo and our Python repo. Our Python repo was built from the ground up and we use it for lots of things; in fact I think it is larger than our PHP repo. However, our core website is written in PHP, and will always be written in PHP for the reason that it always has been. Had I been a co-founder and been a part of making that decision I might have made a different one, but I wasn't and it doesn't matter. Pragmatic decision making is the name of the game.

I admit that PHP doesn't have the best internal interfaces; however, convenience functions with whatever interface you want go a long way.

I like sets in Python; well, we have tt_set() in PHP with exactly the same interface.

I like defaultdicts in Python; we have tt_default_dict() with the same interface.

I like dict.get() in Python; we have tt_array_get() in PHP.

I like namedtuple() in Python; we have a NamedTuple() class in PHP which provides similar functionality (this one is not perfect though, a little more verbose to instantiate in PHP).

I like string.startswith(), string.endswith(), and string slicing in Python; we have tt_str_startswith(), tt_str_endswith(), tt_str_slice().

I like csv.DictReader() in python, we have CsvUtils/DictReader() in PHP.
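To give a flavor of how small these helpers can be, here is a rough sketch of two of them. These are hypothetical stand-ins written for this comment, not the actual tt_* code, and the example_* names are made up:

```php
<?php
// Hypothetical stand-ins; the real tt_* helpers are not shown in this thread.

// Lookup with a default, in the spirit of a Python-style get().
function example_array_get(array $array, $key, $default = null)
{
    // array_key_exists() distinguishes a stored null from a missing key.
    return array_key_exists($key, $array) ? $array[$key] : $default;
}

// Prefix test, in the spirit of Python's startswith().
function example_str_startswith($haystack, $prefix)
{
    // Compare only the first strlen($prefix) bytes.
    return strncmp($haystack, $prefix, strlen($prefix)) === 0;
}
```

Both are byte-oriented and ignore multibyte encodings, which is an assumption that may not hold for every codebase.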

At the end of the day though, PHP is much more similar to Python than it is different, regardless of any of these convenience functions and wrappers. Potentially investing a significant amount of effort to switch something from one to the other strikes me as a very poor investment.


> PHP is much more similar to Python than it is different

Can you elaborate on this? Compared to what?


Please take your hipster elitism somewhere else. Every language has warts and learning how to avoid them is always a good idea.


That's not elitism; that's reality. Notice that PHP is pretty much the sole language garnering this reaction? No one actively makes fun of C#, Python, Clojure, Haskell, Perl, Java, etc., even when they dislike them.

For example, I don't care for Java at all, but I admit that it's a really good implementation of a language I just don't happen to like very much. Same for C#. My reasons for not using it have nothing to do with the language itself (which is perfectly decent by all accounts). Programmers joke about Perl looking like line noise, but even that's a criticism of how some people use the language and not anything bad about Perl directly.

But no one other than PHP programmers seems to have anything good to say about PHP. It's nearly universally disrespected as an insecure, inconsistent, buggy mess. That's very unusual - perhaps unique - among major programming languages. Maybe there's a reason that non-Java programmers have a grudging respect for Java, non-Python programmers can find things they like about Python, but non-PHP programmers almost never have anything good to say about PHP.

When a lot of subject matter experts vocally dislike something, maybe that thing is misunderstood. When nearly all of them dislike it, perhaps there's a reason for it.


"No one actively makes fun of C#, Python, Clojure, Haskell, Perl, Java, etc., even when they dislike them."

This is laughable. Especially Java.

http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom...


> But no one other than PHP programmers seems to have anything good to say about PHP.

The people least familiar with the language have the most negative to say about it. Making fun of PHP is pretty much a fad -- this particular topic, about htmlspecialchars(), I heard originally 5 years ago and it just cycles around and around. Even the most famously quoted articles about PHP's shortcomings are typically at least 50% flat-out wrong and 25% opinion.

That being said, PHP has a fair share of warts -- it's a language with a long history and rapid change with little overall direction.


> The people least familiar with the language have the most negative to say about it

My 12 years of experience with PHP would beg to differ. It is, by ANY standard, a terrible language. In fact, its primary defense of being "convenient" is not even true. Most developers would probably get a Django site up and running in the same amount of time it would take them to get a Cake or CodeIgniter site up and running.


My experience is to the contrary: the people who've had to use it at work (as opposed to voluntarily for their own projects) are the ones who despise it most. It looks pretty neat at first, after all. It's only when you've dug deeply into it - and have experience in other languages to compare it to - that you genuinely start to loathe it.


I criticized PHP for years without ever really using it. Then I finally did some real work in the language. Did my opinion change from the exposure? You bet it did. Turns out that PHP is even worse than I had thought.

I know plenty of people experienced with PHP who actively criticize it. People who have never used it generally tend not to care about it.


> Notice that PHP is pretty much the sole language garnering this reaction?

Nope.

There's another one. JavaScript.

PHP is the most popular language on the server, JavaScript is the most popular language in the browser. Coincidence? I think not.

Do you know what the definition of a hipster is? Hating that which is mainstream and popular.

You, sir, are a hipster.


I like JavaScript quite a bit. Ruby is way more trendy than PHP but you don't hear much bad about it.


False equivalence. The fact that every language has warts does not mean that all languages are equally warty.

There rapidly comes a point when you're spending so much time working around PHP's prolific warts that it would be more efficient for you to move to a less warty language.


Oh thank god, I've never heard someone on HN say that PHP sucks and that you should switch. Thanks for bearing that cross for us.


Obligatory "not always an option" response to your obligatory "PHP sucks, use something else" response.


Sometimes that's not a choice. OpenGL uses the same check-for-error pattern.


OpenGL is a C API. This is an acceptable pattern for a language at the level of C, without such features as exceptions, multiple return values or sum types.

Which is not the case for PHP, which has both multiple return values (kind of) and exceptions, and had them long before the json* functions were added.


This "bug" has been resolved since 01-Mar-2012 when 5.4 was released, and there's a very straightforward explanation of it if people would just RTFM.

This is the worst part of the PHP community - all the in-fighting when the core developers don't bend over backwards to accommodate short-sighted programmers who know their conflicting demand is the one that should be heard.


The problem is that people have conflicting demands. They want all errors, even the most obscure ones in the deepest functions, shown to them immediately in every output with full details and without even a minimal amount of work, and then they also want the whole system to be secure. It cannot work this way. If you want errors in the output, it cannot be secure. If you want secure defaults, you cannot have the server spewing errors to the output.

So we have a choice here: do we prefer convenience or security? Given that inconvenience is much less dangerous than insecurity, the choice for PHP developers was security. In an ideal world, where nobody runs their production servers with display_errors=on and everybody understands the consequences of their actions, this hack would not be needed. We live, however, in the real world, where thousands of production sites run with display_errors on, so we need to account for this reality. It means somebody has to check their error log from time to time. If that's too hard to do, install a custom error handler and override the default behavior.
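One common way to do that override, sketched here under the assumption that you want warnings to stop execution rather than scroll by in a log, is to turn them into exceptions with set_error_handler() and ErrorException:

```php
<?php
// Convert PHP warnings and notices into exceptions so they cannot be missed,
// without relying on display_errors.
set_error_handler(function ($severity, $message, $file, $line) {
    // Respect the current error_reporting level, including the @ operator.
    if (!(error_reporting() & $severity)) {
        return false; // fall through to PHP's default handling
    }
    throw new ErrorException($message, 0, $severity, $file, $line);
});
```

Whether to throw, log, or send mail from inside the handler is a per-project decision; the point is only that the default is replaceable.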


I'm starting to find these arguments about PHP close to the Android vs. iPhone ones. Yes, PHP is inconsistent, etc. But on the other hand, it just gets the shit done. And for historical reasons, it has a huge community. When you add those two factors together and stop being a hipster... It's not that bad, is it?


It's terrible. PHP doesn't "get the shit done", although PHP programmers can. There are far better languages for getting things done. A huge community is not terribly attractive, as plenty of other, better languages have sufficiently large communities.

I'd survive if I had to program in PHP, but it would definitely be a last resort. It could be worse (Malbolge?) but I don't understand why people feel a need to defend it or say it's not bad.



