"Did not respond" is indeed a legitimate result. However, as the blog points out, if the non-responders differ from the responders, then every evaluation you do on the responders will be biased.

For example, if you ask students about their satisfaction with teaching, I'd guess that students with a bad experience are more likely to reply to your survey. Based on the data you gathered, you will think that the teaching at the uni is worse than it really is.
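
To make that concrete, here's a minimal sketch (my own illustration, not from the article, with made-up numbers): true mean satisfaction sits around 3.5 out of 5, but unhappy students are assumed to respond more often, so the mean computed from respondents alone comes out lower than the truth.

    # Minimal sketch (assumed numbers): unhappy students respond more often,
    # so the respondent-only mean understates true satisfaction.
    import random

    random.seed(0)
    true_scores = [random.gauss(3.5, 1.0) for _ in range(1000)]  # true satisfaction, roughly a 1-5 scale

    # Assumed response model: probability of answering falls as satisfaction rises.
    respondents = [s for s in true_scores if random.random() < 0.9 - 0.12 * s]

    print(sum(true_scores) / len(true_scores))    # ~3.5, the true mean
    print(sum(respondents) / len(respondents))    # noticeably lower: the biased estimate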



Yup! And the “right” way to handle that bias depends on a lot of background information/subject matter experience.

If the question were “Which musical acts do you want for the Spring Fling festival?”, it might be okay, or even smart, to ignore the non-responders. Including data from people unlikely to attend is probably unhelpful. If you’re asking about workloads or engagement, you certainly can’t assume that the data is missing at random or that the non-responses are irrelevant.

For teaching specifically, one of the smartest questions I’ve seen is “How well do you think you’re doing in this course?” The crosstabs can help address response bias.
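
As a rough sketch of how that crosstab could be used (the groups and numbers here are invented for illustration): tabulate self-assessed performance against satisfaction, then compare the respondent mix with what you know about the class from grade records.

    # Hypothetical data: cross-tabulate self-assessed performance vs. satisfaction.
    import pandas as pd

    survey = pd.DataFrame({
        "self_assessment": ["doing well", "doing well", "struggling", "struggling", "doing well"],
        "satisfaction":    ["high", "medium", "low", "low", "high"],
    })
    print(pd.crosstab(survey["self_assessment"], survey["satisfaction"]))

    # If grade records say ~40% of the class is struggling but only 20% of
    # respondents describe themselves that way, that group is under-represented
    # and its satisfaction scores deserve extra weight (or at least a caveat).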


> I'd guess that students with a bad experience are more likely to reply to your survey.

Or only those with strong feelings one way or the other answer.

Or those with strongly negative feelings fear that the survey isn't really anonymous and don't answer for fear of retribution.

A. is pretty much how this would be done in most real-world situations: make a second attempt to get people to answer, then go with what you have, assuming you got some reasonable response rate, which 75% probably is.


In fact, when you conduct surveys, it's very common to ask screener questions: among likely voters, among IT decision makers, among developers, etc.

It's fairly clear in this case: undergraduate(?) students.

In general, surveys are trying to get statistics from a demographic that's interesting to the person doing the survey, such as buyers or influencers of purchase decisions for a given product.


Wanting to use data is not a valid reason to use data that is not suitable. If you send out 120 surveys and get 90 back, you can't make assumptions about what those 30 would have said; you just have to present the data you have.


Eh, it’s trickier than just “go with what you’ve got.”

For example, you should check whether response rates are associated with other factors and incorporate that into your analysis. You might find that you have pretty good data from unhappy students but not from satisfied ones, or vice versa.
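
For instance, if you know something about everyone you surveyed (say, year of study), you can compare response rates across groups and reweight. A hedged sketch of that kind of adjustment, with all numbers made up:

    # Hypothetical non-response adjustment: weight each group's mean score by
    # how many surveys were sent, not by how many came back.
    sent      = {"year_1": 60, "year_2": 60}   # surveys sent per group
    responded = {"year_1": 55, "year_2": 35}   # responses received per group
    scores    = {"year_1": 4.1, "year_2": 3.2} # mean satisfaction among respondents

    # The unweighted mean over-counts year_1, which responded at a higher rate.
    unweighted = sum(responded[g] * scores[g] for g in scores) / sum(responded.values())

    # Weighting by surveys sent assumes non-respondents resemble respondents
    # within their own group (a strong assumption, but better than ignoring it).
    weighted = sum(sent[g] * scores[g] for g in scores) / sum(sent.values())

    print(round(unweighted, 2), round(weighted, 2))  # 3.75 vs 3.65 with these numbers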


I mean, it is extremely common to mislead (often unintentionally and with the best motives) and to use the performance of collecting data to give credence to the result. The alternative is to be up front about your methodology, which means not making assumptions at multiple stages in the process and not shading the conclusions by 'looking for other factors' or the like. When you do multiple rounds of 'fixing' data, you are just injecting assumptions about the true distribution, which defeats the entire point of collecting data at all. If you 'know' what the answer should look like, just write that assumption down and skip the extra steps, OR ensure the methodology will allow the data to prove you wrong, or to show a lack of a conclusion (including by a lack of data).

I realize I'm taking a very harsh stance here, but I've seen, again and again, people 'fixing' data in multiple rounds, the effect of which is that any actual insight is removed in favor of reinforcing the assumptions held before collecting the data. When you do this at multiple steps in the process, it becomes very hard to have a good intuition about whether you've done something that invalidates the conclusion (or the ability to draw any conclusion at all).



