ImageXav's comments | Hacker News

This seems like such an easy way to create perverse incentives and profit off people who are already down on their luck. Imagine being told that the only way to get considered is to pay a fee. Then later on you get told to pay the gold fee for priority. Oh, you're still not getting hired? Go for our platinum package that will definitely make the difference! Not enough money? No worries, we'll take 30% of your salary for the first few years. Or maybe we'll just give you a fixed debt at a high interest rate. Aren't you glad you used us?

1. What you describe already happens (has always happened?) in some blue collar jobs; it's just that the agencies don't call them gold or platinum plans, they call them "mandatory training sessions" or "medical checks" that have built-in admin fees.

2. Even LinkedIn upsells its plans "to increase your visibility".


Agreed. One-at-a-time (OAT) testing has been outdated for almost a century at this point. Factorial and fractional factorial experiments have been around for that long and give detailed insight into the effects of not just single changes but the interactions between changes, which means you can supercharge your learnings, since many variables in DL do in fact interact.

Or, more modern Bayesian methods if you're more interested in getting the best results for a given hyperparameter sweep.
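
To make the factorial point concrete, here's a toy sketch (hypothetical hyperparameters and a faked metric, just to illustrate) of a 2-level full factorial that gives you both main effects and the interaction from only four runs:

```
# Toy sketch: 2-level full factorial over two hypothetical hyperparameters.
# The "experiment" is faked with a response surface; swap in a real training run.
from itertools import product

levels = [-1, +1]                       # low/high settings, coded
design = list(product(levels, levels))  # 4 runs: (-1,-1), (-1,+1), (+1,-1), (+1,+1)

def run_experiment(lr_code, batch_code):
    # Placeholder metric with a built-in interaction term
    return 0.80 + 0.05 * lr_code + 0.02 * batch_code + 0.03 * lr_code * batch_code

results = {cfg: run_experiment(*cfg) for cfg in design}

def effect(col):
    # Low-to-high effect: mean(metric at +1) - mean(metric at -1)
    return sum(r * cfg[col] for cfg, r in results.items()) / 2

interaction = sum(r * cfg[0] * cfg[1] for cfg, r in results.items()) / 2
print(f"lr: {effect(0):.3f}, batch: {effect(1):.3f}, interaction: {interaction:.3f}")
```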

However, that is not to detract from the excellent effort made here and the great science being investigated. Write-ups like this offer so much gold to the community.


I would add an aspect that is not covered here and is often ignored: the strong labour protection laws result in a mentality where, if you get a good job, you are much less likely to want to take risks, e.g. start your own business. There was a post on the HENRY (high earner, not rich yet) UK subreddit the other day from someone who had a wealth of experience and had the opportunity to join a startup as CTO. It honestly sounded like a great chance to initiate change. All of the comments were telling the poster that they had it good, that 99% of startups fail, that the hours would be gruelling. I feel as though the conversation would have been quite different in a US subreddit.

A term they like to use is 'crabs in a bucket'.


And that was sound advice.

I had the opportunity to move to the US and likely make 2-3x what I make in Europe.

My question is - what for? I earn enough money here that I could buy a nice house and raise my family in relative comfort. Why take unnecessary risk when I already have what I want?


I guess that's the crux of it. From an individual perspective it makes sense to stay in a stable environment, especially if a family is involved. However, I think from a societal perspective it is desirable to have people who gamble on creating new products which can raise the bar in their given industries.

Also, just because the startup fails doesn't mean it was a waste of time. If you manage to provide employment for even just 3 or 4 people for a few years, and help them and yourself develop, that is a valuable success.


But that will always come down to individual preferences. Typically, people who have the desire to take risks don't need that sort of advice; they will be looking for that sort of thing already.


This is an interesting point. I've been trying to think about something similar recently but don't have much of an idea how to proceed. I'm gathering periodic time series data and am wondering how to factor the sampling frequency into the statistical tests. I'm not sure how to assess the effect of sampling at 50 Hz versus 100 Hz on the outcome, given that my periods are significantly longer. Would you have an idea of how to proceed? The person I'm working with currently just bins everything into hour-long buckets and uses the mean for comparison between time series, but this seems flawed to me.


I don't know if you'll be reading this, but my first intuition would be to determine my effective sampling rate, and check whether the samples are comparable at all in the first place.

For example, if your phenomenon is observable at 50 Hz, maybe even 10 Hz, then any higher temporal resolution does not give you new information, because any two adjacent datapoints in the time-series are extremely correlated. Going the other way, at a very low sampling frequency you'd just get the mean, which might not reveal anything of interest.

If you bin 100 Hz data down to 50 Hz, are they the same? Is the Fourier spectrum the same? If you have samples at different resolutions you must choose the lowest common denominator for a fair statistical comparison. Otherwise, a comparison between a potato and an advanced instrument would always come out "statistically different", which doesn't make sense.
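
To make that concrete, here's a rough sketch of what I'd try first (made-up signal, scipy doing the heavy lifting): decimate the 100 Hz recording to 50 Hz with anti-aliasing and compare Welch spectra up to the shared 25 Hz Nyquist before running any tests:

```
# Rough sketch with a made-up signal: downsample the 100 Hz recording to 50 Hz
# with anti-aliasing and compare spectra before treating the two rates as equivalent.
import numpy as np
from scipy import signal

fs_high, fs_low = 100, 50
t = np.arange(0, 60, 1 / fs_high)                 # 60 s of hypothetical data
x_100 = np.sin(2 * np.pi * 0.2 * t) + 0.1 * np.random.randn(t.size)

x_50 = signal.decimate(x_100, fs_high // fs_low)  # low-pass filter + downsample

# Welch PSDs with matched ~0.1 Hz resolution, comparable up to the shared 25 Hz Nyquist
f_hi, p_hi = signal.welch(x_100, fs=fs_high, nperseg=1024)
f_lo, p_lo = signal.welch(x_50, fs=fs_low, nperseg=512)
```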

If you don't find "anything", the old adage goes "the absence of evidence is not the evidence of absence", so statistics don't really fail here. You can only conclude that your method is not sensitive enough.


I've had the complete opposite experience, and feel the complete opposite way. What is there to learn from failing a leetcode problem? It feels like luck of the draw - I didn't study that specific problem type and so failed. Also, there is an up-front cost of several months to cover and study a wide array of leetcode problems.

With a take-home I can demonstrate how I would perform at work. I can sit on it, think things over in my head, come up with an attack plan and execute it. I can demonstrate how I think about problems, and my own value, more clearly. Using a take-home as a test indicates to me that a company cares a bit more about its hiring pipeline and is being careful not to put candidates under arbitrary pressure.


Would you rather do 10 take-homes or 10 leetcode questions?

Either way, when you fail, chances are that you will not get any meaningful feedback other than "we have decided to move forward with other candidates".

If you had done a take-home, how could you know where you went wrong?

If you had done a leetcode question, you can look up the question after the interview and usually learn from your mistakes.

With leetcode you usually don't need the interviewer's feedback to improve. You don't even need the interview. And after a certain point you won't need that much time to prepare.


Yes, especially as models are known to have a preference for outputs from models in the same family. I suspect this leaderboard would change dramatically with different models as the judge.


I don't care about either method. The ground truth should be what a human would do, not what a model does.


There may be different/better solutions for almost all of those kinds of tasks. I wouldn't be surprised if the optimal answer to some of them were to refuse or defer the ask, refactor first, then solve it properly.


That response is quite in line with the typical human PR review response on a first draft.

There is a possibility that machine-based PR reviews are better: for instance because they are not prejudiced by who initiated the PR, and because they don't take other environmental factors into account. You'd expect a machine to be more neutral, so on that front the machine should, and possibly could, score better. But until the models consistently outperform the humans in impartially scored quality against a baseline of human results, it is the humans that should call this, not the machines.


I wouldn't necessarily expect a machine to be more neutral. Machines can easily be biased too.


On something like a PR review I would. But on anything that involves private information, such as the subject's background, gender, photographs and/or video, as well as other writings by the subject, I think you'd be right.

It's just that it is fairly trivial to present a PR to a machine in such a way that it can only comment on the differences in the code. I would find it surprising if that somehow led to a bias about the author. Can you give an example of how you think that would creep into such an interaction?


They are different models, but yes - I already let ChatGPT judge Claude's work for the same reason.


Thanks for sharing that. Interesting that the leaderboard is dominated by Anthropic, Google and DeepSeek. OpenAI doesn't even register.


OpenAI has a lot of share that simply doesn't exist via OpenRouter. Typical enterprise chatbot apps use it directly without paying a tax, and may use litellm with another vendor for fallback.


How did you achieve that? I was looking into it and $0.006/min is quoted everywhere.


Harvesting idle compute. https://borgcloud.org/speech-to-text


Do you support speaker recognition?


No. I found models doing that unreliable when there are many speakers.


This is your service?


Yes


I feel as though it also reflects the fact that contributors are less invested in the project. There was a small study done a few years back hypothesizing that the number of swear words correlated somewhat with code quality [0], due to the emotional involvement of the codebase authors. I can imagine this to be somewhat true. I would love to see this study redone on pre-ChatGPT repos now that LLMs are widespread (as I suspect that repos created using LLMs are going to be very sanitised).

[0] https://cme.h-its.org/exelixis/pubs/JanThesis.pdf


I show my investment in projects through means other than swearing, for example through extensive testing.


Even better, Python has named tuples [0]. So if you have a tuple that you are sure will always have the same fields, you can declare it:

```
from collections import namedtuple

Point = namedtuple('Point', 'x y')
pt1 = Point(1.0, 5.0)
```

And then access the x or y coordinate either by index (pt1[0], pt1[1]) or by field name (pt1.x, pt1.y).

This can be a really handy way to help people understand your code, as what you are accessing becomes a lot more explicit.
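
For example (a tiny made-up function), downstream code that consumes the tuple reads much more clearly with named fields:

```
def distance_from_origin(pt):
    # pt.x / pt.y are self-documenting, unlike pt[0] / pt[1]
    return (pt.x ** 2 + pt.y ** 2) ** 0.5

distance_from_origin(pt1)  # ~5.099 for Point(1.0, 5.0)
```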

[0] https://stackoverflow.com/questions/2970608/what-are-named-t...

