Hacker News | bgentry's comments

As somebody whose first day working at Heroku was the day this acquisition closed, I think it’s mostly a misconception to blame Salesforce for Heroku’s stagnation and eventual irrelevance. Salesforce gave Heroku a ton of funding to build out a vision that was way ahead of its time. Docker didn’t come out until 2013, and AWS didn’t even have multiple regions when Heroku was built. They mostly served as an investor and left us alone to do our thing, or so it seemed those first couple of years.

The launch of the multi-language Cedar runtime in 2011 led to incredible growth, and by 2012 we were drowning in tech debt and scaling challenges. Despite more than tripling our headcount in that first year (~20 to 74), we could not keep up.

Mid-2012 was especially bad, as we were severely impacted by two us-east-1 outages just two weeks apart. To the extent it wasn’t already, reliability and paying down tech debt became the main focus, and I think we went about 18 months between major user-facing platform launches (the Europe region and, eventually, larger dyno sizes being the biggest things we shipped after that drought). The organization lost its ability to ship significant changes, or maybe never really had that ability at scale.

That time coincided with the founders taking a step back, leaving a vacuum of leadership and vision that was filled by people more concerned with process than results. I left in 2014, and at that time it already seemed clear to me that the product was basically stalled.

I’m not sure how much of this could have been done better even in hindsight. In theory Salesforce could have taken a more hands-on approach early on, but I don’t think that could have ended better. Heroku was so far from profitability in late 2010 that it could not have stayed independent without raising more funding, and the venture market in ~2010 was much smaller than it would be a few years later, with tiny rounds and low valuations. Had the company spent its pre-acquisition engineering cycles building for scalability and reliability at the expense of product velocity, it probably never would have become successful.

Even still, it was the most amazing professional experience of my career, full of brilliant and passionate people, and it’s sad to see it end this way.


It remains the greatest engineering team I've ever seen or had the pleasure to be a part of. I was only there from early 2011 to mid 2012, but what I took with me changed me as an engineer. The sheer brilliance of the ideas and approaches... I was blessed to witness it. I don't think I can overstate it, though many will think this is all hyperbole. I didn't always agree with the decisions made, and I was definitely there when the product stagnation started, but we worked hard to reduce tech debt, build better infrastructure, and improve... but man, the battles we fought. Many fond memories, including the single greatest engineering mistake I've ever seen made, one that I still think about to this day (but will never post in a public forum :)).

It was a pleasure working with you bgentry!


I'm just going to chime in here and say thank you. There still really isn't, in my mind, a comparable offering to Heroku's git push and go straight to a reasonable production environment.

I honestly find it a bit nuts. There are offerings that come close, but using them I still get the impression that they've just not put in the time really refining that user interface. So I just wanted to say thank you for the work you and the GP did; it was incredibly helpful and, I'm happy to say, helped me launch and test a few product offerings as well as some fun ideas.


This!

It absolutely boggles my mind that nothing else exists to fill this spot. Fly and others offer varying degrees of easier-than-AWS hosting, but nobody offers true PaaS like Heroku, IMHO.


The Heroku style of PaaS just isn't very interesting to most large businesses that actually pay for things. The world basically moved on to Kubernetes-based products (see Google and Red Hat), or they just shut down, like a lot of Cloud Foundry-based products. Yes, many individuals and smaller shops care more about simplicity, but they're generally not willing/able to pay a lot (if anything).

It seems like you’re right, but it’s strange that the data world seems to be moving in the opposite direction, with PaaS products like Snowflake, Databricks, Microsoft Fabric, and even Salesforce’s own Data Cloud eating the world.

PaaS has always been this thing that isn't pure infrastructure or pure hosted software that you use as-is. Salesforce draws something over 100K partner and user attendees to its annual conference. It's always been this in-between thing with a fairly loose definition. I'd argue that Salesforce was long a cross between SaaS (for the users) and PaaS (for developers). You can probably apply the same view to a lot of other companies' products.

I find render.com basically as good as Heroku, and certainly much better than fly.io with its unpredictable pricing

In 2022 Render increased their prices (which for my team worked out at a doubling of costs) with a one-month notice period, and when I asked the CEO whether he thought that was a fair notice period, his response was that it was operationally necessary and he was sorry to see us go.

In what way is Fly's pricing unpredictable?

It varies based on usage

Isn't that how things should be?

From a consumer's standpoint, why would I choose that?

It's a natural law to pay based on what you consume. Whenever that isn't required, you're typically being subsidized in one form or another, but don't be surprised when it goes away.

Heroku and Ruby, for me, was the 21st century answer to 'deploying' a PHP site over FTP.

The fact that it required nothing but 'git push heroku master' at the time was incredible, especially for how easy it was to spin up pre-prod environments with it, and how wiring up a database was also trivial.

Every time I come across an infrastructure that is bloated out with k8s, helm charts, and a complex web of cloud resources, all for a service not even running at scale, I look back to the simplicity we used to have.


I completely agree that there's nothing comparable to old-school Heroku, which is crazy. That said, Cloudflare seems promising for some types of projects and I use them for a few things. Anyone using them as a one-stop-shop?

For me, Northflank has filled this spot, though by the time I switched I was already using Docker, so I can't speak directly to their Heroku buildpack support.

Vercel goes a step further and (when configured this way) allocates a new hostname (eg feature-branch-add-thing.app.vercel.example.com) for new branches, to make testing even easier.

But their offering is "frontend oriented"; what you describe doesn't work for Django / Laravel / Rails / etc., no?

Have a look at Scalingo, it's a good mix of simplicity and maturity.

https://scalingo.com/blog/heroku-alternative-europe-scalingo...


This looks nice! Wish they had a no-credit-card-required version for educational purposes. For the course I teach we use Spring Boot, and life was good with Heroku till they discontinued the no-credit-card version; then the only choice we had (with support for Spring Boot) was to move over to Azure, which works but is a bit overkill and complicated for our purposes. I guess we could just use Docker and then many more platforms would become available, but I'd rather not add one more step to the pipeline if possible.

Yes. We are taking a stab at the entire infrastructure like Heroku did but with a focus on a coding agent-centric workflow: https://specific.dev

I’ve been using Render for close to 5 years, and it’s excellent. I can’t think of anything I use that it doesn’t do as well or better than Heroku did last I checked.

I agree with everything you said, and can only thank the founders for their tremendous insight and willingness to push the limits. The sheer number of engineering practices we take for granted today because of something like Heroku boggles my mind.

I am forever grateful for the opportunity to work there, and I make an effort to pass on what I learned to others.


> by 2012 we were drowning in tech debt and scaling challenges.

> the greatest engineering team I've ever seen

How do these two things reconcile in your opinion? In my view, doing something quickly is the easy part; good engineering is needed exactly when you want things to be maintainable and scalable, so the assertions above don't really make much sense to me.


It is hard to explain the impact of such massive growth over a 2-3 year period. New features were coming online while old ones were being strained by overuse. For instance, we launched PostgreSQL in the cloud, something we take for granted today. Not only that, but we offered an insane feature set around "follow" and "forking" that made working with databases seem futuristic.

I remember when we launched that product we went to that year's PGCon, and there were people in the crowd angry and dismissive that we would treat data that way. It was actually pretty confrontational. Products like that were being produced while we were also working on migrating away from the initial implementation of the "free tier" (internally called Shen). It took me and a few others months to replace it and ensure we didn't lose data while also making it maintainable. The resulting tool, lovingly named "yobuko", ended up remaining for years after that (largely due to the stagnation and turnover).

Anyways, that was just a slice of it. Decisions made today are not always the decisions you wanted to be made tomorrow. Day0 is great, day100 comes with more knowledge and regret. :D


In general, my impression has been that you don't want to architect your solution at first for massive scaling, because:

* You probably aren't going to need it, so putting the effort into scaling means slowing down your delivery of the very features that would make customers want your solution.

* It typically slows down performance of individual features.

* It definitely significantly increases the complexity of your solution (and probably the user-facing tooling as well).

* It is difficult to achieve until you have the live traffic to test your approach.


Yeah I think there is a lot of truth here. You can't solve all the problems and in Heroku's case we focused on user experience (internally and externally). Great ideas like "git push heroku main" are game changers, but what happens once that git server is receiving 1000 pushes a minute? Totally different thought process.

Perhaps the thing I would add is that even with the tech debt and scaling problems we still had over a million applications deployed ready for that request to hit them.


An organization without “tech debt” is not good at prioritizing.

Yea it seems like not much thought was put into scalability.

Tell us more about some of these ideas and approaches that changed you as an engineer! We'd love to hear!

Well, many of them you may know, in that they made their way into so many systems (though arguably without the refined UX of Heroku), but these are the two that come up the most and that I am teaching others:

* The simpler the interface for the user, the more decisions you can make behind the scenes. A good example here is "git push heroku". Not only is that something every user (bot or human) can run, it is also easy to script, protect, and scale. It keeps the surface area small, and the abstraction makes the most sense. The code behind that push lasted for quite some time, and it was effectively 1-2 Python classes as a service. But once we got the code into our systems, we could do anything with it... and we did. One of the things that blows my mind is that our "slug" maker (slugs being what we called the tarballs of code that we put on the runtimes) was itself a Heroku app. It lived alongside everyone else's. We were able to reduce our platform down to a few simple pieces (what we called the kernel), and everything else was deployed on top. We benefited from the very things our customers were using.

* NOTE: This one is going to be hard to explain because it is so simple, but when you start thinking about system design in this way, the possibilities start to open up right in front of you. The idea is that everything we do is effectively explained as "input -> filter -> output". Even down to the CPU. But especially when it comes to a platform. With this design mentality we had a logging pipeline that I am still jealous of. We had metrics flowing into dashboards that were everywhere and informed our work. We had things like "integration testing" that ran continuously against the platform, all from the user's perspective, which allowed us to test features long before they reached the public. All of these things were "input" that we "filtered" in some way to produce "output". When you start using that "output" as "input" and chaining these things together, you get to a place where you can design a "kernel" (effectively an API service and a runtime) and start interacting with it to produce a platform.
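To make that "input -> filter -> output" framing concrete, here is a tiny, hypothetical sketch in Go (nothing to do with Heroku's actual code): each stage reads from a channel, transforms what it sees, and writes onward, so any stage's output can be chained in as another stage's input.

    package main

    import (
        "fmt"
        "strings"
    )

    // emit is the "input": it produces raw log lines.
    func emit(lines []string) <-chan string {
        out := make(chan string)
        go func() {
            defer close(out)
            for _, l := range lines {
                out <- l
            }
        }()
        return out
    }

    // filter keeps only the lines containing a substring.
    func filter(in <-chan string, keep string) <-chan string {
        out := make(chan string)
        go func() {
            defer close(out)
            for l := range in {
                if strings.Contains(l, keep) {
                    out <- l
                }
            }
        }()
        return out
    }

    func main() {
        // Chain stages: the output of one becomes the input of the next.
        src := emit([]string{"router status=200", "router status=503", "worker ok"})
        errs := filter(src, "status=503")
        for l := range errs {
            fmt.Println("alert:", l) // "output" that could feed yet another stage
        }
    }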

I remember when we were paring down services to get to our "kernel", one of the operations engineers developed our Chef setup so that an internal engineer needed maybe 5-7 lines of Ruby to deploy their app and get everything they needed. Simple input that produced a reliable application setup and got us into production faster.

Anyways, those are just a couple.


Here[0] is a talk that shows off some of these tools. Mark led the way on many of these ideas internally.

[0]: https://www.youtube.com/watch?v=yGcaofDq8SM


Thank you so much for this!

Absolutely agree and likewise buddy :)

Thanks for sharing your story. Those early days of using Heroku were really enjoyable for me. It felt so promising and simple. I remember explaining the concept to a lot of different people who didn't believe that the deployment model could be that simple and accessible until I showed them.

Then life went on, I bounced around in my career, and forgot about Heroku. Years later I actually suggested it for someone to use for a simple project once and I could practically feel the other developers in the room losing respect for me for even mentioning it. I hadn't looked at it for so long that I didn't realize it had fallen out of favor.

> That time coincided with the founders taking a step back, leaving a vacuum of leadership and vision that was filled by people more concerned with process than results

This feels all too familiar. All of my enjoyable startup experiences have ended this way: The fast-growing, successful startup starts attracting people who look like they should be valuable assets to the company, but they're more interested in things like policies, processes, meetings, and making the status reports look nice than actually shipping and building.


The cancer of corporations: bureaucracy.

As corporations grow, they also need some level of process, which can increasingly look a lot like bureaucracy.

Bureaucracy is a natural consequence of workforce growth. Respectfully, I think you are blaming the symptoms here, not the root cause.

Having been on a bigco team that underwent the same sort of headcount growth in a very short time I have to imagine that "more than tripling our headcount in that first year" was likely more a driver of the inability to keep up than a solution. That's not a knock on the talents of anyone hired; it's just exceedingly difficult to simultaneously grow a team that fast and maintain any kind of velocity regardless of the complexity of the problems you're trying to tackle. The culture and knowledge that enabled the previous team's velocity just gets completely diluted.

Thanks for a capsule tour through Heroku!

FWIW, the team that eventually created "Docker" was working at the same time on dotCloud as a direct Heroku competitor. I remember meeting them at a meet-up in the old Twitter building but couldn't tell you exactly which year that was. Maybe 2010 or 2011?


Yep, that team did great work. I remember having lunch at the Heroku office with the dotCloud team in 2011 or 2012 and also Solomon Hykes demoing Docker to us in our office’s basement before it launched. So much cool stuff was happening back then!

I remember Solomon's demo at All Hands... it was such a crazy time for tech and innovation.

Oh hey Curt!! Remember, you are not a monitoring system :)

Haha, funny you should mention that as I was just telling a coworker that story as we worked on a new dashboard for our infrastructure. :D

> people more concerned with process than results

Sounds like something Steve Jobs observed at Apple: https://youtu.be/l4dCJJFuMsE?si=QOCBUqcUPWu8AsAX


Link with spyware removed: https://youtu.be/l4dCJJFuMsE

I feel as if YouTube itself might still contain some tracking.

People recommend Invidious and Piped, but they are starting to have issues.

This video is archived on the Wayback Machine (archive.org), and I feel as if that might be the best way to watch it right now.

https://web.archive.org/web/20260207024055/https://www.youtu...

Edit: I also got interested in Wayback Machine alternatives, given that the Wayback Machine suggests not uploading videos unless absolutely necessary due to storage constraints.

So I found PreserveTube, which is sort of intended for this, and uploaded the video there:

https://preservetube.com/watch?v=l4dCJJFuMsE


Spyware?

Not OP, but the "si" parameter in the URL is an individual tracking identifier, generated specifically for YouTube to see who you share the link with.


GP means the si URL parameter, which is a token that helps Google track how their videos are being shared.

I worked with some of the folks from there, and honestly you make it sound like tech debt is an inevitable consequence haunting projects from year one.

I disagree. I think the folks just did a sloppy job of "let's bungee strap it all together" for speed, instead of more serious planning and architecting. They self-inflicted the tech debt and drowned in the debt interest super fast.


There's someone out there who built the scalable version of Heroku at another garage startup. But we never heard of them because Heroku beat them to market.

Sure, but that bungee strapped slop got them pretty damn far

My point is that they would have gotten much farther. I'm not talking about architecture astronautics, just a little more thinking-ahead.

As far as the Salesforce acquisition goes, I'd be curious to see who made the decision to put Heroku into maintenance only mode.

I worked for a different part of Salesforce. I don't really feel like Salesforce did a ton of meddling in any of its bigger acquisitions other than maybe Tableau. I think the biggest missed opportunity was potentially creating a more unified experience between all of its subsidiaries. Though, it's hard to call that a failure since they're making tons of money.

It could be a case of post-founder leadership seeing that there's not a lot of room for growth and giving up. That happens a lot in the tech industry.


You'll need to unlock your iPhone first. Even though you're staring at the screen and just asked me to do something, and you saw the unlocked icon at the top of your screen before/while triggering me, please continue staring at this message for at least 5 seconds before I actually attempt FaceID to unlock your phone to do what you asked.

This is largely because LISTEN/NOTIFY has an implementation which uses a global lock. At high volume this obviously breaks down: https://www.recall.ai/blog/postgres-listen-notify-does-not-s...

None of that means Oban or similar queues don't/can't scale—it just means a high volume of NOTIFY doesn't scale, hence the alternative notifiers and the fact that most of its job processing doesn't depend on notifications at all.

There are other reasons Oban recommends a different notifier per the doc link above:

> That keeps notifications out of the db, reduces total queries, and allows larger messages, with the tradeoff that notifications from within a database transaction may be sent even if the transaction is rolled back


> None of that means Oban or similar queues don't/can't scale—it just means a high volume of NOTIFY doesn't scale

Given the context of this post, it really does mean the same thing though?


No, I don't think so. Oban does not rely on a large volume of NOTIFY in order to process a large volume of jobs. The insert notifications are simply a latency optimization for lower volume environments, and for inserts can be fully disabled such that they're mainly used for control flow (canceling jobs, pausing queues, etc) and gossip among workers.

River for example also uses LISTEN/NOTIFY for some stuff, but we definitely do not emit a NOTIFY for every single job that's inserted; instead there's a debouncing setup where each client notifies at most once per fetch period, and you don't need notifications at all in order to process with extremely high throughput.

In short, the fact that high volume NOTIFY is a bottleneck does not mean these systems cannot scale, because they do not rely on a high volume of NOTIFY or even require it at all.
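For illustration only, here is a rough Go sketch of that kind of debouncing (hypothetical channel name and structure, not River's actual implementation): inserters just flip a flag, and a background loop sends at most one NOTIFY per fetch period.

    package notify

    import (
        "context"
        "sync/atomic"
        "time"

        "github.com/jackc/pgx/v5/pgxpool"
    )

    // Debouncer coalesces "a job was inserted" signals so that at most one
    // NOTIFY goes out per period, instead of one per inserted job.
    type Debouncer struct {
        pool    *pgxpool.Pool
        channel string // hypothetical channel name, e.g. "jobs_inserted"
        pending atomic.Bool
    }

    func NewDebouncer(pool *pgxpool.Pool, channel string) *Debouncer {
        return &Debouncer{pool: pool, channel: channel}
    }

    // Mark records that at least one job was inserted since the last NOTIFY.
    func (d *Debouncer) Mark() { d.pending.Store(true) }

    // Run emits at most one NOTIFY per period while work is pending.
    func (d *Debouncer) Run(ctx context.Context, period time.Duration) {
        ticker := time.NewTicker(period)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                if d.pending.Swap(false) {
                    // Listeners wake up and fetch a batch; they never need
                    // one notification per job.
                    _, _ = d.pool.Exec(ctx, "SELECT pg_notify($1, '')", d.channel)
                }
            }
        }
    }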


Does River without any extra configuration run into scaling issues at a certain point? If the answer is yes, then River doesn’t scale without optimization (Redis/Clustering in Oban’s case).

While the root cause might not be River/Oban, the point that they aren't scalable without that work still holds true. It's of extra importance given that the context of this post is moving away from Redis to strictly a database for a queue system.


Yup. I wasn’t talking about notify in particular but about using Postgres in general.


Yeah, River generally recommends this pattern as well (River co-author here :)

To get the benefits of transactional enqueueing you generally need to commit the jobs transactionally with other database changes. https://riverqueue.com/docs/transactional-enqueueing

It does not scale forever, and as you grow in throughput and job table size you will probably need to do some tuning to keep things running smoothly. But after the amount of time I've spent in my career tracking down those numerous distributed systems issues arising from a non-transactional queue, I've come to believe this model is the right starting point for the vast majority of applications. That's especially true given how high the performance ceiling is on newer / more modern job queues and hardware relative to where things were 10+ years ago.

If you are lucky enough to grow into the range of many thousands of jobs per second then you can start thinking about putting in all that extra work to build a robust multi-datastore queueing system, or even just move specific high-volume jobs into a dedicated system. Most apps will never hit this point, but if you do you'll have deferred a ton of complexity and pain until it's truly justified.
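As a concrete example of the pattern, here is a hedged sketch in Go using pgx and plain SQL (hypothetical table and column names, deliberately not any particular library's API): the domain write and the job row commit or roll back together.

    package enqueue

    import (
        "context"

        "github.com/jackc/pgx/v5/pgxpool"
    )

    // createUserAndEnqueueWelcomeEmail inserts a user and a job for that user
    // in one transaction, so neither can exist without the other.
    func createUserAndEnqueueWelcomeEmail(ctx context.Context, pool *pgxpool.Pool, email string) error {
        tx, err := pool.Begin(ctx)
        if err != nil {
            return err
        }
        defer tx.Rollback(ctx) // no-op if the transaction was committed

        var userID int64
        if err := tx.QueryRow(ctx,
            `INSERT INTO users (email) VALUES ($1) RETURNING id`, email,
        ).Scan(&userID); err != nil {
            return err
        }

        if _, err := tx.Exec(ctx,
            `INSERT INTO jobs (kind, args, state)
             VALUES ('welcome_email', jsonb_build_object('user_id', $1), 'available')`,
            userID,
        ); err != nil {
            return err
        }

        // Both rows become visible atomically; a crash before this point leaves
        // neither a job without a user nor a user without its job.
        return tx.Commit(ctx)
    }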


State machines to the rescue; i.e., I think the nature of asynchronous processing requires that we design for good/safe intermediate states.


I get the temptation to attribute the popularity of these systems to lazy police with nothing better to do, but from personal experience there’s more to it.

I live in a medium sized residential development about 15 minutes outside Austin. A few years ago we started getting multiple incidents per month of attempted car theft where the thieves would go driveway to driveway checking for unlocked doors. Sometimes the resident footage revealed the thieves were armed while doing so. In a couple of cases they did actually steal a car.

The sheriffs couldn’t really do much about it because a) it was happening to most of the neighborhoods around us, b) the timing was unpredictable, and c) the manpower required to camp out to attempt to catch these in progress would be pretty high.

Our neighborhood installed Flock cameras at the sole entrance in response to growing resident concerns. We also put in a strict policy around access by anyone who isn’t law enforcement. In the ~two years since they were installed, we’ve had two or three incidents total, whereas immediately prior it was at least as many each month. And in those cases the sheriffs could easily figure out which vehicles had entered or left during that time. I continue to see stories of attempted car thefts from adjacent neighborhoods several times per month.

I totally get the privacy concerns around this and am inherently suspicious of any new surveillance. I also get the reflexive dismissal of their value. In this case it has been a clear win for our community through the obvious deterrent factor and the much higher likelihood of having evidence if anything does happen.

Our Flock cameras do not show on the map here, btw.


Aaron did RT the post here which likely indicates some agreement with the sentiment in it: https://x.com/searls/status/1972293469193351558

Also he shared it directly while saying it was good here: https://x.com/tenderlove/status/1972370330892321197


Take my money! I have been looking for a good way to get Claude to stop telling me I'm right in every damn reply. There must be people who actually enjoy this "personality" but I'm sure not one of them.


As one of River's creators, I'm aware of many companies using it, including names you've heard of :) One cool open source project I've seen recently is this open source pager / on-call system from Target called GoAlert: https://github.com/target/goalert


Modern Postgres in particular can take you really far with this mindset. There are tons of use cases where you can use it for pretty much everything, including as a fairly high throughput transactional job queue, and may not outgrow that setup for years if ever. Meanwhile, features are easier to develop, ops are simpler, and you’re not going to risk wasting lots of time debugging and fixing common distributed systems edge cases from having multiple primary datastores.
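For the job-queue piece specifically, the usual Postgres building block is FOR UPDATE SKIP LOCKED, which lets many workers claim from the same table without blocking each other. A hedged sketch in Go (hypothetical table and column names, not any particular library's code):

    package jobs

    import (
        "context"

        "github.com/jackc/pgx/v5/pgxpool"
    )

    type Job struct {
        ID   int64
        Kind string
        Args []byte
    }

    // fetchJobs claims up to limit available jobs. SKIP LOCKED makes concurrent
    // workers skip rows another worker has already locked instead of waiting.
    func fetchJobs(ctx context.Context, pool *pgxpool.Pool, limit int) ([]Job, error) {
        rows, err := pool.Query(ctx, `
            UPDATE jobs
            SET state = 'running', attempted_at = now()
            WHERE id IN (
                SELECT id FROM jobs
                WHERE state = 'available' AND scheduled_at <= now()
                ORDER BY priority, scheduled_at
                LIMIT $1
                FOR UPDATE SKIP LOCKED
            )
            RETURNING id, kind, args`, limit)
        if err != nil {
            return nil, err
        }
        defer rows.Close()

        var out []Job
        for rows.Next() {
            var j Job
            if err := rows.Scan(&j.ID, &j.Kind, &j.Args); err != nil {
                return nil, err
            }
            out = append(out, j)
        }
        return out, rows.Err()
    }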

If you really do outgrow it, only then do you have to pay the cost to move parts of your system to something more specialized. Hopefully by then you’ve achieved enough success & traction to justify doing so.

Should be the default mindset for any new project if you don’t have very demanding performance & throughput needs, IMO.


PostgreSQL is quite terrible at OLAP, though. We got a few orders of magnitude performance improvement in some aggregation queries by rewriting them with ClickHouse. It's incredible at it.

My rule of thumb is: PG for transactional data consistency, Clickhouse for OLAP. Maybe Elasticsearch if a full-text search is really needed.


This. Don't waste your time and sanity trying to optimize complex queries for PG's non-deterministic query planner; you have no guarantee your indexes will be used (even running the same query again with different arguments). Push your data to ClickHouse and enjoy good performance without even attempting to optimize. If even more performance is needed, denormalize here and there.

Keep postgres as the source of truth.


I find that the Postgres query planner is quite satisfactory for very difficult use cases. I was able to get 5 years into a startup (one that wasn't trying to be the next Twitter) on a $300 Postgres tier with Heroku. The reduced complexity was so huge we didn't need a team of 10. The cost savings were yuge, and I got really good at debugging slow queries, to the point where I could tell when Postgres would cough at one.

My point isn't that this will scale. It's that you can get really, really far without complexity and then tack on as needed. This is just another bit of complexity removal for early tech. I'd use this in a heartbeat.


It’s simple high throughput queries that often bite you with the PG query planner.


Redis, PG, and ClickHouse sound like a good combo for workloads that require both OLTP and OLAP at scale (?).


You can get pretty high job throughput while maintaining transactional integrity, but maybe not with Ruby and ActiveRecord :) https://riverqueue.com/docs/benchmarks

That River example has a MacBook Air doing about 2x the throughput as the Sidekiq benchmarks while still using Postgres via Go.

Can’t find any indication of whether those Sidekiq benchmarks used Postgres or MySQL/Maria; that may be a difference.


This is really cool and I'd love to use it, but it seems they only support workers written in Go, if I'm not mistaken? My workers can be remote and not written in Go, behind a NAT too. I want them to periodically pull from the queue; this way I do not need to worry about network topology. I guess I could simply interface with PG atomically via a simple API endpoint for the workers to connect to, but I'd love to have the UI of riverqueue.


Very interesting, and TIL about the project, thanks for sharing. What DB does River Queue use for 46k jobs/sec? (https://riverqueue.com/docs/benchmarks)

UPDATE: I see it's PG - https://riverqueue.com/docs/transactional-enqueueing


IIUC - Does this mean if I am making a network call, or performing some expensive task from within the job, the transaction against the database is held open for the duration of the job?


Definitely not! Jobs in River are enqueued and fetched by worker clients transactionally, but the jobs themselves execute outside a transaction. I’m guessing you’re aware of the risks of holding open long transactions in Postgres, and we definitely didn’t want to limit users to short-lived background jobs.

There is a super handy transactional completion API that lets you put some or all of a job in a transaction if you want to. Works great for making other database side effects atomic with the job’s completion. https://riverqueue.com/docs/transactional-job-completion
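Conceptually, a hedged sketch of what that completion pattern accomplishes (plain SQL over pgx with hypothetical table names, rather than River's actual API): the job's side effect and its completed state land in one transaction.

    package worker

    import (
        "context"

        "github.com/jackc/pgx/v5/pgxpool"
    )

    // completeWithSideEffect records the job's side effect and marks the job
    // completed atomically, so they succeed or fail together.
    func completeWithSideEffect(ctx context.Context, pool *pgxpool.Pool, jobID, invoiceID int64) error {
        tx, err := pool.Begin(ctx)
        if err != nil {
            return err
        }
        defer tx.Rollback(ctx) // no-op after a successful commit

        if _, err := tx.Exec(ctx,
            `UPDATE invoices SET status = 'sent' WHERE id = $1`, invoiceID); err != nil {
            return err
        }
        if _, err := tx.Exec(ctx,
            `UPDATE jobs SET state = 'completed', finalized_at = now() WHERE id = $1`, jobID); err != nil {
            return err
        }
        return tx.Commit(ctx)
    }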


very nice! cool project

