You're only thinking about the _input_. Technically, yes, I can host an Express app on Lambda just like I could by other means, but the problem is that it can't really _do_ anything. Unless you're performing a larger job or something, you probably need to read/write data from somewhere, and connecting to a normal database is too slow for most use-cases.
Connecting to AWS managed services (S3, Kinesis, DynamoDB, SNS) doesn't have this overhead, so you can actually perform some task that involves reading/writing data.
Lambda is basically just glue code to connect AWS services together. It's not a general purpose platform. Think "IFTTT for AWS"
We have been connecting to MongoDB from within Lambda for the past year, and sure, you don't get single-digit latency, but reads/writes happen in under 30ms in most cases; we even use Parameter Store to pull all secrets and it's still within that time frame.
You can run your Lambda function within the same network as your other servers. It just appears as a new IP address inside your network and can call whatever you permit it to.
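For reference, putting a function inside your VPC is just configuration; a minimal sketch with Boto3, where the function name, subnet ID, and security group ID are placeholders:

```python
import boto3

lambda_client = boto3.client("lambda")

# Attach the function to a VPC: it gets an ENI (an IP inside your subnets)
# and can then reach whatever its security group permits.
lambda_client.update_function_configuration(
    FunctionName="my-function",  # hypothetical function name
    VpcConfig={
        "SubnetIds": ["subnet-0123456789abcdef0"],     # placeholder
        "SecurityGroupIds": ["sg-0123456789abcdef0"],  # placeholder
    },
)
```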
I use aws-serverless-express and connect my lambda hosted in us-east-1 (N. Virginia) to an mLab (now MongoDB Atlas) Mongo database hosted in the same AWS region.
But that is the whole point of using cloud services that are tightly integrated with each other. "I cannot do it as efficiently as Amazon myself" cannot be called "proprietary lock-in".
Said efficiencies are not due to Amazon, just that the services are colocated in the same facility.
If I put the node service and a database on the same box I'd get the same performance, and actually probably better since Amazon would still have them on separate physical hardware.
The infrastructure, or rather the interfaces, is where the lock-in comes in. Each non-portable interface adds another binding, so once you've been absorbed into the ecosystem of non-portable interfaces, it's not as easy as swapping out the provider, as the OP pointed out. You have to abstract each service out to be able to swap providers.
If you use open source interfaces, or even proprietary interfaces that are portable, it's easier to take your app with you to the new hosting provider.
The non-portable interfaces are the crux of the matter. If you could run Lambda on Google Cloud, Azure, or your own metal, folks wouldn't feel so locked in.
As I said. I can run the Node/Express lambda anywhere without changing code.
But I could still take advantage of hosted Aurora (MySQL/Postgres), DocumentDB (Mongo), ElastiCache (Memcached/Redis), or even Redshift (Postgres-compatible interface) without any of the dreaded "lock-in".
It sounds like you have a preference for choosing portable interfaces when it comes to storage. And you've abstracted out the non-portable lambda interface.
My position isn't don't use AWS as a hosting provider, it's that you ought to avoid being locked into a proprietary non-portable interface when possible.
I don't really see cloud-provider competition lessening or hardware getting more expensive and less efficient or the VMs getting worse at micro-slicing in the next 5 years. So why would I be worried about rising costs?
I think spending one of the newly raised millions over a year or so can help there, including hiring senior engineers talented enough to fix the shitty architecture that got you to product-market fit. This isn't an inherently bad thing; it just makes certain business strategies incompatible with certain engineering strategies. Luckily for startups, most intermediate engineers can get you to PMF if you keep them from employing too much abstraction.
Isn't employing too many abstractions just what many here are advocating: putting a layer of abstraction over the SDK's abstraction of the API? I would much rather come into a code base that just uses Python + Boto3 (AWS's Python SDK) than one that uses Python + "SQSManager"/"S3Manager" + Boto3.
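For illustration, a minimal sketch of the two styles side by side; `SQSManager` is the hypothetical wrapper from the comment above, not a real library:

```python
import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

# Style 1: use Boto3 directly; the SDK is already an abstraction over the API.
sqs = boto3.client("sqs")
sqs.send_message(QueueUrl=QUEUE_URL, MessageBody="hello")

# Style 2: a thin in-house wrapper layered on top of the SDK. It mostly
# renames Boto3 calls while hiding their options.
class SQSManager:
    def __init__(self, queue_url):
        self._queue_url = queue_url
        self._client = boto3.client("sqs")

    def send(self, body):
        self._client.send_message(QueueUrl=self._queue_url, MessageBody=body)

SQSManager(QUEUE_URL).send("hello")
```

The wrapper is one more thing to learn and maintain, and since SQS semantics (visibility timeouts, at-least-once delivery) still leak through it, it rarely makes a provider switch much easier.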
That is indeed what many here are advocating. There are only so many possible interfaces or implementations, and usually abstracting over one or the other is an effort in reinventing the wheel, or the saw, or the water clock, and not doing the job as well as some standard parts glued together until quite far into the endeavor.
Stop scare-quoting "lock-in". Lock-in means development effort to get out of a system, regardless of how trivial you think it is.
If writing code to be able to move to a different cloud isn't considered lock-in, then nothing is since anyone can write code to do anything themselves.
Lock-in is an economic concept; it's not just about code but about "switching costs". Ecosystem benefits, data gravity, etc. all come into play.
There are two kinds of lock-in. The first is a high switching cost because no competitor does as good a job: this is good lock-in, and trying to avoid it just means you're not building the thing optimally in the first place.
The second is a high switching cost because of unique interface and implementation requirements that don't add any value over a more interoperable standard. This is the kind that's worth avoiding if you can.
"Connecting to AWS managed services (s3, kinesis, dynamodb, sns) don't have this overhead so you can actually perform some task that involves reading/writing data."
That is due to network and colocation efficiencies. The overhead of managing such services yourself is another matter.
Not just the network overhead; the maintenance and setup overhead too. I can spin up an entire full stack in multiple accounts just by creating a CloudFormation template.
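A minimal sketch of that with Boto3; the stack name and template file are hypothetical:

```python
import boto3

cfn = boto3.client("cloudformation")

# The same template stands up the full stack in whichever account/region
# the credentials point at.
with open("full-stack.yaml") as f:
    template_body = f.read()

cfn.create_stack(
    StackName="my-full-stack",        # hypothetical stack name
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_IAM"],  # required if the template creates IAM roles
)

# Block until provisioning finishes.
cfn.get_waiter("stack_create_complete").wait(StackName="my-full-stack")
```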
I've done stress testing by spinning up and tearing down multiple VMs, played with different-sized databases, and autoscaled read replicas for performance. I've run a spot fleet, etc.
When you need things now you don’t have time to requisition hardware and get it sent to your colo.
And then you still have more stuff to manage now, based on the slim chance that one day, years down the road, you might rip out your entire multi-AZ redundant infrastructure, your databases, etc., with all of the read replicas, and move to another provider...
And this doesn’t count all of the third party hosted services.
Aurora (MySQL) redundantly writes your data to six different storage devices across multiple availability zones. The read replicas read from the same disks. As soon as you bring up a read replica, the data is already there. You can't do that with a standard MySQL read replica.
OK. So you connect to Postgres on RDS - cloud agnostic.
You connect to S3, and:
a) You can build an abstraction service if you care about vendor lock-in so much
b) It has an API that plenty of open source projects are compatible with (I believe Google's storage offers an S3-compatible mode as well); see the sketch below.
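A minimal sketch of what that compatibility buys you; the endpoint and credentials are placeholders for any S3-compatible backend (MinIO, Ceph, GCS's interop endpoint, etc.):

```python
import boto3

# The same client code targets a different provider just by swapping endpoint_url.
s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.example.com",  # placeholder non-AWS endpoint
    aws_access_key_id="AKIA...",                 # placeholder
    aws_secret_access_key="...",                 # placeholder
)

s3.put_object(Bucket="my-bucket", Key="hello.txt", Body=b"hello")
print(s3.get_object(Bucket="my-bucket", Key="hello.txt")["Body"].read())
```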
Maybe you use something like SQS or SNS. Bummer, those are gonna "lock you in". But I've personally migrated between queueing solutions before and it shouldn't be a big deal to do so.
It's really easy to avoid lock-in; Lambda really doesn't make it any harder than EC2 at all.
As long as you write your own wrappers around the SDKs, changing cloud providers is definitely doable. We started with a full AWS stack on Lambda but have now been slowly refactoring in a more cloud-provider-agnostic direction. It's definitely not existential-threat-level lock-in. Serverless technology is still only starting out, and I'm pretty sure that 5 years from now Lambda won't be the go-to platform anyway. Plus, honestly, we've learned so much from the first big project on Lambda that writing the next one with all of that in mind will be pretty great (and agnostic).
I don't believe that writing wrappers is particularly important, though I think that anyone who uses SQS is likely to build an abstraction over it at some point (as with all lower level communication protocols, at some point you build a "client library" that's more specific).
As I said, at least in the cases of your database and your storage, being cloud-agnostic is trivial. Managed postgres is easy to migrate from, S3 shouldn't be hard to migrate from either.
Certainly lambda doesn't impact this too much.
> Serverless technology is still only starting out, and I'm pretty sure that 5 years from now Lambda won't be the go-to platform anyway. Plus, honestly, we've learned so much from the first big project on Lambda that writing the next one with all of that in mind will be pretty great (and agnostic).
I realize it isn't entirely on-topic, but could you elaborate? I'm curious to hear more about your opinion on this, I'm not sure what the future of Serverless is.
And that goes back to developers using the repository pattern because one day the CTO might decide that they want to get rid of their 6-7 figure Oracle installation and migrate to Postgres. There is a lot more to migrating infrastructure at scale than writing a few facades.
Heck, consultants get paid lots of money just to do a lift and shift and migrate a bunch of VMWare images from on prem to AWS.
a) You can build an abstraction service if you care about vendor lock-in so much
...
It's really easy to avoid lock-in; Lambda really doesn't make it any harder than EC2 at all.
Yes, you can build an abstraction layer. And maintain it. And hope that you don't get feature divergence underneath it.
Have you ever asked the business folks or your investors whether they care about your "levels of abstraction"? What looks better on your review: "I created a facade over our messaging system" or "I implemented this feature that brought in revenue/increased customer retention/got us closer to the upper-right quadrant of Gartner's Magic Quadrant"?
Why should they care, or even be in the loop for such a decision?
You don't ask your real estate agent for advice on fixing your electrical system, I guess?
Of course your business folks care whether you are spending time adding business value and helping them make money.
I've had to explain to a CTO before why I had my team spending time on a CI/CD pipeline. Even now that I have a CTO whose idea of "writing requirements" is throwing together a Python proof-of-concept script and playing with Athena (writing SQL against a large CSV file stored in S3), I had still better be able to articulate business value for any technological tangents I go on.
Sure. Agree totally, maybe I misread your previous comment a bit.
What I meant is that run-of-the-mill business folks do not necessarily know how business value is created in terms of code and architecture.
I don't know of any business where they wouldn't be involved. Not in the "Let's talk directly about implementation details" way, but in the "Driving product development and roadmap" and "ascertaining value to our customers" way.
Any time spent on work that doesn't directly create value for customers is work that the business should be weighing in on. I'm not saying that you should never spend any time doing anything else - but these are trade-offs that the product manager should be involved in, and one of their primary jobs is being able to weigh the technical and business realities and figure out where resources should be going.
> and that it requires virtually no effort to avoid it
Of course it requires effort. A lot of effort, not to mention headcount. The entire value of cloud-managed services is what they save you vs. the trade-offs, and it's disingenuous to pretend that's not the case.
Sorry, I don't agree, and I feel like I provided evidence why in my first post. To summarize, choosing services like Postgres and S3 doesn't lock you in. SQS and SNS might, but I think it's an exaggerated cost, and that has nothing to do with Lambdas (EC2 instances are just as likely to use SQS or SNS; more so, given that SQS wasn't supported as a Lambda event source until recently).
There are tradeoffs, of course. Cost at scale is the really big one; at some point it's cheaper to bring ops/hardware in-house.
I just don't agree that lock-in is a huge issue, and I really disagree with the idea that lambdas make lock-in any worse.
There's a big difference between AWS RDS and self-managed. Huge difference.
- DBAs & DevOps
- Procurement management & spare parts
- Colocation w/multihoming
- Leasing agreements
- Paying for power usage
- Disaster recovery plan
- CapEx & depreciation
- Uncomfortable meetings with my CFO explaining why things are expensive
- Hardware failure
- Scaling up/out
Not even worth going on because the point is obvious. Going "all in" reduces cost and allows more time to be focused on revenue-generating work. The "migration" boogeyman is just that, something we tell other programmers to scare them around the campfire. You're going to be hard-pressed finding horror stories of companies in "cloud lock-in" that isn't a consultant trying to sell you something.
> at some point it's cheaper to bring ops/hardware in-house.
It depends. It's not always a scale issue, and as with all things, it starts with a model and collaboration with your finance team.
While I could probably answer that, I don't think it's relevant to my central point - that lock-in is not as big of a deal as it's portrayed as, and that lambdas do not make the problem considerably worse.
That’s an incredibly ignorant and misleading statement. It’s sort of like saying a database isn’t valuable because 99.999% of requests hit the cache, and not the disk.
Everything was built on Amazon and video is largely hosted on S3. Yes, there’s a large CDN in the mix too. That doesn’t take away from the achievement.
Well, what do you think Netflix is doing to be AWS's largest customer? Have you seen any of their presentations on YouTube from AWS re:Invent? Where do you think they encode the videos? Handle sign-ins, etc.?
That’s just the CDN. Netflix is still by far AWS’s biggest customer and its compute is still on AWS. I don’t think most companies are going to be setting up colos at ISPs around the world.
Our Lambda deployments handle REST API Gateway calls, SQS events, and Step Functions. Basically the entire middleware of a classic 3-tier stack.
Except for some proprietary light AWS proxy code, the bulk of the Lambdas delegate to pre-existing Java POJO classes.
The cold start issues and VPC configuration were a painful learning curve, but nothing I would consider proprietary to AWS. Those are universal deployment tasks.
> Unless you're performing a larger job or something, you probably need to read/write data from somewhere, and connecting to a normal database is too slow for most use-cases.
This is false. I've seen entire Lambda APIs backed by MySQL on large, consumer-facing apps and websites. As another poster pointed out, the cold-start-in-a-VPC is a major PITA, but it can (mostly) be worked around.
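For what it's worth, the usual pattern alongside the VPC workaround is to open the connection outside the handler so warm invocations reuse it; a minimal sketch with the pymysql package, where the host and credentials are placeholders:

```python
import pymysql

# Created once per container at init, then reused across warm invocations,
# so most requests skip the connection handshake entirely.
conn = pymysql.connect(
    host="mydb.internal.example.com",  # placeholder
    user="app",
    password="...",                    # placeholder
    database="appdb",
)

def handler(event, context):
    conn.ping(reconnect=True)  # re-open if the connection went stale
    with conn.cursor() as cur:
        cur.execute("SELECT name FROM products WHERE id = %s", (event["id"],))
        row = cur.fetchone()
    return {"name": row[0] if row else None}
```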
And there is always DynamoDB, where you aren't in a VPC, and Serverless Aurora, where you don't have to worry about the typical database connections and can use the HTTP-based Data API.
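A minimal sketch of the no-VPC variant; the table and key names are hypothetical:

```python
import boto3

# DynamoDB is reached over its public HTTPS endpoint, so the function needs
# no VPC attachment and avoids the ENI-related cold-start penalty.
table = boto3.resource("dynamodb").Table("users")  # hypothetical table

def handler(event, context):
    resp = table.get_item(Key={"user_id": event["user_id"]})
    return resp.get("Item") or {"error": "not found"}
```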
How is the Aurora Serverless Data API now? At preview release it was a bit sketchy: horrible latencies (pretty much ruining any benefit you could get from avoiding the in-VPC cold start) and a dangerous SQL-query-as-a-string API (no prepared statements or placeholders for query params that would get automatically escaped, IIRC).
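For what it's worth, the GA version of the Data API does take named parameters via `execute_statement`, so the string-building hazard is avoidable; a minimal sketch with Boto3's rds-data client, where both ARNs are placeholders (I can't speak to the current latencies):

```python
import boto3

rds_data = boto3.client("rds-data")

# Named placeholders (:email) are bound server-side; no string concatenation.
resp = rds_data.execute_statement(
    resourceArn="arn:aws:rds:us-east-1:123456789012:cluster:my-cluster",         # placeholder
    secretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret",  # placeholder
    database="mydb",
    sql="SELECT id, name FROM users WHERE email = :email",
    parameters=[{"name": "email", "value": {"stringValue": "a@example.com"}}],
)
print(resp["records"])
```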
Unfortunately, we require the ability to load/unload directly from S3, and Aurora Serverless doesn't support that. We haven't been able to do any more than a POC.
Dynamo is really the hardest lock-in in the ecosystem for me. Serverless Aurora is still easy to kill with too much concurrency/bad connection management compared to Dynamo
When lambdas haven't been hit for ~15 minutes, the first hit afterwards has a noticeably longer start time. It's due to deprovisioning/reprovisioning underlying resources like network interfaces. Some people do crazy stuff like a scheduled task that hits their own service to combat this, so AWS has promised to solve it.
Even if you invoke your lambda function to warm it up in anticipation of traffic, you'll still hit cold starts if the lambda needs to scale out; the new instances see their first inputs "cold." Those warming patterns are really crazy if you think about it, because no one using them is really aware of the underlying process(es) involved.
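For reference, the scheduled-ping hack usually looks like this: an EventBridge/CloudWatch Events rule fires every few minutes and the handler short-circuits on it. A minimal sketch (the detection convention is a common one, nothing official):

```python
def handler(event, context):
    # Scheduled CloudWatch Events / EventBridge rules arrive with
    # "source": "aws.events"; treat those as keep-warm pings and bail early.
    if event.get("source") == "aws.events":
        return {"warmed": True}

    # ...real request handling goes here...
    return {"statusCode": 200, "body": "hello"}
```

And as noted above, this keeps only the already-provisioned containers warm; any scale-out still starts cold.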
"Why are you throwing rocks at that machine?"
"It makes it respond more quickly to actual client requests. Sometimes."
"Sometimes?"
"Well, most the time."
"Why's that? What's causing the initial latency?"
"Cold starts."
"Yeah, but what's that mean?"
"The machine is booting or caching runtime data or, you know, warming up and serverless. Anyway, don't think about it too much, just trust me on this rock thing. Speaking of which, I got to get back to bastardizing our core logic. Eddie had great results with his new exponential backup rock toss routine and I'm thinking of combining that with a graphql random parameter generator library that Ted said just went alpha this afternoon."
In addition to what the sibling reply said, there is also the issue of your choice of runtime. Java has the slowest startup time and Go the fastest. C# is faster than Java but still slower than Python and Node.
Clever use of dependency injection without reflection (think Dagger, not Spring) and reducing code size as much as possible can give you fairly fast cold-start times with a Java execution environment. Bloated classpaths and reflection-heavy code will destroy your initialization time.
I think it probably isn't purely the use of Lambda/serverless but the creep of other things that makes it more difficult to leave. Stuff like Cognito or SNS or other services. Once you start using AWS, each individual choice in isolation makes it look cheaper and easier to just use something AWS provides. Once you start utilizing 7+ different AWS services, it becomes very expensive to transition to another platform.
Also, this conversation is very different if you are an early stage company racing to market as opposed to an established organization where P&L often takes precedence over R&D.
At my previous place I saw this in reverse. Over the previous years we had invested in building up our own email infrastructure.
Because so much had been invested (sunk cost fallacy), no-one could really get their heads around a shift to SES, even though it would have been a slam dunk in improved reliability and developer productivity.
Whereas if we were on, say, Mailgun, and then someone wanted a shift to SES, that debate could probably have been a more rational one.
I just point this out to say that investing in your own infrastructure can be a very real form of lock-in itself.
Other way around: you'd be surprised at how much infrastructure you can buy by forgoing AWS's offerings (at large work scales).
For small companies, you may not be able to afford infrastructure people, and moving fast makes way more sense. There's little point in paying for an ops person when you have very little infrastructure.
At a certain scale, though, AWS stops being cost-effective. You begin to have room in your budget for ops people, you get room to afford datacenter costs, and you can start paying for a cloud architect to fill out internal or hybrid cloud offerings using OpenShift or OpenStack.
>Yeah, Netflix's opinion is to use Amazon as little as possible.
This is simply untrue. Everything but their CDN uses AWS.
>Their critical infrastructure (the CDN) is not anywhere near the slimy grip of AWS.
The streaming website and app aren't critical infrastructure? Databases containing all of their business and customer details aren't critical infrastructure? Encoding content so it can be delivered by the CDN isn't critical infrastructure?
That's like saying I don't trust Ford because I buy Michelin tires while I drive a Fusion.
The CDN is in the ISP's data center. You can't get much lower latency than that. But if Netflix is AWS's largest customer, I doubt they are using it as little as possible.
Netflix does presentations every year at re:Invent about how they use AWS, and they have a ton of open source tooling they wrote specifically tied to AWS.
This is the same model as buying things on Amazon, too: once you've bought once, it's easier to buy again. Why spend time going to another shop when you can just use Amazon to buy multiple things?
This ease-of-use philosophy goes way back to the one-click patent. If I want DNS, why wouldn't I go to Amazon, which has all my financial details and a decent interface (and even an API), rather than choosing another DNS provider, setting up my credit card, and having to maintain an account with them? So I choose DNS via AWS. Then I want a VPS, but why go to Linode and have the overhead of another account when I could use Lightsail instead?
And then I can use AWS Certificate Manager to create a certificate, attach it to my load balancer and CloudFront distribution, and never have to worry about expiring certificates; they're free, too.
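A minimal sketch of requesting such a certificate with Boto3; the domain is a placeholder, and attaching it to a load balancer or CloudFront distribution is a separate step:

```python
import boto3

acm = boto3.client("acm")

# Free certificate; with the DNS validation record in place it renews
# automatically, so there are no expiry chores.
resp = acm.request_certificate(
    DomainName="www.example.com",  # placeholder domain
    ValidationMethod="DNS",
)
print(resp["CertificateArn"])
```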
You don't necessarily have to host this yourself. With this, there's a relatively straightforward (but obviously not necessarily easy) way for Joe Entrepreneur to set up a hosted service to compete with AWS Lambda, thus helping the community avoid vendor lock-in.
So can Joe Entrepreneur also host my databases? my storage? My CDN around the world? My ElasticSearch cluster? My Redis cluster? All of my objects? Can he provide 24 hour support? Can he duplicate my data center for my developers in India so they don’t have lag? What about my identity provider? My Active Directory?
Can he do all that across multiple geographically dispersed redundant data centers?
I think you're missing the point: you may need to move to your own infrastructure for security, privacy, regulatory, or accountability reasons. You may encounter new needs which AWS may no longer meet, cheap bandwidth for example (AWS bandwidth is both way overpriced and on some rather poor networks). Or Amazon may decide that the price of all their services is now going up by a factor of 5, and if you have your attitude, well, you're stuck paying for it.
Having relatively easy-to-spin-up alternatives is a great thing. I can run my application entirely on a local Kubernetes cluster or on one on Amazon, DigitalOcean, or Google's cloud services. That sort of flexibility is excellent and has allowed us to scale into situations where we otherwise couldn't have affordably done so (being able to buy some bandwidth from Joe Entrepreneur has its benefits sometimes).
I think you're missing the point: you may need to move to your own infrastructure for security, privacy, regulatory, or accountability reasons.
Which compliance regulations require you not to use a cloud provider? At most they may require that you not share a server with another company (that can be done with a drop-down) or that the data be hosted locally (again, that can be done by selecting a region).
>Which compliance regulations require you not to use a cloud provider?
The policies that say not to use a company controlled by the US government. Or the ones that say under no circumstances should the data be sent over the Internet to a third party "because ops are hard".
Which regulation or policy? Which certification? Name names. It’s not any of the financial, legal, or health care compliance regulations that I’m aware of.
In short, most German laws make it incredibly risky (but not forbidden) to use any American company for any kind of data that can be resolved to the underlying person. (E.g., a lot of companies got their warning shot when "Safe Harbor" exploded; if the same happens with https://www.privacyshield.gov/welcome, a world of shit awaits.)
Yeah, I can see explaining to our business customers, who grill us about the reliability of our infrastructure, that we host our infrastructure on DigitalOcean...
You realize the databases I'm referring to are hosted versions of MySQL/Postgres, the ElasticSearch cluster is just that (standard ElasticSearch), Redis is just that (Redis), and you can set up a CDN anywhere?
Even if you chose to use AWS's OLAP database, Redshift, it uses standard Postgres drivers to connect. You could move to a standard Postgres installation; you wouldn't get the performance characteristics, of course.
If you don't want to be "locked in" to DynamoDB, there is always the just-announced Mongo-compatible DocumentDB. And of course, ADFS is used everywhere.
Why in the world would I want to manage a colo with all of those services myself and still not have any geographic redundancy, let alone a presence near our outsourced developers?
Nothing on my list is esoteric: a website with a caching layer, a database, a CDN, a search index, a user pool that can survive a data center outage, and some place to store assets.
Seeing that in all the time Y Combinator has existed, only one of its companies has ever gone public, and the rest are either still humming along unprofitably, have been acquired, or have gone out of business, the chances that most businesses need to worry about over-reliance on AWS as their largest business risk seem slim.
It's easy enough to separate your event sources from your event destinations. Anything that can trigger Lambda can also trigger an SNS message that can call an API endpoint.
If I did want to move off AWS, that's the first thing I would do: put an API endpoint in front of my business logic, change the event from triggering Lambda to publishing an SNS message, and move my API off of AWS. Then slowly migrate everything else.
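The SNS rewiring itself is small; a minimal sketch with Boto3, where the topic ARN and endpoint are placeholders:

```python
import boto3

sns = boto3.client("sns")

# Point the existing topic at an API hosted anywhere, on AWS or off it.
sns.subscribe(
    TopicArn="arn:aws:sns:us-east-1:123456789012:my-events",  # placeholder
    Protocol="https",
    Endpoint="https://api.example.com/events",  # the migrated API endpoint
)
```

SNS then POSTs each message to that endpoint (after a one-time subscription confirmation), so the business logic no longer has to live in Lambda.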
For example, using this template:
https://github.com/awslabs/aws-serverless-express
I've been able to deploy the same code as both a regular Node/Express app and a lambda, with no code changes, just by changing my CI/CD pipeline slightly.
You can do the same with any supported language.
With all of the AWS services we depend on, our APIs are the easiest to transition.
And despite the dreams of techies, more than likely after a while you aren't going to change your underlying infrastructure.
You are always locked into your infrastructure choices.