You're only thinking about the _input_. Technically, yes, I can host an Express app on Lambda just like I could by other means, but the problem is that it can't really _do_ anything. Unless you're performing some larger job, you probably need to read/write data from somewhere, and connecting to a normal database is too slow for most use cases.
Connecting to AWS managed services (S3, Kinesis, DynamoDB, SNS) doesn't have this overhead, so you can actually perform some task that involves reading/writing data.
Lambda is basically just glue code to connect AWS services together. It's not a general purpose platform. Think "IFTTT for AWS"
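For illustration, a sketch of what that glue looks like in Python with boto3 - the table name and event shape here are made up:

    import json
    import boto3

    # Hypothetical table; the point is that the write is a plain HTTPS call
    # to the DynamoDB endpoint, with no connection pool to warm up.
    table = boto3.resource("dynamodb").Table("events")

    def handler(event, context):
        table.put_item(Item={"id": event["id"], "payload": json.dumps(event)})
        return {"statusCode": 200}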
We have been connecting to MongoDB from without Lambda for the past year, and sure, you don't get single-digit latency, but r/w happens under 30ms in most cases. We even use Parameter Store to pull all secrets and it's still without that time frame.
You can run your Lambda function within the same network as your other servers. It just appears as a new IP address inside your network and can call whatever you permit it to.
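A rough sketch of what that looks like with boto3 - the subnet, security group, and role ARN below are placeholders:

    import boto3

    lambda_client = boto3.client("lambda")

    # Attaching the function to existing subnets/security groups puts it
    # on the same network as the rest of your servers.
    lambda_client.create_function(
        FunctionName="internal-worker",
        Runtime="python3.9",
        Role="arn:aws:iam::123456789012:role/lambda-vpc-role",
        Handler="app.handler",
        Code={"ZipFile": open("app.zip", "rb").read()},
        VpcConfig={
            "SubnetIds": ["subnet-0abc1234"],
            "SecurityGroupIds": ["sg-0def5678"],
        },
    )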
He probably meant 'within' instead of 'without'.
I've used aws-serverless-express myself and connected my Lambda hosted in us-east-1 (N. Virginia) to an mLab (now MongoDB Atlas) Mongo database hosted in the same AWS region.
But that is the whole point of using cloud services that are tightly integrated with each other. The fact that I can't do it as efficiently as Amazon myself can't be called "proprietary lock-in".
Said efficiencies are not due to Amazon, just that the services are colocated in the same facility.
If I put the node service and a database on the same box I'd get the same performance, and actually probably better since Amazon would still have them on separate physical hardware.
The infrastructure, or rather the interfaces, is where the lock-in comes in. Each non-portable interface adds another binding, so, as the OP pointed out, it's not as easy as just swapping out the provider once you've been absorbed into an ecosystem of non-portable interfaces. You have to abstract each service out to be able to swap providers.
If you use open source interfaces, or even proprietary interfaces that are portable, it's easier to take your app with you to the new hosting provider.
The non-portable interfaces are the crux of the matter. If you could run Lambda on Google, Azure or your own metal, folks wouldn't feel so locked-in.
As I said. I can run the Node/Express lambda anywhere without changing code.
But, I could still take advantage of hosted Aurora (MySQL/Postgres), DocumentDB (Mongo), ElastiCache (Memcached/Redis) or even Redshift (Postgres-compatible interface) without any of the dreaded “lock-in”.
It sounds like you have a preference for choosing portable interfaces when it comes to storage. And you've abstracted out the non-portable lambda interface.
My position isn't "don't use AWS as a hosting provider"; it's that you ought to avoid being locked into a proprietary, non-portable interface when possible.
I don't really see cloud-provider competition lessening or hardware getting more expensive and less efficient or the VMs getting worse at micro-slicing in the next 5 years. So why would I be worried about rising costs?
I think spending one of the newly-raised millions over a year or so can help there, including hiring senior engineers talented enough to fix the shitty architecture that got you to product-market-fit. This isn’t an inherently bad thing, it just makes certain business strategies incompatible with certain engineering strategies. Luckily for startups, most intermediate engineers can get you to PMF if you keep them from employing too much abstraction.
Isn’t employing too many abstractions just what many here are advocating - putting a layer of abstraction over the SDK’s abstractions of the API? I would much rather come into a code base that just uses Python + Boto3 (AWS’s Python SDK) than one that uses Python + “SQSManager”/“S3Manager” + Boto3.
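To make the contrast concrete, a sketch (queue URL made up, "SQSManager" being the hypothetical wrapper from above):

    import boto3

    sqs = boto3.client("sqs")
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # placeholder

    # Direct SDK usage: anyone who already knows boto3 can read this.
    sqs.send_message(QueueUrl=queue_url, MessageBody="hello")

    # Versus a homegrown wrapper that has to be learned and maintained separately:
    # SQSManager(queue_url).publish("hello")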
That is indeed what many here are advocating. There are only so many possible interfaces or implementations, and usually abstracting over one or the other is an effort in reinventing the wheel, or the saw, or the water clock, and not doing the job as well as some standard parts glued together until quite far into the endeavor.
Stop scare-quoting "lock-in". Lock-in means development effort to get out of a system, regardless of how trivial you think it is.
If writing code to be able to move to a different cloud isn't considered lock-in, then nothing is since anyone can write code to do anything themselves.
Lock in is an economic concept, it’s not just about code but about “switching costs”. Ecosystem benefits, data gravity etc all come into play.
There are two kinds of lock-in. The first is a high switching cost because no competitor does as good a job - this is good lock-in, and trying to avoid it just means you’re not building the thing optimally in the first place.
The second is a high switching cost because of unique interface and implementation requirements that don’t add any value over a more interoperable standard. This is the kind that’s worth avoiding if you can.
"Connecting to AWS managed services (s3, kinesis, dynamodb, sns) don't have this overhead so you can actually perform some task that involves reading/writing data."
That is due to network and colocation efficiencies. The overhead of managing such services yourself is another matter.
Not just the network overhead, the maintenance and setup overhead. I can spin up an entire full stack in multiple accounts just by creating a CloudFormation template.
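Roughly, assuming a template file named stack.yaml (the stack name here is made up):

    import boto3

    cfn = boto3.client("cloudformation")

    # The whole environment is reproducible from one template per account.
    with open("stack.yaml") as f:
        template_body = f.read()

    cfn.create_stack(
        StackName="full-stack-dev",
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )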
I’ve done stress testing by spinning up and tearing down multiple VMs, played with different-size databases, autoscaled read replicas for performance, ran a spot fleet, etc.
When you need things now you don’t have time to requisition hardware and get it sent to your colo.
And then you still have more stuff to manage now, based on the slim chance that one day, years down the road, you might rip out your entire multi-AZ redundant infrastructure, your databases, etc., with all of the read replicas, and move to another provider....
And this doesn’t count all of the third party hosted services.
Aurora (MySQL) redundantly writes your data to six different storage devices across multiple availability zones. The read replicas read from the same disks. As soon as you bring up a read replica, the data is already there. You can’t do that with a standard MySQL read replica.
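Bringing up an Aurora reader is just adding another instance to the existing cluster - a sketch with placeholder identifiers:

    import boto3

    rds = boto3.client("rds")

    # No data copy step: the new reader attaches to the shared cluster volume.
    rds.create_db_instance(
        DBInstanceIdentifier="app-cluster-reader-2",
        DBClusterIdentifier="app-cluster",
        DBInstanceClass="db.r5.large",
        Engine="aurora-mysql",
    )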
OK. So you connect to Postgres on RDS - cloud agnostic.
You connect to S3, and:
a) You can build an abstraction service if you care about vendor lock-in so much (a minimal sketch follows below this list)
b) It has an API that plenty of open source projects are compatible with (I believe Google's storage is compatible as well)
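A minimal sketch of option (a), with hypothetical names - the call sites only ever see the interface:

    import boto3

    class ObjectStore:
        # Hypothetical minimal storage interface.
        def put(self, key: str, data: bytes) -> None: ...
        def get(self, key: str) -> bytes: ...

    class S3Store(ObjectStore):
        def __init__(self, bucket: str):
            self._s3 = boto3.client("s3")
            self._bucket = bucket

        def put(self, key: str, data: bytes) -> None:
            self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)

        def get(self, key: str) -> bytes:
            return self._s3.get_object(Bucket=self._bucket, Key=key)["Body"].read()

    # Swapping providers later means writing one more ObjectStore implementation,
    # not touching every call site.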
Maybe you use something like SQS or SNS. Bummer, those are gonna "lock you in". But I've personally migrated between queueing solutions before and it shouldn't be a big deal to do so.
It's really easy to avoid lock-in; Lambda really doesn't make it any harder than EC2 at all.
As long as you write your own wrappers to the SDKs changing cloud providers is definitely doable. We started full AWS stack with Lambda but have now been slowly refactoring our way into more cloud-provider agnostic direction. It's definitely not an existential threat level lock-in. Serverless technology is only starting out still and I'm pretty sure 5 years from now Lambda won't be the go-to platform anyway. Plus honestly we've learned so much from the first big project on Lambda that writing the next one with all of that in mind will be pretty great (and agnostic).
I don't believe that writing wrappers is particularly important, though I think that anyone who uses SQS is likely to build an abstraction over it at some point (as with all lower level communication protocols, at some point you build a "client library" that's more specific).
As I said, at least in the cases of your database and your storage, being cloud-agnostic is trivial. Managed postgres is easy to migrate from, S3 shouldn't be hard to migrate from either.
Certainly lambda doesn't impact this too much.
> Serverless technology is only starting out still and I'm pretty sure 5 years from now Lambda won't be the go-to platform anyway. Plus honestly we've learned so much from the first big project on Lambda that writing the next one with all of that in mind will be pretty great (and agnostic).
I realize it isn't entirely on-topic, but could you elaborate? I'm curious to hear more about your opinion on this, I'm not sure what the future of Serverless is.
And that goes back to developers using the repository pattern because one day the CTO might decide that they want to get rid of their 6-7 figure Oracle installation and migrate to Postgres. There is a lot more to migrating infrastructure at scale than writing a few facades.
Heck, consultants get paid lots of money just to do a lift and shift and migrate a bunch of VMWare images from on prem to AWS.
a) You can build an abstraction service if you care about vendor lock-in so much
...
It's really easy to avoid lock-in; Lambda really doesn't make it any harder than EC2 at all.
Yes, you can build an abstraction layer. And maintain it. And hope that you don't get feature divergence underneath it.
Have you ever asked the business folks or your investors whether they care about your “levels of abstraction”? What looks better on your review: “I created a facade over our messaging system” or “I implemented this feature that brought in revenue/increased customer retention/got us closer to the upper-right quadrant of Gartner’s Magic Quadrant”?
Why should they care, or even be in the loop for such a decision?
You don’t ask your real estate agent for advice on fixing your electrical system, I guess?
Of course your business folks care whether you are spending time adding business value and helping them make money.
I’ve had to explain to a CTO before why I had my team spending time on a CI/CD pipeline. Even now that I have a CTO whose idea of “writing requirements” is throwing together a Python proof-of-concept script and playing with Athena (writing SQL against a large CSV file stored in S3), I’d still better be able to articulate business value for any technological tangents I am going on.
Sure. Agree totally, maybe I misread your previous comment a bit.
What I meant is that run-of-the-mill business folks do not necessarily know how business value is created in terms of code and architecture.
I don't know of any business where they wouldn't be involved. Not in the "Let's talk directly about implementation details" way, but in the "Driving product development and roadmap" and "ascertaining value to our customers" way.
Any time spent on work that doesn't directly create value for customers is work that the business should be weighing in on. I'm not saying that you should never spend any time doing anything else - but these are trade-offs that the product manager should be involved in, and one of their primary jobs is being able to weigh the technical and business realities and figure out where resources should be going.
> and that it requires virtually no effort to avoid it
Of course it requires effort. A lot of effort, not to mention headcount. The entire value of cloud-managed services is what they save you vs. the trade-offs, and it's disingenuous to pretend that's not the case.
Sorry, I don't agree, and I feel like I provided evidence why in my first post. To summarize, choosing services like Postgres and S3 doesn't lock you in. SQS and SNS might, but I think it's an exaggerated cost, and that has nothing to do with Lambdas (EC2 instances are just as likely to use SQS or SNS - more so, given that SQS wasn't supported for Lambdas until recently).
There are tradeoffs, of course. Cost at scale is the really big one - at some point it's cheaper to bring ops/hardware in-house.
I just don't agree that lock-in is a huge issue, and I really disagree with the idea that lambdas make lock-in harder.
There's a big difference between AWS RDS and self-managed. Huge difference.
- DBA's & DevOps
- Procurement management & spare parts
- Colocation w/multihoming
- Leasing agreements
- Paying for power usage
- Disaster recovery plan
- CapEx & depreciation
- Uncomfortable meetings with my CFO explaining why things are expensive
- Hardware failure
- Scaling up/out
Not even worth going on because the point is obvious. Going "all in" reduces cost and allows more time to be focused on revenue-generating work. The "migration" boogeyman is just that: something we tell other programmers to scare them around the campfire. You're going to be hard-pressed to find horror stories of companies in "cloud lock-in" that aren't just a consultant trying to sell you something.
> at some point it's cheaper to bring ops/ hardware in-house.
It depends. It's not always a scale issue, and as with all things it starts with a model and collaboration with your finance team.
While I could probably answer that, I don't think it's relevant to my central point - that lock-in is not as big of a deal as it's portrayed as, and that lambdas do not make the problem considerably worse.
That’s an incredibly ignorant and misleading statement. It’s sort of like saying a database isn’t valuable because 99.999% of requests hit the cache, and not the disk.
Everything was built on Amazon and video is largely hosted on S3. Yes, there’s a large CDN in the mix too. That doesn’t take away from the achievement.
Well, what do you think Netflix is doing to be AWS’s largest customer? Have you seen any of their presentations on YouTube from AWS reinvent? Where do you think they encode the videos? Handle sign ins, etc?
That’s just the CDN. Netflix is still by far AWS’s biggest customer and its compute is still on AWS. I don’t think most companies are going to be setting up colos at ISPs around the world.
Our Lambda deployments handle REST API Gateway calls, SQS events, and Step Functions. Basically the entire middleware of a classic 3-tier stack.
Except for some proprietary light AWS proxy code, the bulk of the Lambdas delegate to pre-existing Java POJO classes.
The cold start issues and VPC configuration were a painful learning curve, but nothing I would consider proprietary to AWS. Those are universal deployment tasks.
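Their glue is Java, but the pattern is language-agnostic; in Python it would look something like this (module and class names are made up):

    # The only Lambda-specific code is translating the event; the business
    # logic lives in pre-existing, framework-agnostic classes.
    from myapp.orders import OrderService  # hypothetical existing module

    service = OrderService()

    def handler(event, context):
        order_id = event["pathParameters"]["orderId"]  # API Gateway proxy event shape
        order = service.get_order(order_id)
        return {"statusCode": 200, "body": order.to_json()}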
> Unless you're performing a larger job or something you probably need to read/write data from somewhere and connecting to a normal database is too slow for most use-cases.
This is false. I've seen entire Lambda APIs backed by MySQL on large, consumer-facing apps and websites. As another poster pointed out, the cold-start-in-a-VPC is a major PITA, but it can (mostly) be worked around.
And there’s always DynamoDB, where you aren’t in a VPC, and Serverless Aurora, where you don’t have to worry about the typical database connections and can use the HTTP-based Data API.
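For reference, a Data API call through boto3's rds-data client looks roughly like this (cluster and secret ARNs are placeholders); it's a plain HTTPS request, so there's no database connection for the function to open or pool:

    import boto3

    data_api = boto3.client("rds-data")

    data_api.execute_statement(
        resourceArn="arn:aws:rds:us-east-1:123456789012:cluster:app-cluster",
        secretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:app-db-creds",
        database="app",
        sql="SELECT id, name FROM users WHERE id = :id",
        parameters=[{"name": "id", "value": {"longValue": 42}}],
    )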
How is the Aurora Serverless Data API now? On preview release it was a bit sketchy: horrible latencies (pretty much ruining any benefit you could get from avoiding the in-VPC cold start) and a dangerous sql-query-as-a-string API (no prepared statements or placeholders for query params that would get automatically escaped IIRC).
Unfortunately, we require the ability to load/unload directly from S3 and Aurora Serverless doesn’t support that. We haven’t been able to do any more than a POC.
Dynamo is really the hardest lock-in in the ecosystem for me. Serverless Aurora is still easy to kill with too much concurrency/bad connection management compared to Dynamo.
When lambdas haven't been hit for 15mins the first hit after has a noticeably longer start time. It's due to deprovisioning/reprovisioning underlying resources like network interfaces. Some people do crazy stuff like a scheduled task to hit their own service to combat this so AWS promised to solve it.
Even if you invoke your Lambda function to warm it up in anticipation of traffic, you'll still hit cold starts if the Lambda needs to scale out; the new machines are exposed to inputs "cold." Those patterns for warming the Lambda up are really crazy if you think about it, because no one using them is really aware of the underlying process(es) involved.
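For what it's worth, the "crazy stuff" usually amounts to a scheduled ping that the handler short-circuits - a sketch; note it only keeps one container warm and does nothing for scale-out cold starts:

    def handler(event, context):
        # CloudWatch Events / EventBridge scheduled pings carry this source field.
        if event.get("source") == "aws.events":
            return {"warmed": True}

        # ... normal request handling ...
        return {"statusCode": 200}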
"Why are you throwing rocks at that machine?"
"It makes it respond more quickly to actual client requests. Sometimes."
"Sometimes?"
"Well, most the time."
"Why's that? What's causing the initial latency?"
"Cold starts."
"Yeah, but what's that mean?"
"The machine is booting or caching runtime data or, you know, warming up and serverless. Anyway, don't think about it too much, just trust me on this rock thing. Speaking of which, I got to get back to bastardizing our core logic. Eddie had great results with his new exponential backup rock toss routine and I'm thinking of combining that with a graphql random parameter generator library that Ted said just went alpha this afternoon."
In addition to what the sibling reply said, there is also the issue of your choice of runtime. Java has the slowest startup time and Go the fastest; C# is faster than Java but still slower than Python and Node.
Clever use of dependency injection without reflection (think Dagger, not Spring) and reducing code size as much as possible can give you fairly fast cold start times with a Java execution environment. Bloated classpaths and reflection-filled code will destroy your initialization time.