Hacker News | s3rg4fts's comments

Post 2 of my Kubernetes on Raspberry series. Preparing (bootstrapping) the Raspberry Pis to become Kubernetes nodes.


Post 1 of my Kubernetes on Raspberry series. Explaining the hardware used for the nodes, the cluster's node configuration and costs.


It most definitely is not, you are right! The main difference, IMO, is that by paying that price with k8s I at least get something that resembles an application platform: I can easily ship containers and deploy my apps without dealing with hardware as much as I used to when deploying straight to servers.

But, as mentioned, I only see it as a developer. My end product with k8s is something closer to my development tools than what I'd end up with maintaining a fleet of servers and using separate tools to deploy apps onto them.


I do dread that exact scenario you're mentioning. I know it's a possibility, but should that day arrive I'm hoping I'll be able to put the cluster to rest having learned some stuff along the way.


You should absolutely give it a try; maybe you'll have better luck than me, and it's a fun little experiment if nothing else.

If you get frustrated and don't want to do it anymore, you might also look at Docker Swarm. I'm not sure how well it would scale to hundreds of services, but for homelab use I found it considerably less frustrating to work with than K8s. You still get to play with distributed systems, and it's a bit more intro-friendly in my opinion.
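To give a feel for the difference, an entire "deployment" in Swarm can be one compose-style stack file. The service name and image below are just examples, not anything from my actual setup:

```yaml
# stack.yml -- deploy with: docker stack deploy -c stack.yml demo
version: "3.8"
services:
  whoami:
    image: traefik/whoami    # tiny demo image; substitute your own service
    ports:
      - "8080:80"
    deploy:
      replicas: 3            # Swarm schedules these across the cluster nodes
```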

It's something that I think most engineers should try at least once, if nothing else to learn firsthand the annoyances of distributed computing. It's easy to read about these things in textbooks and blog posts and university courses (and you should!), but at least for me they didn't really sink in until I had to fix broken stuff, deal with weird heartbeat issues, or watch things that should go N times faster actually go slower because I didn't account for latency, etc.
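That "N times faster that ends up slower" effect shows up even in a back-of-the-envelope model. The numbers below are entirely made up, purely to illustrate how round-trip latency can dominate small units of work:

```python
# Toy model, not a benchmark: fan the work out over k nodes, but pay one
# network round trip per item. With tiny work items, latency dominates.
def total_time(work_s: float, n_items: int, k_nodes: int, rtt_s: float):
    serial = n_items * work_s                          # one machine, no network
    parallel = (n_items / k_nodes) * (work_s + rtt_s)  # naive per-item requests
    return serial, parallel

serial, parallel = total_time(work_s=0.001, n_items=1000, k_nodes=4, rtt_s=0.05)
# 1 ms of work per item spread over 4 nodes with a 50 ms round trip:
# the "parallel" version comes out ~12x slower than just doing it locally.
```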


I think the Pi 3s and 4s do have a lot of limitations. The 5s are a bit more powerful, so I'd expect your experience to be better.

Wasn't aware of NixOS; it looks pretty interesting, but I'm not sure how easy or reliable it'd be to run on a Pi 5 (https://wiki.nixos.org/wiki/NixOS_on_ARM/Raspberry_Pi_5). I'll be keeping an eye on it though!

As far as Helm vs Ansible, I'm using Ansible to deploy the basics (bootstrap the control plane and worker nodes, network plugin) and then everything else is deployed with IaC (Pulumi), which installs Helm releases.
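To make that split concrete, here is roughly what the Pulumi side can look like in Python. This is a sketch only; the chart name, repo URL, namespace and values are placeholders, not what I actually deploy:

```python
# Sketch of installing a Helm release via Pulumi's Kubernetes provider.
# The chart, repo and values below are illustrative placeholders.
from pulumi_kubernetes.helm.v3 import Release, RepositoryOptsArgs

monitoring = Release(
    "grafana",
    chart="grafana",
    repository_opts=RepositoryOptsArgs(
        repo="https://grafana.github.io/helm-charts",
    ),
    namespace="monitoring",
    values={"persistence": {"enabled": False}},
)
```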


I built mine before the 5 was released. I ended up running Ubuntu Server on the nodes and configuring them all with Ansible playbooks (installing Tailscale, k3s, updates, OS tweaks, etc.). I started looking at Helm, but there is so much inconsistency across community Helm charts. I think writing my own charts would have been a better approach than templatizing my manifests and playbooks, though the current setup does make it very easy to stand up a new service (assuming it's only 1-2 pods). If I end up DRY'ing my deployments, it could turn out to be a decent deployment method distinct from IaC or Helm.
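As a rough illustration of what such a playbook can look like (the variable names and this particular k3s install invocation are my assumptions, not necessarily how anyone's real playbook is written):

```yaml
# Hypothetical sketch: join worker Pis to an existing k3s control plane.
- hosts: workers
  become: true
  tasks:
    - name: Install the k3s agent and join the cluster
      ansible.builtin.shell: |
        curl -sfL https://get.k3s.io | \
          K3S_URL="https://{{ control_plane_ip }}:6443" \
          K3S_TOKEN="{{ k3s_token }}" sh -
      args:
        creates: /usr/local/bin/k3s   # makes the task idempotent on re-runs
```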

How do you like Pulumi? It seems similar to AWS CDK...


If you're templatizing manifests, kubespray does this pretty well, I think. At least for the basics it's been pretty helpful so far. But indeed, I'm looking into deploying more things with Helm where possible.

Most services I've been using so far offer official Helm charts. But I get your point: it can be cumbersome, and if there isn't an official chart, community ones can be poorly documented and hard to work with.

I haven't used CDK, but the concept is definitely similar. I think Pulumi most likely has wider support, since it can bridge Terraform providers: even if a provider isn't available natively on Pulumi you can "port it" (though I've never tried that, so I'm not sure how well it works). I like how it stores state and secrets for you; that saves quite a bit of trouble.


Wasn't aware of HomelabOS, looks pretty interesting!

How is your Longhorn performing? I tried setting it up on my nodes, but between the gigabit networking and the Pis' fairly average CPUs I got pretty awful performance, both on distributed volumes (with replicas etc.) and on strict-local ones (for reasons I haven't figured out yet).

I am now considering using something like https://github.com/rancher/local-path-provisioner, since I mainly intend to use Longhorn for DBs that handle fault tolerance / backups etc on their own.
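For anyone reading along: once local-path-provisioner is installed, using it is just a matter of referencing its storage class in a claim. The claim name and size below are made up:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data            # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce              # local-path only supports single-node access
  storageClassName: local-path   # default class name created by the provisioner
  resources:
    requests:
      storage: 10Gi
```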


My experience with Longhorn, or Rook/Ceph, or any other StorageClass solution that enables ReadWriteMany is that eventually, despite all the "self-healing" and "highly available" claims, it all explodes at some point.

Volumes become unmountable, pools stay yellow forever, desynchronization happens... Perhaps (most probably, even) I just don't know how to configure them properly, but maybe that is telling of how difficult these solutions are to operate and maintain.

What I took away from this is to try my best to never deploy anything that requires shared persistent volumes. If I need something stateful, it needs to speak to a database or S3 backend, or something that handles redundancy at some level other than the filesystem. If I really, really need a local volume, I'll use local-path-provisioner like you said, which means pinning the pod to a single node, but that's a concession I'm willing to make to not deal with Ceph/Rook.

Great writeup, it's like I'm reading about my own journey managing Kubernetes clusters!

Best of luck


I haven't gotten to performance testing just yet, that should happen pretty soon. Thanks for the local-path-provisioner recommendation, I'll definitely give it a go!


Good luck! This is pretty useful for benchmarks https://github.com/longhorn/kbench?tab=readme-ov-file


I did it on Procreate on an iPad. You can export layers as GIF (or an MP4 video). But I guess you could do the same with Photoshop on a desktop.


Sadly, in this specific case we update almost 1/10th of the table daily, and we've also built our entire search functionality on this table's indexes.

Bad news: our product relies heavily on supporting so many access patterns, so it's hard to cut down the indexes at this point.

Good news: most of the overhead (as you can see from the table with the timings) seems to be caused by the GIN index. We'll most likely move full-text search to Elasticsearch and drop that index, so things should get better.

Once the move is done, we'll try dropping indexes and leaning on Elasticsearch more for searches. I've seen this work tremendously well in the past, so I have high hopes!
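When that day comes, the mechanics are straightforward; the table and index names below are made up for illustration. Checking which indexes actually get scanned before dropping anything is cheap insurance:

```sql
-- Find indexes on the table that are never scanned (drop candidates).
SELECT indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid))
FROM pg_stat_user_indexes
WHERE relname = 'posts' AND idx_scan = 0;

-- Drop the full-text GIN index without blocking writes (hypothetical name).
DROP INDEX CONCURRENTLY IF EXISTS posts_fts_gin_idx;
```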


Hey, author here! The next thing I'm aiming to do is VACUUM more aggressively, like you propose. However, given that we update ~1M rows daily, I don't think VACUUM alone would do the trick. Unless I went the opposite way: increase the pending list size a lot and VACUUM aggressively. But then I'm afraid SELECTs could suffer because of the big (unordered) pending list.
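For reference, the knobs involved look like this. The index and table names are placeholders, and the values are only examples, not recommendations:

```sql
-- Grow the GIN pending list so bulk UPDATEs buffer more entries (value in kB).
ALTER INDEX posts_fts_gin_idx SET (gin_pending_list_limit = 65536);  -- 64 MB

-- Make autovacuum kick in on this table after ~1% of rows change,
-- instead of the default 20%.
ALTER TABLE posts SET (autovacuum_vacuum_scale_factor = 0.01);

-- Or flush the pending list on demand, e.g. right after the daily update.
SELECT gin_clean_pending_list('posts_fts_gin_idx');
```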

Thank you for all of this information! I'll dive into it.

PS: your talk looks amazing. DTrace looks like an exceptional tool, and the example you give has sparked a ton of ideas on how I can use it in the future. If I'd known you could debug like this, maybe I wouldn't have needed to ask on the DBA Stack Exchange.


A journey debugging slow write performance on PostgreSQL. Assumptions, validation, community support and lessons learned.

