We're talking about a product that has existed since 2004. They did: \* Their ow...

andrewf · on March 23, 2023

> * Their own data center, before Docker existed

Also, it might be worth calling out: their product launched in 2004, Linode and Xen launched in 2003, and S3 and EC2 launched in 2006. The cloud as we know it today didn't exist when they started.

grogenaut · on March 23, 2023

Pretty sure they knew the Linode folks and were on there early, iirc. This is from hanging out with one of the Linode owners back then, randomly, at a bar in STL.

ghaff · on March 23, 2023

Whether DHH is "right" in some philosophical sense, this is a small company with a lot of technical experience in a variety of technologies and with presumably a lot of technical chops, so generalizing their experience to "cloud is good" or "cloud is bad" isn't really possible.

tptacek · on March 23, 2023

I mean, I work for a cloud hosting vendor. I'm not saying one side or the other is right, only that people who are dunking on 37signals for this are telling on themselves.

Rapzid · on March 23, 2023

Well, they were def at Rackspace in there somewhere.

mixdup · on March 23, 2023

"their own datacenter" both previously and now almost certainly means renting bare metal or colocation space from a provider. I highly doubt they have physically built their own datacenter from scratch

johnklos · on March 23, 2023

"renting bare metal or colocation space from a provider"

Those are two totally, completely different things. Their own datacenter means their own equipment in a datacenter and could even mean building out their own datacenter. It never, ever means renting bare metal.

mixdup · on March 24, 2023

>It never, ever means renting bare metal.

Weird, in my company where we are doing the opposite migration (from traditional datacenter where we manage the physical servers to Azure) this is exactly what we mean and say and how we describe it

We talk about "our datacenter" when we really mean racks of servers we rented from Insight, and we say "the cloud" when we refer to Azure. We've never actually had our own datacenter meaning a building we own and manage the entire physical plant of

Almost no one means it that way. Even Twitter is probably leasing colocation space in the "their own datacenter" category vs. GCP and AWS. The evidence is in the fact that Elon was able to just arbitrarily shut down an "entire datacenter". Or that 37signals was able to just arbitrarily move into "their own datacenter" on a whim

johnklos · on March 25, 2023

Referring to rented servers as colocated servers is flatly wrong, no matter how often people are incorrect about it. Sure, some providers put colocation under the same category as VMs and leased hardware, but that doesn't make them overlap.

OTOH, referring to a datacenter of servers that you lease as a datacenter is one thing, but if you have zero hardware that you own in it, would it really be your datacenter, or would it be "the datacenter"?

A datacenter could be anything from a set of IKEA shelves in a room with Internet and power to a fully built out fancy space with redundant power, fire suppression, a full Internet exchange, et cetera, so it's a bit gatekeepery to try to suggest that only huge companies would ever have their own datacenter or their own space with their own hardware in a datacenter.

Rapzid · on March 24, 2023

I'm sure that's the truth of the matter.

datadeft · on March 23, 2023

The fun part is that they do not understand what it means to have your "own datacenter" vs renting servers in a co-lo. It does not matter if you are running on AWS or Hetzner, it is somebody else's computer.

LastTrain · on March 23, 2023

We were a similar sized company at about the same time - we owned our data centers in the same way we owned our offices - we leased and occupied them. Sure, if the plumbing sprouted a leak the landlord would come in and fix it, but no one would be confused enough to say we didn’t have our own office space.

imwillofficial · on March 23, 2023

"The fun part is that they do not understand" YES, 37signals, a company with a legendary pedigree of pushing technical boundaries and being open minded about deployment models, totally doesn't know the simple thing that you do.

Get a grip.

fxtentacle · on March 23, 2023

You can rent entire rooms from Hetzner and then only you (and I believe the government firefighters) have access cards.

In any case, there are options where you 100% own and control all hardware.

johnklos · on March 23, 2023

What the heck are you talking about? Do you even know how colocation works?

For starters, even small companies can have their own physical datacenters, although that's not necessarily what we're talking about.

Second, renting hardware has absolutely nothing to do with colocation.

bennysonething · on March 24, 2023

What does 37signals do that makes money?

jb_gericke · on March 23, 2023

Let’s write our own container orchestrator though, because control planes are dumb?

tptacek · on March 23, 2023

I don't understand how the first clause in this sentence connects to the second.

With a simple, predictable workload --- what they have --- it can make sense to lean towards static scheduling, rather than dynamic schedulers. K8s and Nomad are both dynamic schedulers.

This is pretty basic stuff; it's super weird how urgently people seem to want to dunk on them for not using K8s. It comes across as people not understanding that there are other kinds of schedulers; that "scheduling" means what Borg did.
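To make the distinction concrete, here is a minimal sketch, not anything 37signals has published. The host names, service names, and slot counts are all made up for illustration. A static schedule is a placement table written ahead of time; a dynamic scheduler decides placement at runtime from the cluster's current state.

```python
# Hypothetical static plan: an operator writes down, ahead of time,
# which services run on which hosts. Nothing is decided at runtime.
STATIC_PLAN = {
    "app-01": ["web", "web"],
    "app-02": ["web", "jobs"],
    "app-03": ["jobs"],
}


def static_schedule(plan):
    """Return (host, service) placements exactly as written in the plan."""
    return [(host, svc) for host in sorted(plan) for svc in plan[host]]


def dynamic_schedule(services, free_slots):
    """Greedy dynamic placement: put each service on whichever host
    currently has the most free slots (ties broken by host name).
    The result depends on the state of the cluster at call time."""
    free = dict(free_slots)  # copy so the caller's dict is not mutated
    placements = []
    for svc in services:
        host = max(sorted(free), key=lambda h: free[h])
        if free[host] == 0:
            raise RuntimeError(f"no capacity left for {svc}")
        free[host] -= 1
        placements.append((host, svc))
    return placements


if __name__ == "__main__":
    print(static_schedule(STATIC_PLAN))
    print(dynamic_schedule(["web", "web", "jobs"], {"app-01": 1, "app-02": 2}))
```

With a predictable workload, the first function is the whole "scheduler": the plan only changes when a human edits it.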

Melingo · on March 23, 2023

Because they already had it running in k8s.

And k8s scales very well to very low and high numbers.

Because k8s provides battle tested features like rollouts, load balancing, etc.

And the ecosystem is great.

Certmanager, argocd kube stack.

I'm baffled tbh how they had such a different experience with k8s than I have.

mr_ndrsn · on March 23, 2023

We did! And it did work. And there are def some great things that I (we) love about k8s. Personally, the declarative aspect of it was chef's kiss. "I want 2 of these and 3 of these, please", and it just happens.
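For anyone who hasn't seen the declarative model up close, the idea can be sketched roughly like this. It is a toy reconcile step, not how Kubernetes is actually implemented, and the service names are hypothetical. You state the counts you want, and a loop works out what to start or stop to get there.

```python
def reconcile(desired, running):
    """Compare desired replica counts with what is currently running and
    return the actions needed to converge, as (action, service, count)."""
    actions = []
    for svc in sorted(set(desired) | set(running)):
        diff = desired.get(svc, 0) - running.get(svc, 0)
        if diff > 0:
            actions.append(("start", svc, diff))
        elif diff < 0:
            actions.append(("stop", svc, -diff))
    return actions


if __name__ == "__main__":
    # "I want 2 of these and 3 of these, please"
    desired = {"web": 2, "worker": 3}
    running = {"web": 1, "worker": 3, "legacy": 1}
    print(reconcile(desired, running))
```

Running the same step again after the actions are applied yields an empty list, which is the property that makes the declarative style pleasant.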

Which is the primary reason why we did investigate k8s on-prem. We had already done the work to k8s-ify the apps, let's not throw that away. But running k8s on-prem is different than running your own k8s in the cloud is different than running on managed k8s in the cloud.

Providing all of the bits k8s needs to really work was going to really stretch our team, but we figured with the right support from a vendor, we could make it work. We worked up a spike of harvester + rancher + longhorn and had something that we could use as if it were a cloud. It was pretty slick.

Then we got the pricing on support for all of that, and decided to spend that half million elsewhere.

We own our hardware, we rent cabs and pay for power & network. We've got a pretty simple pxeboot setup to provision hardware with a bare OS that we can use with chef to provide the common bits needed.

It's not 'ultimately flexible in every way', but it's 'flexible enough to meet the needs of our workloads'.

Bagged2347 · on March 24, 2023

What is your position at 37Signals and how do you like it? I'm really impressed by the innovation that comes out of you guys and the workplace culture you folks have.

mr_ndrsn · on March 25, 2023

I'm a Lead SRE on the Ops team. We've got a fantastic bunch of folks, they're amazing to work with!

prmoustache · on March 23, 2023

The main issue is the ecosystem imho.

Bare vanilla k8s or k3s is nice, but it doesn't do much outside of your homelab. Once you want k8s in production in the cloud, you have to start thinking about:

- load balancing and ingress controller

- storage

- network

- IAM and roles

- security groups

- centralized logging

- registry management

- vulnerability scanning

- CI/CD

- GitOps

And all this is no less complex with k8s than with nomad, bare docker or whatever they chose. And definitely no less complex because it is on a major cloud provider.

Melingo · on March 23, 2023

In all managed services all of that comes out of the box.

Ingress, lb, storage, network...

And I have my small setup running with all of it too. Took me a weekend to set it up.

Rke2, nginx ingress, classic lb in front, cert manager and everything else in argocd.

0x500x79 · on March 23, 2023

Hey Melingo, I noticed that you responded to a lot of different threads in this post. It seems like you are a bit dismissive of people's experiences using K8s. I have also run K8s at scale, and it is not easy, it is not out of the box in cloud providers. There are a ton of addons, knobs, and work that has to be done to build a sustainable and "production ready" version of K8s (for my requirements) in AWS.

K8s is NOT easy, and I do not believe that in its current form it is the pinnacle of deployment/orchestration technologies. I am waiting for what is next, because the pain that I have personally experienced around K8s, and that I know others are feeling as well, means it is not a perfect solution for everything, and definitely not usable for everyone.

At the end of the day it's a tool, and it is sometimes difficult to work with.

prmoustache · on March 23, 2023

Also, when you make a mistake on a key part it can fail in a very spectacular way, and it can be tricky to debug the issue immediately.

It is usually a game of finding the correct spaghetti (log) in a full plate.

Melingo · on March 23, 2023

I'm really only sharing my experience or view through my experience.

And I think it's the best thing for infra since pre cut bread.

What issues did you run into?

0x500x79 · on March 24, 2023

I know you are sharing your experience, others are as well. Let's not dismiss others' experience just because it doesn't match our own, the truth is most likely somewhere in the middle. Especially when so many people are saying that they had pain using K8s.

The initial deployment for EKS requires multiple plugins to get to something that is "functional" for most production workloads. K8s fails in spectacular ways (even using Argo, worse using Argo TBH) that require manual intervention. Local disk support for certain types of workloads is severely depressing. Helm is terrible (templating Yaml... 'nuff said). Security groups, IAM roles, and other cloud provider functions require deep knowledge of K8s and the cloud provider. Autoscaling using Karpenter is difficult to debug. Karpenter doesn't gracefully handle spot instance cost.

I could go on, but these are the things you will experience in the first couple days of attempting to use k8s. Overall, if you have deep knowledge of K8s, go for it, but it is not the end-all solution to infra/container orchestration in my mind.

I fought with a workload for over a day with our K8s experts, it took me an hour to deploy it to an EC2 ASG for a temporary release while moving it back to K8s later. K8s IS difficult, and saying it's not has a lot of people questioning the space.

The way I see it is it starts off easy, and quickly ramps up to extremely complex. This should not be the case.

I worked at a company that had their own deployment infra stack and it was 1000x better than K8s. This is going to be the next step in the K8s space I believe and it may use K8s underneath the covers, but the level of abstraction for K8s is all wrong IMO and it is trying to do too much.

aflag · on March 23, 2023

Deploying a fixed number of servers to a fixed number of hosts has been battle tested for the past 40+ years. It does work.

Melingo · on March 23, 2023

It definitely does not.

The main issues we faced with over 700 VMs were: outdated OS, full disks, full inodes, broken hardware, missing backups or missing backup strategy, OOM.

K8s heals itself: it fixes out-of-memory conditions by restarting a pod, solves storage by shipping logs out and killing a pod in case the disk still runs full, and has a rollout strategy, health checks and readiness probes.

It provides easy deployment mechanism out of the box, adding a domain is easy, certificates get renewed centrally and automatically.

Scaling is just a replica number, and you have node auto-upgrade features built in.

K8s provides what people build manually out of the box, certified, open sourced and battle tested.

re-thc · on March 23, 2023

The difference is it's likely possible to have 7 physical servers replace those 700VMs when you have your own hardware without all the overhead.

It is much easier to maintain when you look at those numbers.

Melingo · on March 23, 2023

Not in my case.

Every VM had 4 cores and 20 GB of memory.

They ran on quite big blades.

re-thc · on March 23, 2023

Your case is fine. AMD's 4th-Gen EPYC Genoa processors can do up to 192 cores and 384 threads in a single machine (2 socket) with TBs of RAM.

In most cloud environments a "core" can be just a thread.

Older machines have had 4-CPU socket based chassis with many cores as well. Definitely doable.
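As a back-of-the-envelope check on those numbers, here is a quick sketch. It assumes one vCPU maps to one hardware thread with no oversubscription, and it ignores hypervisor overhead and failover headroom. None of that is stated above, so treat the result as a rough lower bound rather than a sizing recommendation.

```python
import math

# Figures taken from the comments above.
VM_COUNT = 700
VCPUS_PER_VM = 4
RAM_GB_PER_VM = 20
THREADS_PER_HOST = 384  # 2-socket machine, as quoted above


def hosts_needed(vm_count, vcpus_per_vm, threads_per_host):
    """Hosts required if every vCPU gets its own hardware thread (assumption)."""
    return math.ceil(vm_count * vcpus_per_vm / threads_per_host)


total_vcpus = VM_COUNT * VCPUS_PER_VM      # 2800
total_ram_gb = VM_COUNT * RAM_GB_PER_VM    # 14000
hosts = hosts_needed(VM_COUNT, VCPUS_PER_VM, THREADS_PER_HOST)
ram_per_host_gb = total_ram_gb / hosts     # RAM each host would have to carry

if __name__ == "__main__":
    print(total_vcpus, total_ram_gb, hosts, ram_per_host_gb)
```

Under those assumptions the CPU side fits on 8 such machines, each needing about 1.75 TB of RAM, which is in the same ballpark as the earlier "7 physical servers" estimate.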

fxtentacle · on March 23, 2023

Ansible is your friend. Btw, we're talking about the team that built Capistrano, so they certainly know how to automate deployments.

Melingo · on March 23, 2023

Nope Ansible is horrible in comparison to k8s.

The paradigm shift alone, from doing things step by step to describing what you need and then having it happen, is a game changer.

K8s is probably 100x easier than Ansible.

And Ansible also has its bigger ecosystem, like Ansible Tower.

Basically your k8s control plane, but worse.

KronisLV · on March 23, 2023

> The paradigm shift alone, from doing things step by step to describing what you need and then having it happen, is a game changer.

I've actually used both in conjunction and it was decent: Ansible for managing accounts, directories, installed packages (the stuff you might actually need to run containers and/or an orchestrator), essentially taking care of the "infrastructure" part for on-prem nodes, so that the actual workloads can then be launched as containers.

In that mode of work, there was very little imperative about Ansible, for example:

  - name: Ensure we have a group
    ansible.builtin.group:
      name: somegroup
      gid: 2000
      state: present
  
  - name: Ensure that we have a user that belongs to the group
    ansible.builtin.user:
      name: someuser
      uid: 3000
      shell: /bin/bash
      groups: somegroup
      append: yes
      state: present

This can help you set up some monitoring for the nodes themselves, install updates, mess around with any PKI stuff you need to do and so on, everything that you could achieve either manually or by some Bash scripts running through SSH. Better yet, the people who just want to run the containers won't have to think about any of this, so it ensures separation of concerns as well.

Deploying apps through Ansible directly can work, but most of the container orchestrators might admittedly be better suited for this, if you are okay with containerized workloads. There, they all shine: Docker Swarm, Hashicorp Nomad, Kubernetes (K3s is really great) and so on...

tapoxi · on March 23, 2023

I'm on GKE. The hosts and control plane are managed for me. All I need to do is build/test/security scan images and then promote/deploy the image (via Helm) when it goes out to prod.

Using config management and introducing config drift and management of the underlying operating system is a lot more to think about, and a lot more that can go wrong.

vidarh · on March 23, 2023

Deploying a fixed number of instances to a fixed number of servers does not imply doing it manually.

Melingo · on March 23, 2023

And I didn't say that.

We had all of these problems with self developed automatisation.

It still was garbage.

K8s just solves those issues out of the box.

vidarh · on March 23, 2023

So you did automatisation in a broken way. Here's one way to avoid the issues you described on bare metal:

- Only get servers with IPMI so you can remote reboot / power cycle them.

- Have said servers netboot so they always run the newest OS image.

- Make sure said OS image has a config that isn't broken so you don't get full inodes and so it cycles logs.

- Have the OS image include journalbeat to ship logs.

- Have your health checks trigger a recovery script that restarts or moves containers using one of a myriad of tools; monitoring isn't exactly a new discipline.

Yes, it means you have to have a build process for OS images. Yes, it means you need to pick a monitoring system. And yes, it means you need to decide a scheduling policy.
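The health check item in the list above can be as small as the following sketch. The container names and the threshold of three consecutive failures are made up for illustration, and the actual restart or move (through whatever tool you picked) is deliberately left out.

```python
def plan_recovery(history, threshold=3):
    """Decide which containers need recovery.

    `history` maps a container name to its recent health check results
    (True = healthy, newest last). A container is flagged only when its
    last `threshold` checks all failed, so a single blip is ignored.
    """
    to_restart = []
    for name in sorted(history):
        recent = history[name][-threshold:]
        if len(recent) == threshold and not any(recent):
            to_restart.append(name)
    return to_restart


if __name__ == "__main__":
    checks = {
        "web-1": [True, False, False, False],  # three failures in a row
        "web-2": [False, True, False],         # flapping, not yet flagged
        "jobs-1": [False, False],              # not enough samples yet
    }
    for name in plan_recovery(checks):
        print(f"would restart {name}")
```

The scheduling policy mentioned above then comes down to what you do with that list: restart in place, or move to another host.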

I wrote an orchestrator pre-K8S that was fewer LOC than the yaml config for my home test K8S cluster. Writing a custom orchestrator is often not hard, depending on your workload; writing a generic one is.

K8S provides one opinionated version of what people build manually, and when it's a good fit, it's great. When it isn't, I all too often see people spend more time trying to figure out how to make it work for them than it would've taken them to do it from scratch.

prmoustache · on March 23, 2023

Your own failures do not define a model.

Melingo · on March 23, 2023

And?

My experience still counts for something, and the example with those 700 VMs is something I haven't just seen once.

tptacek · on March 23, 2023

Having huge sprawling swarms of VMs is, for some teams, a problem to be solved, not a fact of life to be designed around.

Melingo · on March 23, 2023

Sry I'm not getting your point.

If I understand it right: VMs were not there because people needed VMs; they were there because people needed compute.

We moved everything to k8s and we were able to do this because k8s can

tptacek · on March 23, 2023

The point is to deliver a small set of applications, not to come up with the most horizontally scalable possible deployment fabric.

vidarh · on March 23, 2023

I ran 1000+ VMs on a self developed orchestration mechanism for many years and it was trivial. This isn't a hard problem to solve, though many of the solutions will end up looking similar to some of the decisions made for K8S. E.g. pre-K8S we ran with an overlay network like K8S, and service discovery, like K8S, and an ingress based on Nginx like many K8S installs. There's certainly a reason why K8S looks the way it does, but K8S also has to be generic where you can often reasonably make other choices when you know your specific workload.

Melingo · on March 23, 2023

And you don't think k8s made your life much easier?

For me it's now much more about proper platform engineering and giving teams more flexibility again knowing that the k8s platform is significantly more stable than what I have ever seen before.

vidarh · on March 23, 2023

No, I don't for that setup. Trying to deploy K8S across what was a hybrid deployment across four countries and on prem, colo, managed services, and VMs would've been far more effort than our custom system was (and the hw complexity was dictated by cost - running on AWS would've bankrupted that company).

imwillofficial · on March 23, 2023

[flagged]

Melingo · on March 23, 2023

I'm not bragging.

I'm not a 'bro' and 'cringe' this is not tiktok.

It gives context.

Don't you have anything to add to the discussion?