How do you approach designing the initial architecture for a new multi-team Kubernetes cluster?
Quick Answer
Start with the multi-tenancy model: decide how much isolation teams actually need (namespaces alone, or stronger separation). Establish namespace conventions and RBAC design early (least privilege, per-team scoping), and set ResourceQuotas/LimitRanges from the start so no single team can starve the shared cluster. Decide on the GitOps/deployment workflow every team will use, and plan observability (logging, metrics, alerting) as a shared platform capability rather than something each team builds independently. Doing this deliberately upfront avoids much harder, more disruptive retrofits once many teams and workloads already depend on an unstructured cluster.
Detailed Answer
This is a system-design-flavored question testing whether a candidate thinks about a cluster as a shared platform serving many teams, with the governance and consistency that implies, rather than just a place to run containers.
Start with the isolation/trust model
Before making any technical decisions, clarify the trust model: are these teams internal, mutually trusting colleagues who mainly want organizational separation and fair resource sharing, or do any of them have stricter compliance or security separation requirements? The answer determines whether namespaces + RBAC + NetworkPolicies are sufficient (see the multi-tenancy question), or whether some teams need dedicated node pools or even separate clusters. Getting this wrong early (for example, assuming light isolation is enough when a team actually needs strict separation) is expensive to fix later.
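For the lighter end of that spectrum (mutually trusting teams separated by namespace), a common baseline is a default-deny ingress NetworkPolicy per namespace plus a policy that re-allows traffic between pods in the same namespace. The sketch below builds both manifests as plain Python dicts; the policy names and the `payments-prod` namespace are illustrative assumptions, and it presumes the cluster's network plugin enforces NetworkPolicy.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """Deny ingress to every pod in the namespace unless another policy allows it."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {},           # empty selector matches all pods in the namespace
            "policyTypes": ["Ingress"],  # no ingress rules listed, so nothing is allowed
        },
    }


def allow_same_namespace(namespace: str) -> dict:
    """Allow ingress only from pods that live in the same namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A bare podSelector under "from" selects pods in this policy's own namespace.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


def baseline_policies(namespace: str) -> list:
    """Both baseline policies for one team namespace (hypothetical helper)."""
    return [default_deny_ingress(namespace), allow_same_namespace(namespace)]


def to_json(manifests: list) -> str:
    """Serialize manifests; kubectl accepts JSON as well as YAML."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2)
```

Calling `to_json(baseline_policies("payments-prod"))` yields a document that could be reviewed and applied per namespace. Teams with stricter requirements would need more than this, which is exactly the decision the trust-model conversation is meant to surface.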
Establish namespace and naming conventions early
A consistent convention (e.g., one namespace per team, or one per team per environment) established from day one avoids the much messier alternative of retrofitting structure onto a cluster where every team independently invented its own namespace and naming approach. This seems like a small detail, but it is foundational to almost everything else: RBAC scoping, quota assignment, and NetworkPolicy design all key off namespace boundaries.
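As one way to make such a convention enforceable rather than just documented, the sketch below derives namespace names from a team and environment and validates them against the DNS label rule Kubernetes applies to namespace names. The `<team>-<env>` pattern, the environment list, and the label keys are assumptions chosen for illustration, not a prescribed standard.

```python
import re

# Kubernetes namespace names must be DNS-1123 labels: lowercase alphanumerics
# or '-', starting and ending with an alphanumeric, at most 63 characters.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")

ALLOWED_ENVS = ("dev", "staging", "prod")  # hypothetical environment set


def namespace_name(team: str, env: str) -> str:
    """Build '<team>-<env>' and reject anything Kubernetes would refuse."""
    if env not in ALLOWED_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    name = f"{team}-{env}"
    if len(name) > 63 or not _DNS_LABEL.fullmatch(name):
        raise ValueError(f"not a valid namespace name: {name!r}")
    return name


def namespace_manifest(team: str, env: str) -> dict:
    """Namespace manifest labeled so quotas, RBAC, and policies can select on it."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": namespace_name(team, env),
            "labels": {"team": team, "environment": env},
        },
    }
```

Generating namespaces through a helper like this (or an equivalent admission check) means the later RBAC, quota, and NetworkPolicy tooling can rely on the name and labels having a predictable shape.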
Design RBAC around least privilege from the start
Define a small number of standard role templates (e.g., "team member: full access within your own namespace," "read-only observer," "platform admin: cluster-wide") instead of granting ad-hoc, one-off permissions per request. This keeps the RBAC model auditable and consistent as the number of teams and people grows; without templates, the cluster accumulates an unreviewable pile of bespoke grants over time.
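A minimal sketch of the template idea: each template maps to one of the built-in ClusterRoles (`edit`, `view`, `cluster-admin`), and a team gets access through a namespaced RoleBinding, so the grant stops at the namespace boundary. Mapping "team member" to `edit` is an assumption (some platforms would use `admin` or a custom Role), and the group names are hypothetical identity-provider groups.

```python
RBAC_API_GROUP = "rbac.authorization.k8s.io"

# Template name -> built-in ClusterRole. The mapping itself is an assumption.
NAMESPACED_TEMPLATES = {
    "team-member": "edit",  # read/write most namespaced resources, no RBAC changes
    "observer": "view",     # read-only; does not include Secrets
}


def _group_subject(group: str) -> dict:
    return {"kind": "Group", "name": group, "apiGroup": RBAC_API_GROUP}


def team_role_binding(namespace: str, group: str, template: str) -> dict:
    """Bind a group to a template inside ONE namespace via a RoleBinding."""
    if template not in NAMESPACED_TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{template}-{group}", "namespace": namespace},
        "subjects": [_group_subject(group)],
        # A RoleBinding that references a ClusterRole grants its rules only
        # within the RoleBinding's own namespace.
        "roleRef": {
            "kind": "ClusterRole",
            "name": NAMESPACED_TEMPLATES[template],
            "apiGroup": RBAC_API_GROUP,
        },
    }


def platform_admin_binding(group: str) -> dict:
    """Cluster-wide admin for the platform team via a ClusterRoleBinding."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"platform-admin-{group}"},
        "subjects": [_group_subject(group)],
        "roleRef": {"kind": "ClusterRole", "name": "cluster-admin", "apiGroup": RBAC_API_GROUP},
    }
```

For example, `team_role_binding("payments-prod", "payments-devs", "team-member")` produces the only grant the payments team needs in that namespace. Because every binding comes from one of three templates, an audit reduces to listing who is bound to which template and where.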
Set ResourceQuotas and LimitRanges before onboarding teams, not after
Establishing per-namespace ResourceQuotas and LimitRanges (see that question) from the very beginning prevents the "noisy neighbor" problem, where one team's workload starves shared cluster capacity, even unintentionally. Retrofitting quotas onto a cluster where teams have already grown accustomed to unconstrained resource usage is a much harder, more political conversation than setting sensible defaults upfront.
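The two objects work as a pair: the ResourceQuota caps the namespace's total requests and limits, and the LimitRange supplies per-container defaults so that pods which omit requests or limits are still admitted under the quota instead of being rejected. The sketch below builds both as Python dicts; every number in it is a placeholder assumption to be sized against real cluster capacity.

```python
def resource_quota(namespace: str, cpu_cores: int, memory_gib: int, max_pods: int) -> dict:
    """Cap the aggregate resources a namespace may request (values are illustrative)."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "team-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": str(cpu_cores),
                "requests.memory": f"{memory_gib}Gi",
                # Assumption: allow limits to burst to 2x the requested totals.
                "limits.cpu": str(cpu_cores * 2),
                "limits.memory": f"{memory_gib * 2}Gi",
                "pods": str(max_pods),
            }
        },
    }


def default_limit_range(namespace: str) -> dict:
    """Per-container defaults applied when a pod spec omits requests or limits."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "container-defaults", "namespace": namespace},
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    "defaultRequest": {"cpu": "100m", "memory": "128Mi"},  # placeholder
                    "default": {"cpu": "500m", "memory": "512Mi"},         # default limit
                }
            ]
        },
    }


def total_requested_cpu(quotas: list) -> int:
    """Sum requests.cpu across quotas, to compare against cluster capacity."""
    return sum(int(q["spec"]["hard"]["requests.cpu"]) for q in quotas)
```

Summing the quotas with `total_requested_cpu` before onboarding a new team gives a quick check on whether the promised allocations still fit the cluster, or whether the platform is deliberately overcommitting.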
Decide on the deployment workflow every team will use
Standardizing early on a GitOps approach (see that question) gives every team the same way to deploy, with consistent rollback and audit behavior. The alternative is a cluster where each team has built its own bespoke, inconsistent deployment tooling, which becomes a real platform-support burden once someone has to retain working knowledge of many such approaches.
Plan observability as a shared platform capability
Centralized logging and metrics infrastructure (see that topic), provided once by the platform team and consumed by every application team, is far more efficient than each team independently standing up, paying for, and maintaining its own logging/monitoring stack. This should be part of the initial cluster design, not an afterthought each team solves individually later.
Why "deliberately upfront" is the theme tying this together
The overarching judgment being tested is recognizing that a multi-team cluster is fundamentally a shared platform. Platform decisions (isolation model, RBAC conventions, quota policy, deployment workflow, observability) are dramatically cheaper to establish thoughtfully before many teams and workloads depend on the cluster than to retrofit afterward, when undoing inconsistent, ad-hoc practices means disrupting teams that already rely on them. A strong answer demonstrates this platform-thinking mindset, not just a list of Kubernetes features.