AWS Networking Questionnaire & Best Practices for EKS


AWS VPC adds an extra layer of security by placing nodes in a network that is not publicly facing

What is AWS VPC?

VPC stands for Virtual Private Cloud. With Amazon VPC, you can launch AWS resources in a logically isolated virtual network that you've defined. This virtual network closely resembles a traditional network that you'd operate in your own data center, with the benefits of using the scalable infrastructure of AWS.

For example, a VPC might have one subnet in each Availability Zone in the Region, EC2 instances in each subnet, and an internet gateway to allow communication between the resources in the VPC and the internet.

What is EKS?

Amazon Elastic Kubernetes Service (Amazon EKS) is a managed Kubernetes service that makes it easy for you to run Kubernetes on AWS and on-premises. Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. Amazon EKS makes it easy to provide security for your Kubernetes clusters, with advanced features and integrations to AWS services and technology partner solutions. For example, IAM provides fine-grained access control and Amazon VPC isolates your Kubernetes clusters from other customers.

EKS Networking Best Practices

It is critical to understand Kubernetes networking to operate your cluster and applications efficiently. Pod networking, also called cluster networking, is at the center of Kubernetes networking. Kubernetes supports Container Network Interface (CNI) plugins for cluster networking.

Amazon EKS officially supports the Amazon Virtual Private Cloud (VPC) CNI plugin to implement Kubernetes Pod networking. The VPC CNI provides native integration with AWS VPC and works in underlay mode. In underlay mode, Pods and hosts are located at the same network layer and share the network namespace. The IP address of the Pod is consistent from the cluster and VPC perspective.

The VPC CNI is the default networking plugin supported by EKS and is therefore the focus of the guide. The VPC CNI is highly configurable to support different use cases. The guide also includes dedicated sections on VPC CNI use cases, operating modes, and sub-components, followed by recommendations.

Amazon EKS runs upstream Kubernetes and is certified Kubernetes conformant. Although you can use alternate CNI plugins, the guide does not provide recommendations for managing them.

Kubernetes Networking Model

Kubernetes sets the following requirements on cluster networking:

  • Pods must be able to communicate with all other Pods, whether on the same node or on other nodes, without using NAT (Network Address Translation).

  • All system daemons (background processes, for example, kubelet) running on a particular node can communicate with the Pods running on the same node.

  • Pods that use the host network must be able to contact all other Pods on all other nodes without using NAT.

Container Networking Interface (CNI)

Kubernetes supports CNI specifications and plugins to implement the Kubernetes network model. CNI consists of a specification (version 1.0.0 at the time of writing) and libraries for writing plugins that configure network interfaces in containers, along with a number of supported plugins. CNI concerns itself only with the network connectivity of containers and with removing allocated resources when a container is deleted.

In Kubernetes versions before 1.24, CNI was enabled by passing the kubelet the --network-plugin=cni command-line option; the kubelet then read the CNI configuration from --cni-conf-dir (default /etc/cni/net.d) and looked for plugin binaries in --cni-bin-dir (default /opt/cni/bin). These kubelet flags were removed in Kubernetes 1.24, and the container runtime (for example, containerd) now loads the CNI configuration, by default from the same directories. The CNI configuration file must match the CNI specification (minimum v0.4.0), and any CNI plugins referenced by the configuration must be present in the plugin binary directory. If there are multiple CNI configuration files in the directory, the configuration file that comes first by name in lexicographic order is used.
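Because only the lexicographically first file is used, a stale configuration left behind by another CNI can silently take precedence. The following is a minimal Python sketch of that selection rule, assuming the runtime only considers files ending in .conf, .conflist, or .json (the usual libcni convention). select_cni_config is a hypothetical helper, not part of any AWS or Kubernetes tool, and the file names in the demo are examples.

```python
import os
import tempfile
from typing import Optional

# File extensions CNI-aware runtimes conventionally load (assumption: libcni behaviour).
CNI_CONFIG_EXTENSIONS = (".conf", ".conflist", ".json")


def select_cni_config(conf_dir: str = "/etc/cni/net.d") -> Optional[str]:
    """Return the CNI config file that would be used: the first by name, lexicographically."""
    try:
        names = os.listdir(conf_dir)
    except FileNotFoundError:
        return None
    candidates = sorted(
        name for name in names
        if name.endswith(CNI_CONFIG_EXTENSIONS)
        and os.path.isfile(os.path.join(conf_dir, name))
    )
    return os.path.join(conf_dir, candidates[0]) if candidates else None


if __name__ == "__main__":
    # Demo in a temporary directory with example file names.
    with tempfile.TemporaryDirectory() as conf_dir:
        for name in ("10-aws.conflist", "05-leftover.conf", "README.md"):
            open(os.path.join(conf_dir, name), "w").close()
        print(os.path.basename(select_cni_config(conf_dir)))  # 05-leftover.conf
```

On a node, calling select_cni_config() with the default directory shows which file the runtime is likely to pick.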

Amazon Virtual Private Cloud (VPC) CNI

The AWS-provided VPC CNI is the default networking add-on for EKS clusters and is installed by default when you provision an EKS cluster. It runs on the Kubernetes worker nodes and consists of the CNI binary and the IP Address Management daemon (ipamd). The CNI binary assigns an IP address from the VPC network to a Pod. ipamd manages the AWS Elastic Network Interfaces (ENIs) attached to each Kubernetes node and maintains a warm pool of IP addresses. The VPC CNI provides configuration options for pre-allocating ENIs and IP addresses to speed up Pod startup. Refer to the Amazon VPC CNI section of the guide for recommended plugin management practices.

Amazon EKS recommends that you specify subnets in at least two Availability Zones when you create a cluster. Because the Amazon VPC CNI allocates IP addresses to Pods from the node subnets, check that those subnets have enough available IP addresses for the number of Pods you expect to run.
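As a quick check, the following sketch flags subnets that are running low on free addresses. It assumes subnet records in the shape returned by boto3's describe_subnets call (the SubnetId and AvailableIpAddressCount keys); the helper name, threshold, and sample data are hypothetical.

```python
from typing import Iterable, List

# Hypothetical threshold: flag subnets with fewer free IPs than this.
DEFAULT_MIN_FREE = 256


def subnets_low_on_ips(subnets: Iterable[dict], min_free: int = DEFAULT_MIN_FREE) -> List[str]:
    """Return the IDs of subnets whose free IPv4 count is below min_free.

    Each item is expected to look like an entry of
    boto3.client("ec2").describe_subnets(...)["Subnets"].
    """
    return [s["SubnetId"] for s in subnets if s["AvailableIpAddressCount"] < min_free]


if __name__ == "__main__":
    # Made-up sample records; with AWS credentials you could instead use
    # boto3.client("ec2").describe_subnets(SubnetIds=[...])["Subnets"].
    sample = [
        {"SubnetId": "subnet-0aaa", "AvailabilityZone": "us-east-1a", "AvailableIpAddressCount": 4000},
        {"SubnetId": "subnet-0bbb", "AvailabilityZone": "us-east-1b", "AvailableIpAddressCount": 120},
    ]
    print(subnets_low_on_ips(sample))  # ['subnet-0bbb']
```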

By default, the Amazon VPC CNI allocates a warm pool of ENIs and secondary IP addresses from the subnet attached to the node’s primary ENI. This mode is called "secondary IP mode." The number of IP addresses, and hence the number of Pods a node can run (Pod density), is determined by the number of ENIs and the number of IP addresses per ENI, both of which are limits set by the instance type. Secondary IP mode works well for small clusters with smaller instance types. If you run into Pod density limits, consider prefix mode, which increases the IP addresses available to Pods on a node by assigning /28 IPv4 prefixes to ENIs instead of individual secondary IP addresses.
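Pod density in secondary IP mode follows a commonly used formula: max Pods = ENIs × (IPv4 addresses per ENI - 1) + 2. The sketch below assumes the ENI limits shown for two instance types and compares secondary IP mode with the theoretical upper bound in prefix mode. In practice, the max-pods setting is capped well below the prefix-mode figure (110 is a common value on smaller instance types).

```python
from typing import Dict, Tuple

# (maximum ENIs, IPv4 addresses per ENI). Illustrative values; check the
# current limits for your instance types in the EC2 documentation.
ENI_LIMITS: Dict[str, Tuple[int, int]] = {
    "t3.medium": (3, 6),
    "m5.large": (3, 10),
}


def max_pods_secondary_ip(enis: int, ips_per_eni: int) -> int:
    # The primary IP of each ENI is not given to Pods, hence (ips_per_eni - 1).
    # The +2 covers Pods that use host networking (for example aws-node and kube-proxy).
    return enis * (ips_per_eni - 1) + 2


def max_pods_prefix_mode(enis: int, ips_per_eni: int) -> int:
    # In prefix mode each secondary IP slot holds a /28 prefix (16 addresses).
    # This is a theoretical upper bound; max-pods is normally capped far lower.
    return enis * ((ips_per_eni - 1) * 16) + 2


if __name__ == "__main__":
    for instance_type, (enis, ips) in ENI_LIMITS.items():
        print(instance_type, max_pods_secondary_ip(enis, ips), max_pods_prefix_mode(enis, ips))
    # t3.medium 17 242
    # m5.large 29 434
```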

Amazon VPC CNI natively integrates with AWS VPC and allows users to apply existing AWS VPC networking and security best practices for building Kubernetes clusters. This includes the ability to use VPC Flow Logs, VPC routing policies, and security groups for network traffic isolation. By default, the Amazon VPC CNI applies the security groups associated with the node’s primary ENI to its Pods. Consider enabling security groups for Pods when you want to apply different network rules to specific Pods.
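Security groups for Pods are configured through a SecurityGroupPolicy custom resource, once the feature has been enabled on the VPC CNI (ENABLE_POD_ENI=true on the aws-node DaemonSet) and the nodes use a supported instance type. The sketch below builds such a manifest in Python and prints it as JSON, which kubectl accepts. The helper name, namespace, labels, and security group ID are hypothetical.

```python
import json
from typing import Dict, List


def security_group_policy(name: str, namespace: str,
                          pod_labels: Dict[str, str],
                          group_ids: List[str]) -> dict:
    """Build a SecurityGroupPolicy that attaches group_ids to Pods matching pod_labels."""
    return {
        "apiVersion": "vpcresources.k8s.aws/v1beta1",
        "kind": "SecurityGroupPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": pod_labels},
            "securityGroups": {"groupIds": group_ids},
        },
    }


if __name__ == "__main__":
    # Hypothetical values; replace with your own.
    manifest = security_group_policy(
        "payments-db-access", "payments", {"app": "payments"}, ["sg-0123456789abcdef0"]
    )
    # Apply with, for example: python sgp.py | kubectl apply -f -
    print(json.dumps(manifest, indent=2))
```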

By default, the VPC CNI assigns IP addresses to Pods from the subnet assigned to the node’s primary ENI. It is common to run short of IPv4 addresses when running large clusters with thousands of workloads. AWS VPC lets you extend the available IPs by associating secondary CIDR blocks with the VPC, and the VPC CNI lets you use a different subnet CIDR range for Pods than for nodes. This feature of the VPC CNI is called custom networking. You might consider using custom networking with the 100.64.0.0/10 (the CG-NAT shared address space) and 198.19.0.0/16 ranges with EKS. This effectively allows you to create an environment where Pods no longer consume any RFC 1918 IP addresses from your VPC.
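With custom networking enabled (AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true on the aws-node DaemonSet), the VPC CNI reads an ENIConfig resource that names the Pod subnet and security groups, commonly one per Availability Zone. The sketch below checks that a proposed Pod CIDR lies inside 100.64.0.0/10 or 198.19.0.0/16 (both outside the RFC 1918 private ranges) and builds an ENIConfig named after the zone. The helper names and IDs are hypothetical, and naming by zone assumes nodes select their ENIConfig by zone label.

```python
import ipaddress
from typing import List

# Non-RFC 1918 ranges suggested for Pod subnets with custom networking.
POD_RANGES = [ipaddress.ip_network(c) for c in ("100.64.0.0/10", "198.19.0.0/16")]


def in_custom_pod_range(cidr: str) -> bool:
    """True if cidr is an IPv4 block inside 100.64.0.0/10 or 198.19.0.0/16."""
    net = ipaddress.ip_network(cidr)
    if net.version != 4:
        return False
    return any(net.subnet_of(r) for r in POD_RANGES)


def eniconfig(zone: str, subnet_id: str, security_group_ids: List[str]) -> dict:
    """Build an ENIConfig that places Pods of nodes in `zone` on subnet_id."""
    return {
        "apiVersion": "crd.k8s.amazonaws.com/v1alpha1",
        "kind": "ENIConfig",
        "metadata": {"name": zone},
        "spec": {"subnet": subnet_id, "securityGroups": security_group_ids},
    }


if __name__ == "__main__":
    print(in_custom_pod_range("100.64.0.0/19"))  # True
    print(in_custom_pod_range("10.1.0.0/16"))    # False
    # Hypothetical IDs for illustration only.
    print(eniconfig("us-east-1a", "subnet-0example", ["sg-0example"]))
```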

Custom networking is one option for addressing IPv4 address exhaustion, but it adds operational overhead. AWS recommends IPv6 clusters over custom networking to resolve this problem. Evaluate your organization’s plans to support IPv6, and consider whether investing in IPv6 may have more long-term value.

EKS’s support for IPv6 is focused on solving the IP exhaustion problem caused by the limited IPv4 address space. In response to customer issues with IPv4 exhaustion, EKS has prioritized IPv6-only Pods over dual-stack Pods. That is, Pods may be able to access IPv4 resources, but they are not assigned an IPv4 address from the VPC CIDR range. The VPC CNI assigns IPv6 addresses to Pods from the AWS-managed VPC IPv6 CIDR block.
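A short calculation shows why IPv6 removes the exhaustion problem. In IPv6 mode the VPC CNI assigns each node a /80 prefix from the subnet's /64 range. The sketch below assumes a hypothetical subnet from the 2001:db8::/32 documentation range and compares one node's /80 prefix with the 16 addresses of an IPv4 /28 prefix.

```python
import ipaddress

# Hypothetical subnet taken from the IPv6 documentation range.
subnet = ipaddress.ip_network("2001:db8:0:1::/64")
NODE_PREFIX_LEN = 80  # the VPC CNI assigns one /80 prefix per node in IPv6 mode

node_prefixes_per_subnet = 2 ** (NODE_PREFIX_LEN - subnet.prefixlen)
addresses_per_node_prefix = 2 ** (128 - NODE_PREFIX_LEN)
ipv4_prefix_addresses = ipaddress.ip_network("10.0.0.0/28").num_addresses  # IPv4 prefix mode

first_node_prefix = next(subnet.subnets(new_prefix=NODE_PREFIX_LEN))

print(node_prefixes_per_subnet)    # 65536 /80 prefixes fit in one /64 subnet
print(addresses_per_node_prefix)   # 281474976710656 addresses per node prefix
print(ipv4_prefix_addresses)       # 16 addresses in an IPv4 /28 prefix
print(first_node_prefix)           # 2001:db8:0:1::/80
```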

Real-time Interview Q&A for AWS Networking

What is the networking framework for your AWS project?

In AWS, the networking framework typically involves a combination of different services and components to build a scalable and secure network infrastructure. Here are some essential elements of an AWS networking framework:

1. Virtual Private Cloud (VPC): VPC is the fundamental networking component in AWS. It provides a logically isolated virtual network environment where you can launch AWS resources. With VPC, you can define IP address ranges, subnets, route tables, and network gateways.

2. Subnets: Subnets are subdivisions of a VPC’s IP address range. They allow you to segment your network and control traffic flow. A public subnet has a route to an internet gateway, while a private subnet does not and is therefore not directly reachable from the internet.

3. Internet Gateway (IGW): An IGW enables communication between a VPC and the internet. It serves as the route table target for internet-bound traffic and allows resources in public subnets that have public IP addresses to reach the internet and receive inbound traffic from it.

4. Virtual Private Gateway (VGW): A VGW is the AWS-side endpoint of a Site-to-Site VPN connection and can also terminate AWS Direct Connect connections, providing a secure connection between your VPC and your on-premises network.

5. Route Tables: Route tables control the traffic flow within your VPC. You can define routes to direct traffic between subnets, the internet, VGWs, or other AWS services.

6. Security Groups and Network Access Control Lists (NACLs): Security groups and NACLs are used for network security. Security groups act as stateful virtual firewalls at the instance level, controlling inbound and outbound traffic. NACLs are stateless firewalls that control traffic at the subnet level.

7. Elastic Load Balancing (ELB): ELB distributes incoming traffic across multiple instances or containers to improve availability and scalability. There are four types of load balancer: Application Load Balancer (ALB), Network Load Balancer (NLB), Gateway Load Balancer (GWLB), and Classic Load Balancer (CLB).

8. AWS Direct Connect: AWS Direct Connect establishes a dedicated network connection between your on-premises data center and AWS. It provides more reliable and consistent network performance than connections over the public internet.

9. Virtual Private Network (VPN): AWS VPN enables secure connectivity between your on-premises network and AWS. It establishes an encrypted tunnel over the internet, allowing secure access to resources in your VPC.

10. Content Delivery Network (CDN): AWS offers Amazon CloudFront, a global CDN service, to deliver content and improve the performance of web applications. CloudFront caches and distributes content from edge locations around the world.
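Items 1 and 2 come together when you plan a VPC's address space. The sketch below, assuming a hypothetical 10.0.0.0/16 VPC and two example zone names, splits the range into one public and one private /20 per Availability Zone and accounts for the five addresses AWS reserves in every subnet. Whether a subnet is actually public still depends on its route table sending 0.0.0.0/0 to an internet gateway.

```python
import ipaddress
from typing import Dict, List

# AWS reserves five addresses in every subnet: the first four and the last one.
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc_cidr: str, zones: List[str], new_prefix: int) -> Dict[str, Dict[str, str]]:
    """Carve vpc_cidr into one public and one private subnet per Availability Zone."""
    vpc = ipaddress.ip_network(vpc_cidr)
    available = 2 ** (new_prefix - vpc.prefixlen)
    if available < 2 * len(zones):
        raise ValueError(f"{vpc_cidr} holds only {available} /{new_prefix} subnets")
    blocks = vpc.subnets(new_prefix=new_prefix)
    # Consecutive blocks: public first, then private, for each zone in order.
    return {zone: {"public": str(next(blocks)), "private": str(next(blocks))} for zone in zones}


def usable_ips(cidr: str) -> int:
    """Addresses left for resources after the AWS reservation."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


if __name__ == "__main__":
    plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"], 20)
    for zone, pair in plan.items():
        print(zone, pair)
    # us-east-1a {'public': '10.0.0.0/20', 'private': '10.0.16.0/20'}
    # us-east-1b {'public': '10.0.32.0/20', 'private': '10.0.48.0/20'}
    print(usable_ips("10.0.0.0/20"))  # 4091
```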

These are some of the key components and services within the networking framework for AWS projects. The specific design and configuration of your networking infrastructure will depend on your project’s requirements, scalability needs, security considerations, and connectivity requirements.
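Route tables (item 5) send traffic using the most specific route that matches the destination, known as longest prefix match. The following simplified Python model illustrates that rule; it ignores details such as prefix lists and the priority between static and propagated routes, and the route targets are made-up IDs.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, str]  # (destination CIDR, target)


def select_route(routes: List[Route], destination_ip: str) -> Optional[str]:
    """Return the target of the most specific route that matches destination_ip."""
    addr = ipaddress.ip_address(destination_ip)
    best_target, best_len = None, -1
    for cidr, target in routes:
        net = ipaddress.ip_network(cidr)
        # Only same-family routes can match; the longest prefix wins.
        if addr.version == net.version and addr in net and net.prefixlen > best_len:
            best_target, best_len = target, net.prefixlen
    return best_target


# Hypothetical route table for a public subnet.
PUBLIC_ROUTES: List[Route] = [
    ("10.0.0.0/16", "local"),
    ("10.1.0.0/16", "pcx-0example"),  # VPC peering connection (made-up ID)
    ("0.0.0.0/0", "igw-0example"),    # internet gateway (made-up ID)
]

if __name__ == "__main__":
    print(select_route(PUBLIC_ROUTES, "10.0.5.9"))  # local
    print(select_route(PUBLIC_ROUTES, "10.1.2.3"))  # pcx-0example
    print(select_route(PUBLIC_ROUTES, "8.8.8.8"))   # igw-0example
```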

As an AWS DevOps engineer, how do you handle networking errors in an AWS project?

As an AWS DevOps engineer, handling networking errors in an AWS project involves a systematic approach to troubleshooting and resolving issues. Here’s a general outline of how you can handle networking errors:

1. Identify the Error Symptoms: Gather information about the networking error symptoms reported by users or monitoring systems. This could include issues like connectivity problems, slow network performance, or unexpected errors related to networking components.

2. Check Network Configuration: Review the network configuration, including VPC settings, subnets, route tables, security groups, and network ACLs. Ensure that the configuration aligns with your intended design and connectivity requirements.

3. Review Logs and Metrics: Analyze relevant logs and metrics from services such as CloudWatch, VPC Flow Logs, ELB access logs, or application-specific logs. Look for any error messages, anomalies, or patterns that might provide insights into the root cause of the networking error.

4. Verify Internet Connectivity: Check if the affected resources have internet connectivity. Ensure that the associated subnets have appropriate route configurations, including internet gateways (IGWs) and public IP addresses if necessary.

5. Check Security Group and Network ACL Rules: Review the security group and network ACL rules to verify that the necessary ports and protocols are allowed for inbound and outbound traffic. Make any required adjustments to ensure correct traffic flow.

6. Evaluate DNS Configuration: If the networking error is related to DNS resolution, verify the DNS configuration for your resources. Check that the DNS servers are correctly configured, that the VPC’s DNS resolution and DNS hostnames attributes are set as required, and that DNS resolution is functioning as expected.

7. Verify VPC Peering or VPN Connections: If you have VPC peering or VPN connections established between VPCs or to on-premises networks, validate the connectivity status, routing, and security configurations.

8. Test Connectivity and Latency: Conduct network connectivity tests between resources to identify any communication issues. Measure latency and packet loss to pinpoint potential network performance problems.

9. Utilize Network Monitoring and Troubleshooting Tools: Leverage AWS services such as VPC Flow Logs, CloudWatch metrics, and AWS X-Ray (for tracing requests across services), or third-party monitoring tools, to gain deeper visibility into network traffic, diagnose issues, and track down the root cause of errors.

10. Engage AWS Support: If you’re unable to resolve the networking error through your troubleshooting efforts, consider engaging AWS Support for assistance. Provide them with relevant details, error symptoms, and any findings from your investigations to expedite the troubleshooting process.
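Steps 3 and 9 often come down to reading VPC Flow Logs, where an action of REJECT means a security group or network ACL did not allow the traffic. The sketch below parses records in the default version 2 format (14 space-separated fields ending in action and log-status) and counts REJECT records per source, destination, and destination port. The sample records are made up, and logs using a custom format would need a different field list.

```python
from collections import Counter
from typing import Iterable

# Field order of the default (version 2) VPC Flow Logs record format.
DEFAULT_FIELDS = (
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
)


def rejected_flows(lines: Iterable[str]) -> Counter:
    """Count REJECT records per (srcaddr, dstaddr, dstport)."""
    counts: Counter = Counter()
    for line in lines:
        values = line.split()
        if len(values) != len(DEFAULT_FIELDS):
            continue  # not a default-format record
        record = dict(zip(DEFAULT_FIELDS, values))
        if record["log-status"] != "OK":
            continue  # skips NODATA/SKIPDATA records (and a header line, if present)
        if record["action"] == "REJECT":
            counts[(record["srcaddr"], record["dstaddr"], record["dstport"])] += 1
    return counts


if __name__ == "__main__":
    # Made-up records for illustration.
    sample = [
        "2 123456789012 eni-0example 10.0.1.15 10.0.2.20 49152 5432 6 3 180 1700000000 1700000060 REJECT OK",
        "2 123456789012 eni-0example 10.0.1.15 10.0.2.20 49153 5432 6 3 180 1700000000 1700000060 REJECT OK",
        "2 123456789012 eni-0example 10.0.1.15 10.0.2.30 49154 443 6 10 840 1700000000 1700000060 ACCEPT OK",
        "2 123456789012 eni-0example - - - - - - - 1700000000 1700000060 - NODATA",
    ]
    for (src, dst, port), n in rejected_flows(sample).most_common():
        print(f"{src} -> {dst}:{port} rejected {n} times")
    # 10.0.1.15 -> 10.0.2.20:5432 rejected 2 times
```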

Remember that networking errors can have various causes, such as misconfigurations, security group rules, routing issues, or external factors. Therefore, a methodical and systematic approach to troubleshooting, backed by accurate monitoring and log analysis, is crucial to identifying and resolving networking errors in an AWS project.
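Step 8 can start with a simple TCP handshake test. The sketch below uses only the Python standard library to time a connection and classify the outcome: a refusal usually means the packet reached the host but nothing is listening on the port, while a timeout often points to traffic being silently dropped by a security group, network ACL, or missing route. The function name tcp_probe is hypothetical.

```python
import socket
import time
from typing import Optional, Tuple


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> Tuple[str, Optional[float]]:
    """Return (status, handshake time in ms); status is ok, refused, timeout, or error."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok", (time.perf_counter() - start) * 1000.0
    except ConnectionRefusedError:
        return "refused", None  # host reachable, nothing listening on the port
    except socket.timeout:
        return "timeout", None  # often a silent drop: security group, NACL, or routing
    except OSError:
        return "error", None    # for example a DNS failure or an unreachable network
```

Running it from an instance or Pod in the affected subnet, against the address and port the failing application uses, exercises the same network path as the application.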

As an AWS DevOps engineer, what kinds of networking errors do you face in your day-to-day work, and how do you solve them?

Networking errors that AWS DevOps engineers may encounter and their potential solutions:

1. Connectivity Issues: This includes instances or services being unable to connect to each other or to external resources. Solutions may involve checking security group rules, network ACLs, route tables, and internet gateway configurations to ensure proper connectivity.

2. DNS Resolution Problems: DNS-related errors can cause issues with name resolution, preventing resources from being accessed by their domain names. Solutions may involve verifying DNS server configurations, checking DNS resolution settings in the VPC, or troubleshooting issues with DNS resolution services like Route 53.

3. Routing Misconfigurations: Errors in routing configurations can lead to traffic being misdirected or blocked. Solutions may involve reviewing and updating route tables, confirming proper routing between VPCs or on-premises networks, and checking for conflicts or overlapping IP ranges.

4. Load Balancer Configuration Errors: Load balancer misconfigurations can result in uneven distribution of traffic or disruptions in application availability. Solutions may involve checking load balancer settings, listener configurations, health checks, and target group associations.

5. Subnet or VPC Configuration Issues: Misconfigurations in subnets or VPCs can lead to network-related errors. Solutions may involve verifying subnet CIDR ranges, ensuring appropriate subnets are associated with route tables, and confirming that VPC peering or VPN connections are correctly established.

6. Security Group or Network ACL Misconfigurations: Improperly configured security groups or network ACLs can block incoming or outgoing traffic, causing connectivity issues. Solutions may involve reviewing and adjusting security group rules, updating network ACL settings, or performing packet-level analysis to identify and resolve any blocking rules.

7. VPN or Direct Connect Connectivity Problems: If using VPN or AWS Direct Connect, errors may occur related to connectivity between on-premises networks and AWS. Solutions may involve reviewing VPN configurations, verifying authentication and encryption settings, or troubleshooting issues with the underlying network infrastructure.

8. Performance Bottlenecks: Networking performance issues can arise due to suboptimal configurations, high network latency, or insufficient bandwidth. Solutions may involve monitoring network metrics, analyzing network traffic patterns, optimizing routing, or considering network acceleration technologies like AWS Global Accelerator.
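Item 3 mentions overlapping IP ranges; for example, a VPC peering connection cannot be created between VPCs with overlapping CIDR blocks. This sketch reports every overlapping pair in a set of named networks; the helper name, network names, and ranges are hypothetical.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_pairs(networks: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of named networks whose CIDR blocks overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(parsed.items(), 2)
        if net_a.version == net_b.version and net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    # Hypothetical address plan.
    plan = {
        "vpc-prod": "10.0.0.0/16",
        "vpc-dev": "10.0.128.0/17",
        "on-prem": "192.168.0.0/16",
    }
    print(overlapping_pairs(plan))  # [('vpc-prod', 'vpc-dev')]
```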

These are just some examples of networking errors that AWS DevOps engineers may encounter in their projects. The specific errors faced will depend on the project’s architecture, configurations, and network requirements. Proper monitoring, proactive configuration reviews, and troubleshooting practices are essential for resolving and preventing networking errors in AWS projects.
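When chasing a blocking rule (item 6), it helps to remember how network ACL rules are evaluated: in ascending rule-number order, with the first matching rule deciding the outcome and a final catch-all rule denying anything unmatched. The simplified model below shows how a lower-numbered deny shadows a later allow; the class, function, rules, and addresses are hypothetical, and it checks only the inbound direction.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class NaclRule:
    number: int     # lower numbers are evaluated first
    protocol: str   # "tcp", "udp", or "-1" for all protocols (simplified)
    port_from: int
    port_to: int
    cidr: str
    allow: bool


def inbound_allowed(rules: List[NaclRule], protocol: str, port: int, source_ip: str) -> bool:
    """Return True if the first matching rule allows the packet, False otherwise."""
    src = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        net = ipaddress.ip_network(rule.cidr)
        if (rule.protocol in ("-1", protocol)
                and rule.port_from <= port <= rule.port_to
                and src.version == net.version
                and src in net):
            return rule.allow  # first match wins; higher-numbered rules are ignored
    return False  # the final "*" rule denies anything no numbered rule matched


if __name__ == "__main__":
    # Hypothetical rules: rule 90 blocks one range before rule 100 allows HTTPS.
    rules = [
        NaclRule(100, "tcp", 443, 443, "0.0.0.0/0", True),
        NaclRule(90, "tcp", 443, 443, "203.0.113.0/24", False),
    ]
    print(inbound_allowed(rules, "tcp", 443, "198.51.100.10"))  # True
    print(inbound_allowed(rules, "tcp", 443, "203.0.113.7"))    # False
    print(inbound_allowed(rules, "tcp", 22, "198.51.100.10"))   # False
```

Because NACLs are stateless, the outbound rules of the same NACL must also allow the response traffic, typically on the client's ephemeral port range (for example 1024-65535).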

Compiled by: Azizul maqsud

References:

https://aws.github.io/aws-eks-best-practices/networking/index/

https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html

https://aws.amazon.com/eks/features/