Use cases
- APIs: Deploy Flask, Sinatra, Express, Gin, and more.
- Server-rendered web applications: Deploy and scale modern web applications with ease, including Next.js.
- Static web applications: Deploy single-page apps, such as dashboards that don't need to be indexed by Google.
Key features
- Horizontal autoscaling: Scale your services up or down based on demand, ensuring optimal performance and cost-efficiency.
- Architecture autoscaling: Automatically scale from a low-traffic architecture to a high-traffic architecture, saving you money while you grow.
- Easy deployment: Simplify the deployment process by connecting a GitHub repository to your service. Redeploy your service automatically any time you push changes to your repository.
- No Dockerfile required: Just bring your own code. We can handle a wide variety of languages and frameworks out-of-the-box. Alternatively, bring your own Dockerfile.
- Self-healing: Automatically restart containers that exit. Configure health checks to ensure proper functioning and ensure traffic only hits healthy containers.
- Zero-downtime rollouts: Your service is updated without downtime using rolling updates. Once a subset of updated instances is verified to be functioning correctly, the rollout proceeds to the next subset until all instances are updated.
- Automatic rollback: Deployments roll back automatically when health checks fail or containers fail to start.
- Observability: Troubleshoot issues quickly using integrated CloudWatch metrics and logs or your favorite third-party logging service (Datadog, Axiom, etc.).
- CDN: Use `Cache-Control` headers to cache content in CloudFront, a globally distributed content delivery network, with zero configuration.
How it works
Low-traffic architecture
At fewer than ~10 req/s hitting your tasks, it is cheaper to use API Gateway with AWS Cloud Map service discovery to load balance ECS Fargate tasks.
High-traffic architecture
With higher-traffic workloads, it's cheaper to deploy an Application Load Balancer, which can be shared between all of your web services.
Flex traffic
After you deploy your web service, you'll be able to select the "flex traffic" option, which automatically scales your architecture between "low" and "high" depending on the amount of traffic we detect hitting your service.
Additionally, if you set some of your web services to "flex" and one to "high", the services configured with "flex" will automatically redeploy to use the load balancer deployed with the high-traffic web service, as it is cheaper and more efficient to share that load balancer between all of your services.
Create a web service
You can create a web service using either a GitHub repository or a container image from a registry as the source. When you opt to connect a GitHub repository, your component will automatically redeploy any time you push changes to the branch you configure as the source. This style of automation is known as GitOps.
Connect a GitHub repository
To connect a GitHub repository, you'll first need to install the FlexStack app to your individual GitHub account, repositories, or a GitHub organization. Once installed, you can select a repository to connect as the source of the web service.
Configuration options
Name: The name of your web service. This needs to be unique across all services in your current environment.
Branch: The git branch that will be used to deploy your web service when you push changes to it. For example, if you want to configure a branch specific to a "staging" environment, you might create a branch named "staging" off of main/master and connect that branch to your service.
Root directory: The root path within your git repository's source code. If your repository is a monorepo, you might specify a package directory here, e.g. `/apps/backend`.
Ports: The ports your service listens on. The "default" port will be mapped to the public internet. It is a best practice to make this configurable in your application via the `PORT` environment variable; FlexStack writes the default port to the `PORT` environment variable at runtime. By default, we set `8080` as the default port.
Here's a basic example of a configurable port using Go's HTTP library:
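```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

func main() {
	// FlexStack writes the default port to the PORT environment variable.
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080" // Fall back to FlexStack's default port.
	}

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "Hello, world!")
	})

	if err := http.ListenAndServe(":"+port, nil); err != nil {
		panic(err)
	}
}
```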
Deploy from container registry
To deploy an image from a container registry, click on the "Container image" button. At present, you will need to manually redeploy your component any time you want to update the image.
Configuration options
Name: The name of your web service. This needs to be unique across all services in your current environment.
Image: The image and tag, for example `nginxdemos/hello:latest`.
Start Command: The start command that's passed to the container. By default, the `CMD` or `ENTRYPOINT` directive in your Dockerfile is used as the start command.
Ports: The ports your service listens on. The "default" port will be mapped to the public internet. See the example above for instructions about making this configurable.
Health checks
Health checks are used to ask a particular server if it is capable of doing its job successfully. They run continuously on an interval the entire time your server is running. When a server becomes unhealthy, it is drained and replaced; services are thus self-healing. For that reason, we strongly recommend that all services enable health checks, particularly in production environments.
Important note
You must have `wget` installed in your container to use web service health checks.
In web services, the type of health check used depends on your "Traffic" configuration setting.
Low traffic: If your web service is configured for "low" traffic, or is configured for "flex" traffic but a load balancer has yet to be provisioned, we rely solely on container health checks for your service. The command `wget -nv -t1 --spider 'http://localhost:[PORT]/[PATH]'` is run on an interval. If the request is successful, the health check passes; if not, it fails. This mimics the behavior of load balancer health checks, meaning that if your architecture changes to include a load balancer, you won't have to update your health check settings.
High traffic: If your web service is configured for "high" traffic, both load balancer health checks and the container health checks described in "low traffic" will be configured.
Configuration options
Health checks may be enabled and configured in the Deploy tab of your component.
Path: The path that will be checked to determine whether your service is healthy. Common choices are `/health` and `/healthz` (see the example after this list). Many web server frameworks include middleware, or have community-contributed middleware, that adds this endpoint for you. If you're also using middleware with a strict allowed-hosts directive, you'll want to allow `localhost` or exclude the health check path from that middleware.
Interval: The approximate amount of time in seconds between health checks of an individual target. The range is 5–300 seconds.
Timeout: The amount of time in seconds to wait for a response before failing the check. The range is 2–120 seconds.
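To illustrate, here's a minimal `/healthz` endpoint in Go, building on the port example above. The handler body is a placeholder; in a real service you might verify dependencies (database, cache) before reporting healthy:

```go
package main

import (
	"net/http"
	"os"
)

func main() {
	// FlexStack writes the default port to the PORT environment variable.
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}

	// Both the wget-based container health check and load balancer health
	// checks simply need a successful response from the configured path.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		// Placeholder: add real dependency checks here.
		w.WriteHeader(http.StatusOK)
		w.Write([]byte("ok"))
	})

	if err := http.ListenAndServe(":"+port, nil); err != nil {
		panic(err)
	}
}
```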
Configuring the task size
By default, your service will start with 0.25 vCPU, 0.5 GiB RAM, and 20 GiB of ephemeral storage. These are fairly sensible defaults for a web service, especially one with horizontal scaling enabled. If, however, you find your service is resource-constrained and you're quickly maxing out the CPU or memory utilization of your containers, it's a good idea to scale them up. You can do this in the Deploy tab for your component.
Pricing will vary depending on region, CPU architecture, and whether spot instances are enabled. Up-to-date pricing data can be found here.
CPU architecture
CPU architecture defines the instruction set and memory model relied on by the operating system and hypervisor. Services can be configured to use one of two architectures: ARM and x86. You can configure this in the Deploy tab for your component.
On AWS, selecting the ARM architecture is the cost-effective option.
Because support is fairly ubiquitous and it has higher "raw performance", we default to the x86 architecture.
If your container image is multi-platform, we highly recommend selecting the "Flex" configuration option instead, which will select the best cost/performance option for your workload automatically.
If you're using FlexStack's autogenerated Dockerfiles, "Flex" is always the best option.
Horizontal scaling (autoscaling)
Horizontal scaling (also called auto-scaling or autoscaling) automatically increases or decreases the number of running tasks in your service based on CPU, memory, and request thresholds. For production environments, we highly recommend enabling this or, at the very least, setting a CloudWatch alarm and configuring your task count manually.
For most web services, horizontal scaling tends to be more effective than vertical scaling (increasing the resources, i.e. CPU and memory, allocated to a container) from both a cost and service latency perspective.
Configuration options
Horizontal scaling can be configured in the Deploy tab of your component.
Min tasks: The minimum number of tasks that should be running in your service at any given time. This is an important option to set if you typically experience high load. For most people, the default of "1" is a reasonable option, meaning 1 task will be running at all times. The value here must be greater than zero.
Max tasks: The maximum number of tasks that should be running in your service at any given time. This is an important option for managing costs in the event of something like a DDoS attack. You will need to balance the desire to keep your service online at all times with the ability to pay your cloud bill. The default value is "10", meaning that no more than 10 concurrent tasks will ever run at once. This value must be equal to or greater than "min tasks".
CPU threshold: The target value for CPU utilization across all tasks in the service. If CPU utilization is trending above or below this threshold outside of the cooldown period, a scaling event will occur. Specifically, an alarm associated with triggering a scale-out event evaluates the CPU metric for 3 minutes. If the average value is greater than this % for all three samples, a scale-out event occurs. An alarm associated with scaling-in is more conservative. It evaluates average CPU utilization for the cooldown period (which defaults to 5 minutes) before triggering a scale-in event. This prevents the service from scaling between task counts too quickly, which may negatively impact availability. The value for the scale-in alarm is 5% less than the threshold value to avoid too much fluctuation in service capacity. The default threshold is 80%, which is quite conservative and cost-efficient.
Memory threshold: The target value for memory utilization across all tasks in the service. If memory utilization is trending above or below this threshold outside of the cooldown period, a scaling event will occur. As with the CPU threshold above, measures are taken to ensure there is not too much fluctuation in service capacity. By default, scaling on memory utilization is disabled: memory tends to lend itself better to vertical scaling than horizontal scaling. That is, if you find your tasks are memory-constrained, it's likely better to increase the memory allocation of your tasks than to scale out the number of tasks running the service.
Requests threshold: The number of requests per second, per task, at which to scale. That is, if this value is set to 100 and the average requests per second per task over three samples is greater than 100, your service will scale out. The default value is 100, which is fairly reasonable, but you'll likely want to test how many requests per second your service can handle before settling on a value. Setting a value of -1 disables request-based autoscaling. As with the CPU and memory thresholds, cooldown periods and conservative scale-in thresholds are in place to avoid too much fluctuation in capacity (see the sketch after this list for the underlying arithmetic).
Cooldown period: The amount of time that AWS Auto Scaling waits before adding or removing tasks in response to metrics (CPU threshold, memory threshold, etc.). Subsequent scale-out and scale-in events during the cooldown period are ignored. The default value is 300 seconds, or 5 minutes.
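To make the thresholds concrete, here's an illustrative sketch of target-tracking arithmetic in Go. This is a simplification for intuition only, not FlexStack's or AWS's exact implementation; the sampling windows, scale-in margins, and cooldowns described above still apply:

```go
package main

import (
	"fmt"
	"math"
)

// desiredTasks approximates how target tracking picks a task count: scale the
// current count so the observed metric (CPU %, memory %, or req/s per task)
// trends back toward the configured target.
func desiredTasks(current int, observed, target float64) int {
	n := int(math.Ceil(float64(current) * observed / target))
	if n < 1 {
		n = 1 // "Min tasks" must be greater than zero.
	}
	return n
}

func main() {
	// With a requests threshold of 100 and 3 tasks averaging 250 req/s each,
	// the service scales out to ceil(3 * 250 / 100) = 8 tasks.
	fmt.Println(desiredTasks(3, 250, 100)) // 8
}
```

In practice the result is also clamped to your configured "min tasks" and "max tasks" values.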
CDN
The FlexStack web service architecture includes a CloudFront distribution. This comes with a few critical benefits:
- It reduces the cost of data transfer out (DTO) for your service. You'll receive 1TB free per month in the free tier, in addition to saving $0.005/GB compared to standard AWS egress.
- It comes with free HTTPS / SSL certificates.
- It provides a configurable caching layer for your API or website.
- When using Origin Shield, AWS enables smart routing to keep requests flowing through its network as fast as possible, substantially reducing request latency worldwide. Origin Shield will also reduce costs for some workloads.
- It allows you to protect your service with AWS WAF (web application firewall), which helps mitigate DDoS attacks.
Configuration options
The CDN can be configured in the Deploy tab of your component.
Always use HTTPS: You can configure your service to always be delivered over HTTPS; that is, requests over HTTP will be redirected to HTTPS. There are two reasons you may want to disable this: you're using a Cloudflare DNS proxy in front of your service, or you want requests over HTTP to fail visibly rather than be silently redirected.
Disable caching: Disables all CloudFront caching. You can granularly control what gets cached and for how long using `Cache-Control` headers from your service (see the sketch after these options). However, if you want to disable all caching entirely, you can select this option.
Enable origin shield: CloudFront Origin Shield is an additional layer in the CloudFront caching infrastructure that helps to minimize your origin’s load, improve its availability and network performance, and reduce its operating costs. FlexStack automatically selects the best Origin Shield region for your workload. This option incurs an additional fee per 10,000 requests that hit the shield, the rate of which depends on the CloudFront region.
Invalidate cache every deploy: When this option is enabled, your entire CDN cache will be invalidated every time the service is redeployed. This can be useful if you're doing something like serving static web content that you want to refresh after a new deploy, without manually purging the cache.
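To illustrate granular cache control, here's a sketch of a Go handler setting `Cache-Control` headers per route; the `/products` and `/me` paths are hypothetical examples, not FlexStack conventions:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Let CloudFront (and browsers) cache this response for five minutes.
	http.HandleFunc("/products", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", "public, max-age=300")
		fmt.Fprintln(w, "product listing")
	})

	// Never cache this response, even with the CDN cache enabled.
	http.HandleFunc("/me", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", "private, no-store")
		fmt.Fprintf(w, "generated at %s\n", time.Now().Format(time.RFC3339))
	})

	// In a real service, read the port from PORT as shown earlier.
	http.ListenAndServe(":8080", nil)
}
```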
Spot pricing
Spot pricing allows you to run services that can be interrupted anytime at up to a 70% discount.
Selecting Flex will enable spot pricing when the environment is set to development or staging and disable it when set to production.
Spot pricing can be configured in the Deploy tab of your component when CPU architecture is not set to ARM.
Web services can typically be interrupted safely, so spot pricing is an attractive option for them; however, since spot capacity is not guaranteed to be available in every region, you'll likely want to disable this for web services in production. It is ultimately your decision how to balance the risk.
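If you do enable spot pricing, it's worth handling interruptions gracefully. ECS sends your container a SIGTERM before stopping a task, so a server can drain in-flight requests before exiting; here's a minimal sketch building on the Go examples above:

```go
package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}
	srv := &http.Server{Addr: ":" + port}

	go func() {
		// Wait for the SIGTERM that ECS sends when the task is being stopped,
		// e.g. during a spot interruption or a rolling deploy.
		stop := make(chan os.Signal, 1)
		signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
		<-stop

		// Stop accepting new connections and finish in-flight requests.
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()
		srv.Shutdown(ctx)
	}()

	srv.ListenAndServe()
}
```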
Build arguments
Build arguments are a way to add configurability to your builds. Pass build arguments at build time and set a default value that the builder uses as a fallback. Read Docker's build arguments documentation here. Build arguments are not secrets: if you need to pass sensitive information to the build, use secrets and variables instead.
A powerful use case for build arguments is that they can be used to configure FlexStack's autogenerated Dockerfiles.
You can configure build arguments in the Build tab of your component.
Secrets and environment variables
Store configuration and sensitive information like API keys, database passwords, and other secrets. Variables are injected into your service at runtime and during the build process. See the Secrets and Variables documentation for an in-depth look at how they work and how they're used.
These can be configured using the Secrets and variables tab in your component.
You will need to redeploy your service for changes to secrets to take effect.
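For example, a service might read a secret at startup and fail fast if it's missing. A minimal sketch, where `DATABASE_URL` is a hypothetical variable name:

```go
package main

import (
	"log"
	"os"
)

func main() {
	// Secrets and variables are injected as environment variables at runtime.
	dbURL := os.Getenv("DATABASE_URL") // Hypothetical; use your own variable.
	if dbURL == "" {
		// Failing fast makes a missing secret visible immediately: the
		// container exits, health checks fail, and the deploy rolls back.
		log.Fatal("DATABASE_URL is not set")
	}
	log.Println("configuration loaded")
}
```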
Domains
By default, two domains are assigned to each web service: a domain ending in `.flexstack.app` and a CloudFront distribution domain ending in `.cloudfront.net`. The FlexStack domain will remain the same so long as you do not change the name of your component. The CloudFront domain will never change over the lifetime of your service; however, it will not be accessible if you opt to use a custom domain, due to host header restrictions.
Follow our Custom Domains guide to configure a custom domain for your web service.