Use cases
- Stateless microservices: Deploy Flask, Sinatra, Express, Gin, and more to create microservices available over the internal private network.
- Caching and other in-memory data stores: Redis without RDB/AOF persistence enabled.
- Message brokers: RabbitMQ or Apache Kafka can be configured to be non-persistent, where they manage and route messages but do not store them permanently.
- Workflow engines: Self-hosted services like Temporal or Inngest, which use external datastores like Postgres or Cassandra.
Important note on persistence
Private services do not have Docker volumes or other forms of persistent storage that are suitable for running production databases like Postgres or MySQL.
Key features
- Load balanced: Round robin internal load balancing is available for free by default using AWS Service Connect.
- Service discovery: Internal service discovery is enabled out of the box using AWS Cloud Map and AWS Service Connect.
- Horizontal autoscaling: Scale your services up or down based on demand, ensuring optimal performance and cost-efficiency.
- Easy deployment: Simplify the deployment process by connecting a GitHub repository to your service. Redeploy your service automatically any time you push changes to your repository.
- No Dockerfile required: Just bring your own code. We can handle a wide variety of languages and frameworks out-of-the-box. Alternatively, bring your own Dockerfile.
- Self-healing: Automatically restart containers that exit. Configure health checks to ensure proper functioning and ensure traffic only hits healthy containers.
- Zero-downtime rollouts: Your service is updated without downtime using rolling updates. Once an updated subset of instances is verified to be functioning correctly, the rollout proceeds to the next subset until all instances are updated.
- Automatic rollback: Deployments roll back automatically when health checks fail or containers fail to start.
- Observability: Troubleshoot issues quickly using integrated CloudWatch metrics and logs or your favorite third-party logging service (Datadog, Axiom, etc.)
How it works
Private services are scalable ECS Fargate services that are discoverable in your private network via internal DNS. You can refer to and connect to your services by name using the .flexstack.internal namespace we created for you in AWS Cloud Map. Traffic over this namespace is automatically distributed between tasks without deploying or configuring a load balancer. Each FlexStack service in your cluster has permission to talk to other FlexStack services out-of-the-box.
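For example, one service could call a sibling service over the internal namespace like this (the service name backend, the port, and the /status path are hypothetical):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// "backend" is a hypothetical sibling FlexStack service. Requests to
	// the .flexstack.internal namespace are automatically load balanced
	// across that service's tasks.
	resp, err := http.Get("http://backend.flexstack.internal:8080/status")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```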
See the Service Discovery documentation for more details.
Create a private service
You can create a private service by using either a GitHub repository or a container image from a registry as a source. When you connect a GitHub repository, your component will automatically redeploy any time you push to the branch you configure as the source. This automation is known as GitOps.
Connect a GitHub repository
To connect a GitHub repository, you'll first need to install the FlexStack app to your individual GitHub account, repositories, or a GitHub organization. Once installed, you can select a repository to connect as the source of the private service.
Configuration options
Name: The name of your private service. This needs to be unique across all services in your current environment.
Branch: The git branch that will be used to deploy your private service when you push changes to it. For example, if you want to configure a branch specific to a "staging" environment, you might create a branch named "staging" off of main/master and connect that branch to your service.
Root directory: The root path within your git repository source code. If your repository is a monorepo, you might specify a package directory here, e.g. /apps/backend.
Ports: The ports your service listens on. It is a best practice to make this configurable in your application with the PORT environment variable; FlexStack writes the default port to the PORT environment variable. By default, the port is 8080.
Here's a basic example of a configurable port using Go's HTTP library:
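```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Read the port from the PORT environment variable set by FlexStack,
	// falling back to the default of 8080.
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "Hello from a private service!")
	})

	if err := http.ListenAndServe(":"+port, nil); err != nil {
		panic(err)
	}
}
```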
Deploy from container registry
To deploy an image from a container registry, click on the "Container image" button. At present, you will need to manually redeploy your component any time you want to update the image.
Configuration options
Name: The name of your private service. This needs to be unique across all services in your current environment.
Image: The image and tag, for example redis:alpine.
Start Command: The start command that's passed to the container. By default, the CMD or ENTRYPOINT directive in your Dockerfile is used as the start command.
Ports: The ports your service listens on. See the example above for instructions about making this configurable.
Health checks
Health checks are used to ask a particular server if it is capable of doing its job successfully. They run continuously on an interval the entire time your server is running. When a server becomes unhealthy, it is drained and replaced, making services self-healing. For that reason, we strongly recommend that all services enable health checks, particularly in production environments.
Configuration options
Health checks may be enabled and configured in the Deploy tab of your component.
Command: The command to run to check the health of your service.
Interval: The approximate amount of time in seconds between health checks of an individual target. The range is 5–300 seconds.
Timeout: The amount of time in seconds to wait for a health check response before failing. The range is 2–120 seconds.
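Here's a sketch of a matching health endpoint (the /healthz path is an illustrative assumption, not a FlexStack convention). A health check command for it might be wget -qO- http://localhost:8080/healthz || exit 1, assuming wget is available in your image:

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

func main() {
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}

	// Lightweight endpoint for the health check command to probe.
	// Return 200 only when the service can actually do its job; add
	// checks for downstream dependencies here if needed.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	})

	if err := http.ListenAndServe(":"+port, nil); err != nil {
		panic(err)
	}
}
```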
Configuring the task size
By default, your service will start with 0.25 vCPU, 0.5 GiB RAM, and 20 GiB of ephemeral storage. If your service is resource constrained and you're quickly maxing out the CPU or memory utilization of your containers, it's a good idea to scale up. You can do this in the Deploy tab for your component.
Pricing will vary depending on region, CPU architecture, and whether spot instances are enabled. Up-to-date pricing data can be found here.
CPU architecture
CPU architecture defines the instruction set and memory model relied on by the operating system and hypervisor. Services can be configured to use one of two architectures: ARM or x86. You can configure this in the Deploy tab for your component.
On AWS, selecting the ARM architecture is the cost-effective option if you aren't using spot instances, as spot instances don't tend to be available for ARM architectures due to high demand.
Because x86 support is fairly ubiquitous, its raw performance is higher, and it works with spot instances out-of-the-box, we default to the x86 architecture.
If your container image is multi-platform, we highly recommend selecting the "Flex" configuration option instead, which will select the best cost/performance option for your workload automatically.
If you're using FlexStack's auto-generated Dockerfiles, "Flex" is always the best option.
Horizontal scaling (autoscaling)
Horizontal scaling (also known as auto-scaling or autoscaling) automatically increases or decreases the number of running tasks in your service based on CPU, memory, and request thresholds. For production environments, we highly recommend enabling this or, at the very least, setting a CloudWatch alarm and configuring your task count manually.
Configuration options
Horizontal scaling can be configured in the Deploy tab of your component.
Min tasks: The minimum number of tasks that should be running in your service at any given time. This is an important option to set if you typically experience high load. For most people, the default of "1" is a reasonable option, meaning 1 task will be running at all times. The value here must be greater than zero.
Max tasks: The maximum number of tasks that should be running in your service at any given time. This is an important option for managing costs in the event of something like a DDOS attack. You will need to balance the desire to keep your service online at all times with the ability to pay your cloud bill. The default value is "10" meaning that no more than 10 concurrent tasks will ever be running at the same time. This value must be equal to or greater than "min tasks".
CPU threshold: The target value for CPU utilization across all tasks in the service. If CPU utilization trends above or below this threshold outside of the cooldown period, a scaling event occurs. Specifically, the alarm that triggers a scale-out event evaluates the CPU metric for 3 minutes; if the average value is greater than this percentage for all three samples, a scale-out event occurs. The alarm associated with scaling in is more conservative: it evaluates average CPU utilization for the cooldown period (which defaults to 5 minutes) before triggering a scale-in event. This prevents the service from scaling between task counts too quickly, which could negatively impact availability. The scale-in alarm uses a value 5% less than the threshold to avoid too much fluctuation in service capacity. The default threshold is 80%, which is quite conservative and cost-efficient: at that default, the service scales out when average CPU utilization exceeds 80% for three consecutive samples and scales in when it stays below 75% for the cooldown period.
Memory threshold: The target value for memory utilization across all tasks in the service. If memory utilization trends above or below this threshold outside of the cooldown period, a scaling event occurs. As with the CPU threshold above, measures are taken to ensure there is not too much fluctuation in service capacity. By default, scaling on memory utilization is disabled. Memory tends to lend itself better to vertical scaling than horizontal scaling; that is, if your tasks are memory constrained, it's likely a better idea to increase the memory allocation of your tasks than to scale the number of tasks running the service.
Cooldown period: The amount of time that AWS Autoscaling waits before adding or removing tasks in response to metrics (CPU threshold, memory threshold, etc.). Subsequent scale-out and scale-in events during the cooldown period are ignored. The default value is 300 seconds, or 5 minutes.
Spot pricing
Spot pricing allows you to run services that can be interrupted anytime at up to a 70% discount.
Selecting Flex will enable spot pricing when the environment is set to development or staging and disable it when set to production.
Spot pricing can be configured in the Deploy tab of your component when CPU architecture is not set to ARM.
Some private services can be safely interrupted, in which case spot pricing is attractive. But since spot capacity is not guaranteed to be available in every region, you'll likely want to disable it for private services in production.
Build arguments
Build arguments are a way to add configurability to your builds. Pass build arguments at build-time and set a default value that the builder uses as a fallback. Read Docker's build arguments documentation here. Build arguments are not secrets. If you need to pass sensitive information to the build, use secrets and variables instead.
A powerful use case for build arguments is that they can be used to configure FlexStack's autogenerated Dockerfiles.
You can configure build arguments in the Build tab of your component.
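As a minimal sketch of how a build argument flows into a build (the GO_VERSION argument and its default are illustrative, not FlexStack-defined), a Dockerfile can declare a fallback default that the value you configure overrides:

```dockerfile
# GO_VERSION is an illustrative build argument; the value configured in the
# Build tab overrides the default declared here.
ARG GO_VERSION=1.22
FROM golang:${GO_VERSION}-alpine AS build
WORKDIR /app
COPY . .
RUN go build -o /bin/service .

FROM alpine:3.20
COPY --from=build /bin/service /bin/service
CMD ["/bin/service"]
```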
Secrets and environment variables
Store configuration and sensitive information like API keys, database passwords, and other secrets. Variables are injected into your service at runtime and during the build process. See the Secrets and Variables documentation for an in-depth look at how they work and how they're used.
These can be configured using the Secrets and variables tab in your component.
You will need to redeploy your service for changes to secrets to take effect.
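As a minimal sketch (DATABASE_URL is a hypothetical variable name, not one FlexStack defines for you), a service might read an injected secret at runtime like this:

```go
package main

import (
	"fmt"
	"log"
	"os"
)

func main() {
	// DATABASE_URL is a hypothetical secret injected by FlexStack at runtime.
	dbURL, ok := os.LookupEnv("DATABASE_URL")
	if !ok {
		log.Fatal("DATABASE_URL is not set; configure it in the Secrets and variables tab")
	}
	// Avoid logging the secret itself.
	fmt.Println("database URL loaded, length:", len(dbURL))
}
```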