
This is part of Solutions Review’s Premium Content Series, a collection of reviews written by industry experts in maturing software categories. In this submission, Kasten by Veeam’s Gaurav Rishi highlights the four key layers of cloud-native data management and how to operate on them.
As containerized applications experience an accelerated pace of adoption, day-2 services have become a here-and-now problem. These day-2 services include data management capabilities such as backup and disaster recovery, as well as application mobility. In this new world of containerized cloud-native applications, microservices use multiple data services (MongoDB, Kafka, etc.) and storage technologies to store state, and are typically deployed in multiple locations (regions, clouds, on-premises).
In this environment where legacy infrastructure or hypervisor-based solutions don’t work, what are good constructs for designing and implementing these data management functions for cloud-native applications? How should you reason about the different data management options provided by storage vendors, data service providers, and cloud providers to decide on the right approach for your environment and needs? This article dives under the covers and discusses the pros and cons of various data management approaches across multiple attributes, including consistency, storage requirements, and performance.
Defining a Vocabulary
To begin, we’ll deconstruct and simplify the stack to show where data can reside in a cloud-native app.
Thinking about data management, we could operate on one (or more!) of the layers shown in the figure above. Let’s list these layers:
1. Physical storage
This layer includes various storage hardware options that can store state in non-volatile memory with a choice of physical media ranging from NVMe and SSD devices to spinning disks and even tape. They come in a variety of form factors, including standalone rack arrays and servers.
Physical storage could be located:
- On-premises, where you can encounter storage hardware from vendors like Seagate, Western Digital, and Micron.
- In the data centers of a managed cloud provider. Although you may never encounter a physical device, you know it’s there, giving the “cloud” its gravity!
2. File and block storage
This software layer provides file- or block-level constructs to enable efficient read and write operations from the underlying physical storage. In both cases (file and block), the underlying storage can be stand-alone (local disks) or a network-shared resource (NAS or SAN).
- Block storage implementations allow you to create raw storage volumes from local or remote disks that have low latency and are accessible via protocols such as iSCSI and Fibre Channel. Block storage implementations on cloud providers include Amazon EBS and GCE Persistent Disks.
- File storage provides shared storage for file semantics and operations using protocols such as NFS and SMB. File storage implementations commonly found on-premises include products from NetApp and Dell EMC. File storage implementations on cloud providers include Amazon EFS, Google Cloud Filestore, and Azure Files.
This layer often provides snapshot capabilities that can create a near-instantaneous point-in-time copy of a volume for protection purposes. Additionally, in Kubernetes environments, this layer provides Container Storage Interface (CSI) drivers to standardize the APIs that higher layers can use to invoke snapshot functions. Note that not all CSI implementations are created equal in terms of supported capabilities.
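To make this concrete, here is a minimal sketch, using the Kubernetes Python client, of how a higher layer might request a CSI snapshot through the VolumeSnapshot API. The namespace, VolumeSnapshotClass, and PVC names are hypothetical placeholders, and the sketch assumes a CSI driver with snapshot support is installed in the cluster.

```python
# Sketch: request a CSI point-in-time snapshot of a PVC via the
# VolumeSnapshot API (snapshot.storage.k8s.io/v1).
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
api = client.CustomObjectsApi()

snapshot = {
    "apiVersion": "snapshot.storage.k8s.io/v1",
    "kind": "VolumeSnapshot",
    "metadata": {"name": "postgres-data-snap-1"},
    "spec": {
        # Hypothetical VolumeSnapshotClass provided by your CSI driver.
        "volumeSnapshotClassName": "csi-snapclass",
        # Hypothetical PVC holding the data service's volume.
        "source": {"persistentVolumeClaimName": "postgres-data"},
    },
}

api.create_namespaced_custom_object(
    group="snapshot.storage.k8s.io",
    version="v1",
    namespace="default",
    plural="volumesnapshots",
    body=snapshot,
)
```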
3. Data Services
This layer resides above the file/block storage implementation. It provides various database implementations as well as an increasingly popular type of storage, namely object (aka blob) storage. This is the layer that applications typically interface with, and the choice of underlying database implementations is based on workloads and business logic. With microservices-based applications, polyglot persistence is the norm, since each microservice selects the most appropriate data service for the job at hand.
Some database types and a subset of sample implementations include:
- SQL databases: MySQL, PostgreSQL, SQL Server
- NoSQL databases:
- Key value stores: Redis, BerkeleyDB
- Time series databases: InfluxDB, Prometheus
- Graph databases: Neo4j, GraphDB
- Wide column stores: Cassandra, Azure Cosmos
- Document stores: MongoDB, CouchDB
- Message Queues: Kafka, RabbitMQ, Amazon SQS
- Object stores: Amazon S3, Google Cloud Storage, MinIO
There are also several hosted implementations of these databases, commonly referred to as database-as-a-service (DBaaS) systems. These typically include one of the database categories listed above and can sometimes provide autoscaling as well as the consumption-based savings of an as-a-service (-aaS) model. Examples of DBaaS systems include Amazon RDS, MongoDB Atlas, and Azure SQL.
From a data protection perspective, each of these database implementations provides a specific set of utilities (pg_dump or WAL-E for PostgreSQL, mongodump for MongoDB, etc.) to back up and restore data. It is important to note that these utilities vary widely in their capabilities in terms of consistency, recovery granularity, and speed. They are usually limited to a particular database implementation or, at best, database type, whether provided as a standalone utility or as part of an -aaS offering.
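As an illustration of such a database-level utility in action, here is a minimal sketch that dumps a PostgreSQL database with pg_dump and copies the dump to S3-compatible object storage with boto3. The database name, dump path, and bucket are hypothetical, and connection credentials are assumed to come from the environment.

```python
# Sketch: logical PostgreSQL backup with pg_dump, copied off the node
# into object storage so it survives volume or cluster loss.
import subprocess
import boto3

DB_NAME = "orders"             # hypothetical database
DUMP_FILE = "/tmp/orders.dump"
BUCKET = "backup-bucket"       # hypothetical bucket

# Transactionally consistent logical dump of a single database.
subprocess.run(
    ["pg_dump", "--format=custom", f"--file={DUMP_FILE}", DB_NAME],
    check=True,
)

# Upload the dump to object storage (part of the data services layer).
s3 = boto3.client("s3")
s3.upload_file(DUMP_FILE, BUCKET, f"postgres/{DB_NAME}.dump")
```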
4. Stateful Application
The application layer is where the business logic resides, and in the cloud-native world, applications are typically based on modern agile development methodologies and implemented as distributed microservices. Almost every application has some state that needs to persist. Although there are several models for storing application state, we must maintain and protect the following information in the context of a stateful Kubernetes application as an atomic unit:
- Application data: In various data service, block, and file storage implementations spread across multiple containers.
- Application definition and configuration: The application image and associated environment configuration distributed across various Kubernetes objects, including ConfigMaps, Secrets, etc.
- Other configuration state: Including CI/CD pipeline state, version information, and related Helm deployment metadata.
It is important to note that in a real deployment, an application can be made up of hundreds of these underlying components. Additionally, in a cloud-native construct, the unit of atomicity for protection should be the application, not the underlying data service or storage infrastructure layer. As mentioned earlier, this is because an application’s state includes its data, definition, and configuration, which are spread across multiple physical or virtual nodes and data services.
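To make the idea of the application as the unit of protection concrete, here is a minimal sketch, using the Kubernetes Python client, that enumerates the objects sharing an application label (Deployments, PVCs, ConfigMaps, Secrets) so they can be captured together rather than volume by volume. The namespace and label selector are hypothetical.

```python
# Sketch: collect the Kubernetes objects that make up one application,
# identified here by a shared "app" label, as a single protection unit.
from kubernetes import client, config

config.load_kube_config()
ns = "default"            # hypothetical namespace
selector = "app=catalog"  # hypothetical application label

core = client.CoreV1Api()
apps = client.AppsV1Api()

components = {
    "deployments": [d.metadata.name for d in
                    apps.list_namespaced_deployment(ns, label_selector=selector).items],
    "pvcs": [p.metadata.name for p in
             core.list_namespaced_persistent_volume_claim(ns, label_selector=selector).items],
    "configmaps": [c.metadata.name for c in
                   core.list_namespaced_config_map(ns, label_selector=selector).items],
    "secrets": [s.metadata.name for s in
                core.list_namespaced_secret(ns, label_selector=selector).items],
}

# Everything listed here (plus the data inside the PVCs and data services)
# should be backed up and restored together as one atomic unit.
print(components)
```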
Final Thoughts
From the perspective of backup/recovery and application portability, a good data management solution should treat the entire application as the unit of atomicity, rendering traditional hypervisor-centric solutions obsolete. We’ve also illustrated a simple stack diagram to show where application state actually resides across the various data services, block and file stores, and physical storage, both on-premises and in the cloud. This defines a core vocabulary that allows us to dive under the covers of the layers of cloud-native data management operations.