AWS Database Services – Deeper Dives
- AWS Data Categories & Use Cases
- Database History Timeline
- AWS Purpose-Built Databases
- Why Use AWS Database Services?
- AWS Database Services - Features
- AWS Database Services - Use Cases
- AWS Database Migration Services
- Oracle on AWS - Deeper Dives
- NoSQL Databases - Deeper Dives
- DB-Engines.com 2019 DB Rankings
Hundreds of thousands of customers have embraced AWS’s built-for-the-cloud database services because they perform and scale better, are easier to manage, are more cost-effective, and are more reliable than old guard database offerings.
1. Purpose-Built
• AWS’s portfolio of purpose-built databases supports diverse data models and allows you to build use case driven, highly scalable, distributed applications. By picking the best database to solve a specific problem or a group of problems, you can break away from restrictive one-size-fits-all monolithic databases and focus on building applications to meet the needs of your business.
2. Scalable
• With AWS Databases, you can start small and scale as your applications grow. You can scale your database’s compute and storage resources with only a few mouse clicks or an API call, often with no downtime. Because purpose-built databases are optimized for the data model you need, your applications can scale and perform better than when built using one-size-fits-all monolithic databases.
3. Fully Managed or Serverless
• With AWS databases, you don’t need to worry about database management tasks such as server provisioning, patching, setup, configuration, backups, or recovery. AWS continuously monitors your clusters to keep your workloads up and running so that you can focus on higher value application development.
4. Enterprise-Class
• AWS databases are built for business-critical, enterprise workloads, offering high availability and reliability. You have full oversight of your data, with multiple levels of security, including network isolation using Amazon VPC, encryption at rest using keys you create and control through AWS Key Management Service (KMS), and encryption in transit.
1. Amazon Aurora
• MySQL and PostgreSQL-compatible relational database built for the cloud.
• Performance and availability of commercial-grade databases at 1/10th the cost.
• Amazon Aurora is up to five times faster than standard MySQL databases and three times faster than standard PostgreSQL databases.
• Features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 64TB per database instance.
2. Amazon RDS
• Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale a relational database in the cloud.
• Provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching and backups.
• Frees you to focus on your applications so you can give them the fast performance, high availability, security and compatibility they need.
• Amazon RDS is available on several database instance types:
. 1. Optimized for Memory, Performance, or I/O
. 2. Provides six familiar database engines, including:
. • Amazon Aurora
. • PostgreSQL
. • MySQL
. • MariaDB
. • Oracle Database
. • SQL Server
• Use the AWS Database Migration Service to easily migrate your existing databases to Amazon RDS.
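The provisioning flow above can be sketched with boto3. In this minimal sketch, the instance identifier, class, and credentials are hypothetical placeholders; the real call is `rds.create_db_instance(**params)`, shown commented out so the sketch runs without AWS credentials.

```python
# Sketch of provisioning an Amazon RDS MySQL instance via boto3.
# All identifiers and credential values below are hypothetical.

def build_rds_params(db_id, engine="mysql", instance_class="db.t3.micro"):
    """Assemble the keyword arguments for rds.create_db_instance()."""
    return {
        "DBInstanceIdentifier": db_id,
        "Engine": engine,
        "DBInstanceClass": instance_class,
        "AllocatedStorage": 20,             # GiB
        "MasterUsername": "admin",
        "MasterUserPassword": "change-me",  # use AWS Secrets Manager in practice
        "MultiAZ": True,                    # standby replica in a second AZ
    }

params = build_rds_params("demo-db")
# import boto3
# rds = boto3.client("rds")
# rds.create_db_instance(**params)   # the actual provisioning call
```

Building the parameters separately keeps the sketch testable; in a real deployment you would also supply VPC security groups and a subnet group.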
3. Amazon Redshift
• Fast, simple, cost-effective data warehouse that can extend queries to your Data Lake
• Amazon Redshift is a fast, scalable data warehouse that makes it simple and cost-effective to analyze all your data across your data warehouse and data lake.
• Redshift delivers ten times faster performance than other data warehouses by using machine learning, massively parallel query execution, and columnar storage on high-performance disk.
• Set up and deploy a new data warehouse in minutes, and run queries across petabytes of data in your Redshift data warehouse and exabytes of data in your data lake built on Amazon S3.
• Start small for just $0.25 per hour and scale to $250 per terabyte per year, less than one-tenth the cost of other solutions.
• Getting Started Guide
4. Amazon DynamoDB
• Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale.
• It’s a fully managed, multiregion, multimaster database with built-in security, backup and restore, and in-memory caching for internet-scale applications.
• DynamoDB can handle more than 10 trillion requests per day and support peaks of more than 20 million requests per second.
• Many of the world’s fastest growing businesses such as Lyft, Airbnb, and Redfin as well as enterprises such as Samsung, Toyota, and Capital One depend on the scale and performance of DynamoDB to support their mission-critical workloads.
• More than 100,000 AWS customers have chosen DynamoDB as their key-value and document database for mobile, web, gaming, ad tech, IoT, and other applications that need low-latency data access at any scale.
• Create a new table for your application and let DynamoDB handle the rest.
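"Create a new table and let DynamoDB handle the rest" can be sketched with boto3. The table and attribute names here are hypothetical; the real call is `dynamodb.create_table(**spec)`, left commented so the sketch runs offline.

```python
# Sketch of defining a DynamoDB table via boto3; names are hypothetical.

def build_table_spec(table_name):
    """Assemble the keyword arguments for dynamodb.create_table()."""
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},     # partition key
            {"AttributeName": "created_at", "KeyType": "RANGE"}, # sort key
        ],
        "AttributeDefinitions": [
            {"AttributeName": "user_id", "AttributeType": "S"},   # string
            {"AttributeName": "created_at", "AttributeType": "N"} # number
        ],
        "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity, no provisioning
    }

spec = build_table_spec("sessions")
# import boto3
# dynamodb = boto3.client("dynamodb")
# dynamodb.create_table(**spec)
```

Only key attributes are declared up front; DynamoDB is schemaless for all other item attributes.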
5. Amazon ElastiCache
• Managed, Redis or Memcached-compatible in-memory data store.
• Amazon ElastiCache offers fully managed
. 1. Redis
. • Fast, open source in-memory data store for use as a database, cache, message broker, and queue.
. 2. Memcached
. • Memcached is an easy-to-use, high-performance, in-memory data store.
. • It offers a mature, scalable, open-source solution for delivering sub-millisecond response times making it useful as a cache or session store.
. • Memcached is a popular choice for powering real-time applications in Web, Mobile Apps, Gaming, Ad-Tech, and E-Commerce.
. • Seamlessly deploy, run, and scale popular open source compatible in-memory data stores.
. • Build data-intensive apps or improve the performance of your existing apps by retrieving data from high throughput and low latency in-memory data stores.
• Amazon ElastiCache is a popular choice for
. • Gaming
. • Ad-Tech
. • Financial Services
. • Healthcare
. • IoT apps
6. Amazon Neptune
• Fast, reliable graph database built for the cloud
• Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets.
• The core of Amazon Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latency.
• Amazon Neptune supports popular graph models Property Graph and W3C’s RDF, and their respective query languages Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets.
• Neptune powers graph use cases such as recommendation engines, fraud detection, knowledge graphs, drug discovery, and network security.
• Amazon Neptune is highly available, with read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across Availability Zones.
• Neptune is secure with support for encryption at rest.
• Neptune is fully-managed, so you no longer need to worry about database management tasks such as hardware provisioning, software patching, setup, configuration, or backups.
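The recommendation-engine use case above boils down to a graph traversal. This toy in-memory sketch shows the shape of the "people who bought X also bought Y" query that Neptune would run with Gremlin or SPARQL; the sample data is hypothetical.

```python
# Minimal in-memory property-graph sketch: users are vertices, purchases
# are edges to product vertices. Data is hypothetical sample data.

from collections import Counter

purchases = {                       # user -> set of products (edges)
    "alice": {"book", "lamp"},
    "bob":   {"book", "desk"},
    "carol": {"book", "desk", "pen"},
}

def also_bought(product):
    """Two-hop traversal: product -> its buyers -> their other products,
    ranked by how many buyers share them."""
    counts = Counter()
    for user, items in purchases.items():
        if product in items:
            counts.update(items - {product})
    return [p for p, _ in counts.most_common()]
```

A graph engine makes this traversal fast at billions of edges, where the equivalent relational query would need repeated self-joins.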
7. Amazon Timestream
• Amazon Timestream is a fast, scalable, fully managed time series database service for IoT and operational applications that makes it easy to store and analyze trillions of events per day at 1/10th the cost of relational databases.
• Driven by the rise of IoT devices, IT systems, and smart industrial machines, time-series data — data that measures how things change over time — is one of the fastest growing data types.
• Time-series data has specific characteristics: it typically arrives in time order, is append-only, and is queried over a time interval.
• While relational databases can store this data, they are inefficient at processing this data as they lack optimizations such as storing and retrieving data by time intervals.
• Timestream is a purpose-built time series database that efficiently stores and processes this data by time intervals.
• With Timestream, you can easily store and analyze log data for DevOps, sensor data for IoT applications, and industrial telemetry data for equipment maintenance.
• As your data grows over time, Timestream’s adaptive query processing engine understands its location and format, making your data simpler and faster to analyze.
• Timestream also automates rollups, retention, tiering, and compression of data, so you can manage your data at the lowest possible cost.
• Timestream is serverless, so there are no servers to manage.
• It manages time-consuming tasks such as server provisioning, software patching, setup, configuration, and data retention and tiering, freeing you to focus on building your applications.
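The rollups mentioned above are essentially bucketed aggregation. This runnable sketch shows the idea with hypothetical sensor readings: group raw samples into fixed time windows and keep one average per window, which is the kind of work a purpose-built store like Timestream automates.

```python
# Time-interval rollup sketch: average (epoch_seconds, value) samples
# into fixed-size windows. Sample readings are hypothetical.

def rollup(samples, window=60):
    """Average samples into `window`-second buckets keyed by bucket start."""
    buckets = {}
    for ts, value in samples:
        key = ts - (ts % window)            # start of this sample's bucket
        buckets.setdefault(key, []).append(value)
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}

readings = [(0, 10.0), (30, 20.0), (65, 30.0)]
print(rollup(readings))   # {0: 15.0, 60: 30.0}
```

Retention tiering follows the same logic: old buckets keep only the rollup while raw samples are dropped, which is why storage cost stays low as data grows.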
8. Amazon Quantum Ledger Database (QLDB)
• Fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log.
• Owned by a central trusted authority.
• Amazon QLDB is a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log owned by a central trusted authority.
• Amazon QLDB tracks each and every application data change and maintains a complete and verifiable history of changes over time.
• Ledgers are typically used to record a history of economic and financial activity in an organization. Many organizations build applications with ledger-like functionality because they want to maintain an accurate history of their applications’ data, for example, tracking the history of credits and debits in banking transactions, verifying the data lineage of an insurance claim, or tracing movement of an item in a supply chain network.
• Ledger applications are often implemented using custom audit tables or audit trails created in relational databases.
• However, building audit functionality with relational databases is time-consuming and prone to human error.
• It requires custom development, and since relational databases are not inherently immutable, any unintended changes to the data are hard to track and verify.
• Alternatively, blockchain frameworks, such as Hyperledger Fabric and Ethereum, can also be used as a ledger.
• However, this adds complexity: you need to set up an entire blockchain network with multiple nodes, manage its infrastructure, and have the nodes validate each transaction before it can be added to the ledger.
• Amazon QLDB is a new class of database that eliminates the need to engage in the complex development effort of building your own ledger-like applications.
• With QLDB, your data’s change history is immutable – it cannot be altered or deleted – and using cryptography, you can easily verify that there have been no unintended modifications to your application’s data.
• QLDB uses an immutable transactional log, known as a journal, that tracks each application data change and maintains a complete and verifiable history of changes over time. QLDB is easy to use because it provides developers with a familiar SQL-like API, a flexible document data model, and full support for transactions.
• QLDB is also serverless, so it automatically scales to support the demands of your application. There are no servers to manage and no read or write limits to configure. With QLDB, you only pay for what you use.
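The cryptographic verifiability described above rests on hash chaining: each journal entry's hash covers the previous entry's hash, so altering any past entry breaks every hash after it. This toy sketch shows the idea with SHA-256; QLDB's actual journal format and verification API differ.

```python
# Toy hash-chained journal illustrating QLDB-style verifiability.
# Not QLDB's real format; a conceptual sketch only.

import hashlib

def append(journal, data):
    """Append an entry whose hash covers the previous entry's hash."""
    prev = journal[-1]["hash"] if journal else "0" * 64
    digest = hashlib.sha256((prev + data).encode()).hexdigest()
    journal.append({"data": data, "hash": digest})

def verify(journal):
    """Recompute every hash in order; False if any entry was tampered with."""
    prev = "0" * 64
    for entry in journal:
        expected = hashlib.sha256((prev + entry["data"]).encode()).hexdigest()
        if expected != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Changing a single past entry (say, rewriting a debit amount) makes `verify` fail, which is exactly the tamper-evidence property an audit table in a relational database lacks.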
9. AWS Database Migration Service (DMS)
• Migrate your databases to AWS with minimal downtime
• More than 100,000 databases migrated using AWS Database Migration Service
• AWS Database Migration Service helps you migrate databases to AWS quickly and securely.
• The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database.
• The AWS Database Migration Service can migrate your data to and from most widely used commercial and open-source databases.
• AWS Database Migration Service supports homogenous migrations such as Oracle to Oracle, as well as heterogeneous migrations between different database platforms, such as Oracle or Microsoft SQL Server to Amazon Aurora.
• With AWS Database Migration Service, you can continuously replicate your data with high availability and consolidate databases into a petabyte-scale data warehouse by streaming data to Amazon Redshift and Amazon S3.
Common Use Cases
1. Real-Time
• Real-time application use cases such as gaming leaderboards, ride-hailing, social media messaging, and online shopping need microsecond latency and high throughput. You can improve the performance of your real-time application use cases by retrieving information from fast, managed, in-memory data stores and caches, instead of relying entirely on slower disk-based databases. Amazon ElastiCache is a Redis or Memcached-compatible in-memory data store and caching service in the cloud that makes it easy to deploy, run, and scale an in-memory data store and cache in the cloud. Amazon ElastiCache combines the speed, simplicity, and versatility of open-source Redis and Memcached with manageability, security, and scalability from Amazon to power your most demanding real-time applications.
In-Memory Caching Example
• Tapjoy
. • Real-Time Applications – Caching
• “Tapjoy’s mobile app network spans over 9,000 applications and 250 million global consumers on smartphones and tablet devices. We cache real-time statistics and metadata associated with mobile applications for faster access. Amazon ElastiCache has significantly reduced our exposure to Cache Node failures by continuously monitoring the health of our cache cluster and automatically replacing failed nodes. We are very thrilled about the management capabilities of Amazon ElastiCache and are using it in production to power some of our mission-critical and very high throughput applications.”
. ~ Ryan Johns, Vice President of Technology – Tapjoy
2. Internet Scale Use Cases
• Build globally distributed and internet-scale applications that handle millions of requests per second over hundreds of terabytes of data. The services automatically scale up and down to accommodate your high traffic and spiky workloads, and you only pay for the resources you use to optimize cost savings. No need to maintain servers, upgrades, or patches, and your applications have automated high availability.
Gaming Application Example
• Zynga
. • Internet Scale Applications – Gaming
• “With Zynga Poker, we moved a MySQL farm, which required dedicated in-house resources to manage, over to Amazon DynamoDB, which is a fully managed service. It’s resulted in dramatically reduced operational overhead. ..and separately, we’ve gotten a massive performance boost on a Zynga Poker database cluster, with queries that used to take 30 seconds now taking one second. That’s just by taking advantage of the architecture’s modern instance classes–and more importantly, leveraging the continual innovation and investments that AWS makes in systems and the constant discounts it provides.”
. ~ Dorion Carroll, Chief Information Officer – Zynga
3. Migrate to Fully Managed Open Source Databases
• Mobile and web applications generate millions of read and write requests per day, creating high performance demands on popular open source databases like MySQL, PostgreSQL, and Redis. By moving your open source databases to fully managed services like Amazon RDS and Amazon ElastiCache, you can eliminate the need to build and manage your own clusters, ensuring high availability and performance while reducing operational overhead.
• TalentBin by Monster
. • Transactional Database with Caching Example:
• “TalentBin by Monster made the move to Aurora so as to reduce operational overhead and management of MySQL, which in turn allowed our development team to focus on innovation. Aurora offered significantly faster replication, providing larger write operations that wouldn’t impact any downstream applications. Plus, Aurora’s tools eliminated the need to allocate excessive storage to account for usage and growth demands, which adds even more value and savings. Aurora made it possible for our team to consolidate various databases, reducing our database instance count by roughly 40%. Other gains were earned through automatic snapshots and point-in-time restoration, providing true operational improvements. All of these features made migrating to Aurora an easy decision for us.”
. ~ Travis Theune, Sr. Site Reliability Engineer – TalentBin
Case Studies
Airbnb
• Airbnb is using DynamoDB to store user search history due to the data volume and need for quick lookups to enable personalized search, ElastiCache to store session state in-memory for faster (sub-millisecond) site rendering, and RDS as their primary transactional database.
Capital One
• Capital One uses RDS to store transaction data for state management, Redshift to store web logs for analytics that need aggregations, and DynamoDB to store user data to provide quick access to customers via their mobile app.
Johnson and Johnson
• Johnson and Johnson is using RDS, DynamoDB, and Redshift to minimize the time and effort spent gathering and provisioning data and to quickly derive insights. AWS database services are helping Johnson and Johnson improve physician compliance, optimize its supply chain, and discover new drugs.
Expedia
• Expedia built a real-time data warehouse for lodging market pricing and availability data for internal market analysis using Aurora, Redshift, and ElastiCache. The system processes high-volume lodging pricing and availability data, performing a multi-stream union and self-join with a 24-hour lookback window.
• Save time and cost by migrating to fully managed databases
• Managing databases to run at scale with high availability and reliability is difficult, time-consuming, and expensive.
• AWS provides a portfolio of fully managed, high performance, and cost effective databases that can help. With AWS Database Migration Service, you can migrate your databases and data warehouses to AWS with no downtime. Get help from migrations experts from AWS Professional Services and the AWS Partner Network, and leverage the Migration Acceleration Program to accelerate your database migrations to AWS.
• More organizations run their databases and data warehouses on AWS than anywhere else. Customers like NASDAQ, Verizon, TIBCO, Expedia, US Dept. of Veterans Affairs, Snapchat, and Tinder run their business critical database workloads on AWS.
1. Homogeneous Database Migrations
• In homogeneous database migrations, the source and target database engines are the same or are compatible, like Oracle to Amazon RDS for Oracle, MySQL to Amazon Aurora, MySQL to Amazon RDS for MySQL, or Microsoft SQL Server to Amazon RDS for SQL Server. Since the schema structure, data types, and database code are compatible between the source and target databases, this kind of migration is a one-step process. You create a migration task with connections to the source and target databases, then start the migration with the click of a button. AWS Database Migration Service takes care of the rest. The source database can be located on your own premises outside of AWS, running on an Amazon EC2 instance, or it can be an Amazon RDS database. The target can be a database in Amazon EC2 or Amazon RDS.
2. Heterogeneous Database Migrations
• In heterogeneous database migrations the source and target database engines are different, as in Oracle to Amazon Aurora, Oracle to PostgreSQL, or Microsoft SQL Server to MySQL migrations. In this case, the schema structure, data types, and database code of the source and target databases can be quite different, requiring a schema and code transformation before the data migration starts. That makes heterogeneous migrations a two-step process. First, use the AWS Schema Conversion Tool to convert the source schema and code to match that of the target database, and then use the AWS Database Migration Service to migrate data from the source database to the target database. All the required data type conversions are done automatically by the AWS Database Migration Service during the migration. The source database can be located on your own premises outside of AWS, running on an Amazon EC2 instance, or it can be an Amazon RDS database. The target can be a database in Amazon EC2 or Amazon RDS.
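The DMS half of that two-step process can be sketched with boto3. The ARNs and table filter below are hypothetical placeholders; the real call is `dms.create_replication_task(**task)`, left commented so the sketch runs offline (the schema conversion step happens beforehand in AWS SCT).

```python
# Sketch of creating a DMS replication task via boto3 (step two of a
# heterogeneous migration). ARNs and filters are hypothetical.

import json

def build_dms_task(source_arn, target_arn, instance_arn):
    """Assemble the keyword arguments for dms.create_replication_task()."""
    table_mappings = {
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",   # migrate every table in every schema
        }]
    }
    return {
        "ReplicationTaskIdentifier": "oracle-to-aurora",
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": "full-load-and-cdc",  # bulk load, then ongoing changes
        "TableMappings": json.dumps(table_mappings),
    }

# import boto3
# dms = boto3.client("dms")
# dms.create_replication_task(**build_dms_task(src_arn, tgt_arn, inst_arn))
```

`full-load-and-cdc` is what keeps the source fully operational during migration: the bulk copy runs first, then change data capture replays writes that happened in the meantime.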
3. Development and Test
• AWS Database Migration Service can be used to migrate data both into and out of the cloud for development purposes. There are two common scenarios. The first is to deploy development, test, or staging systems on AWS, to take advantage of the cloud’s scalability and rapid provisioning. This way, developers and testers can use copies of real production data, and can copy updates back to the on-premises production system. The second scenario is when development systems are on-premises (often on personal laptops), and you migrate a current copy of an AWS Cloud production database to these on-premises systems either once or continuously. This avoids disruption to existing DevOps processes while ensuring an up-to-date representation of your production system.
4. Database Consolidation
• You can use AWS Database Migration Service to consolidate multiple source databases into a single target database. This can be done for homogeneous and heterogeneous migrations, and you can use this feature with all supported database engines. The source databases can be located on your own premises outside of AWS, running on Amazon EC2 instances, or they can be Amazon RDS databases. The source databases can also be spread across different locations. For example, one source database can be on your own premises outside of AWS, while the second runs on Amazon EC2 and the third is an Amazon RDS database. The target can be a database in Amazon EC2 or Amazon RDS.
5. Continuous Data Replication
• You can use AWS Database Migration Service to perform continuous data replication. Continuous data replication has a multitude of use cases, including Disaster Recovery instance synchronization, geographic database distribution, and Dev/Test environment synchronization. You can use DMS for both homogeneous and heterogeneous data replications for all supported database engines. The source or destination databases can be located on your own premises outside of AWS, running on an Amazon EC2 instance, or they can be Amazon RDS databases. You can replicate data from a single database to one or more target databases, or consolidate and replicate data from multiple source databases to one or more target databases.
Common Migration Use Cases
1. Oracle and MS SQL Server to Amazon Aurora
• Amazon Aurora is a fully managed, MySQL and PostgreSQL compatible relational database built for the cloud, that combines the performance and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. Aurora is up to five times faster than standard MySQL databases and three times faster than standard PostgreSQL databases. It provides the security, availability, and reliability of commercial-grade databases at 1/10th the cost.
2. Cassandra and MongoDB to Amazon DynamoDB
• Amazon DynamoDB is a fully managed, multi-region, multi-master database that provides consistent single-digit millisecond latency, and offers built-in security, backup and restore, and in-memory caching. DynamoDB automatically scales throughput up or down, and continuously backs up your data for protection. DynamoDB gives your globally distributed applications fast access to local data by replicating tables across multiple AWS Regions.
3. Teradata to Amazon Redshift
• Amazon Redshift is a fast, scalable data warehouse that makes it simple and cost-effective to analyze all your data across your data warehouse and data lake. Redshift delivers ten times faster performance than other data warehouses. You can setup and deploy a new data warehouse in minutes, and run queries across petabytes of data in your Redshift data warehouse, and exabytes of data in your data lake built on Amazon S3.
4. Oracle and MS SQL Server to Amazon RDS
• Amazon Relational Database Service (Amazon RDS) is a managed relational database service with a choice of six popular database engines. RDS makes it easy to set up, operate, and scale a relational database in the cloud with just a few clicks. It provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching and backups.
Best Practices for Running Oracle Database on Amazon Web Services – 2018
Abstract – Amazon Web Services (AWS) offers you the ability to run your Oracle Database in a cloud environment. Running Oracle Database in the AWS Cloud is very similar to running Oracle Database in your data center. To a database administrator or developer, there are no differences between the two environments. However, there are a number of AWS platform considerations relating to security, storage, compute configurations, management, and monitoring that will help you get the best out of your Oracle Database implementation on AWS. This whitepaper provides best practices for achieving optimal performance, availability, and reliability, and lowering the total cost of ownership (TCO) while running Oracle Database on AWS. The target audience for this whitepaper includes database administrators, enterprise architects, systems administrators, and developers who would like to run their Oracle Database on AWS.
Amazon Web Services (AWS) provides a comprehensive set of services and tools for deploying Oracle Database on the reliable and secure AWS Cloud infrastructure. AWS offers its customers two options for running Oracle Database on AWS:
. 1. Amazon RDS for Oracle
. • Amazon RDS also comes with a License Included service model, which allows you to pay per use by the hour.
. 2. Advanced Architectures for Oracle DB on Amazon EC2
• Amazon RDS for Oracle
• Best Practices for Running Oracle Database on Amazon Web Services – 2018
• Choosing Between Amazon RDS and Amazon EC2 for Your Oracle Database
• Oracle Applications on Amazon Web Services
• Oracle Database on EC2 – AWS Quickstart
. • Includes Oracle Data Guard and Oracle Automatic Storage Management (ASM)
The Rise of NoSQL Databases
• The Rise of NoSQL Databases
. • NoSQL Databases are gaining more popularity because of their ability to handle unstructured data and cater to this huge increase in data volume.
• Why Use NoSQL Databases?
• 5 Reasons NoSQL Adoption is Booming
• Massively Interactive Enterprises are Driving NoSQL Adoption
The 4 Basic Types of NoSQL Databases
• How to choose the right NoSQL database
. • NoSQL databases vary in architecture and function, so you need to pick the type that is best for the desired task
• 4 Types of NoSQL Databases
• Exploring the Different Types of NoSQL Databases:
1. Key-Value Store – Stores data as a big hash table of keys and values.
. • {Example- Riak, Amazon S3 (Dynamo)}
2. Document-based Store – It stores documents made up of tagged elements.
. • {Example- CouchDB}
3. Column-based Store – Each storage block contains data from only one column.
. • {Example- HBase, Cassandra}
4. Graph-based – A network database that uses edges and nodes to represent and store data.
. • {Example- Neo4J}
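The difference between the first two types above is easy to see with the same record modeled both ways. This runnable toy sketch uses hypothetical data: in a key-value store the value is opaque to the engine, while a document store understands the fields inside it.

```python
# Toy illustration: the same user record under two NoSQL data models.
# Data is hypothetical sample data.

key_value_store = {
    "user:42": "Ada Lovelace|London",   # value is an opaque blob; the store
}                                       # can only fetch it by its key

document_store = {
    "user:42": {                        # nested fields the store can
        "name": "Ada Lovelace",         # index and query individually
        "address": {"city": "London"},
    }
}

# A document store can answer "which users live in London?" directly;
# a key-value store would have to fetch and parse every value in the app.
city = document_store["user:42"]["address"]["city"]
```

This is why the choice of NoSQL type follows the access pattern: pure key lookups favor key-value stores, while field-level queries favor document stores.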
• How Graph Databases are Related to Other NoSQL Databases and How They Differ
The top 5 Commercial Systems – March 2019
1. Oracle
2. Microsoft SQL Server
3. IBM DB2
4. Microsoft Access
5. Splunk
The top 5 Open Source Systems – March 2019
1. MySQL
2. PostgreSQL
3. MongoDB
4. Redis
5. Elasticsearch
DB-Engines.com Ranking of Relational DBMS – 2019
1. Oracle
2. MySQL
3. Microsoft SQL Server
4. PostgreSQL
5. IBM DB2
6. Microsoft Access
7. SQLite
8. MariaDB
9. Teradata
10. Hive
11. FileMaker
12. SAP Adaptive Server
13. SAP HANA
14. Microsoft Azure SQL Server
15. Informix
16. Vertica
17. Amazon Redshift
18. Firebird
19. Netezza
20. Google BigQuery
DB-Engines.com Ranking of Key-value Stores – 2019
1. Redis
2. Amazon DynamoDB
3. Memcached Key-value
4. Microsoft Azure Cosmos DB
5. Hazelcast Key-value