AWS : The different types of Databases
First and foremost, thank you to all who have been following along with this series. I’ve been creating blogs based off of what I’ve learned every week about AWS. I’d love to know what your experience is with AWS whether you’re learning it or if you’re a cloud practitioner. With that being said, lets get to it!
Databases 101
From my experience with databases(DB) the first thing that comes to mind is an excel sheet. Using SQLite3 and PostgreSQL, I always think of it as a table with columns and rows, where each row is an instance of a particular class and each column is an attribute of the instance. Check out figure 1 to see what I’m talking about!
Relational Databases
In Figure 1, we can see that each row is an instance of a Customer where each customer has a first and last name as well as a birthdate. This is what we call a relational database.
Relational Databases for AWS (RDS) come in different forms
- SQL Server
- Oracle
- MySQL Server
- PostgreSQL
- Aurora
- MariaDB
The incredible features that RDS offers is that they are held in multiple Available Zones and have Read Replicas.
Read Replicas are copies of your production database. This sends an EC2 instance to read from replicas rather than from production database so that you mitigate performance hits.
Multiple Availability Zones can be primarily used as a preventative measure for any catastrophic event. If something happens to your primary database, you’ll have others that mitigate what would have been a major setback and in that way, aids in faster disaster recovery.
Relational Databases are best used where data integrity is important!
Non-Relational Database
Also called NoSQL DB’s are best used for unstructured or semi-structured data, content management, in-depth data analysis, and rapid prototyping. They are grouped into four categories.
Key Value Stores == Fields
Column Stores == Rows
Document Stores == Table
Graph Stores
Unlike relational databases, columns inside the document store vary and do not effect the other rows.
Why does any of this matter?
Well, in short sometimes a company will want information that would be considered far too complex for a relational table to handle. Think about purchasing a flight to destination B through jetBlue. Each transaction involves an individual database record made up of multiple fields or columns. Now let’s say jetBlue wants information on their year-to-year financial performance. This would benefit from a query to a non-relational database.
OLAP and OLTP
Online Transaction Processing(OLTP) —
is a category of data processing that involves transaction-oriented tasks. It involves updating, inserting, or deleting small bits of data inside of a database.
OLTP handles a large amount of small transactions, simple queries, and are based on INSERT, DELETE, and UPDATE operations.
In short, relational databases are best for OLTP.
Online Analytics Processing(OLAP) —
is another category of data processing that facilitates business intelligence. This for of processing data deals with the bigger picture. For example, a company may want business reporting on sales, marketing, budgeting, and financial reporting.
OLAP handles large volumes of data with complex queries, and are based on SELECT commands for reporting.
In short, non-relational databases are best for OLAP.
Data Warehousing
Think about the software querying the database. It’s going to have a massive performance hit on the database and that’s why data warehousing was created. It allows for online analytics processing away form your primary database.
This is used for business intelligence to pull large datasets used by management to do queries on data. Amazons Data Warehousing solution is called Redshift.
ElastiCache
Lastly, another one of amazon’s database products is ElastiCache. It is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud. This improves performance of web apps by allowing you to retrieve info from in-memory cache instead of relying entirely on disk-based databases.
Think about Amazon’s landing page and how many people occupy that space at any given moment. It wouldn’t make sense for the web servers to query the same database as many times as there are people on the landing page. So instead of querying the database, it queries ElastiCache that caches the most common queries and take away the load from the primary database.
Elastic-Cache comes in two cacheing engines:
1. Memcached
2. Redis
In short, ElastiCache is a caching engine in the cloud for the most common queries and will take a big load off the production database. For what I’m studying, I do not need to go in detail about these two engines but if you’d like to learn more about it, please let me know and I’ll do my best to explain it.
That is all for this weeks blog! So far, I’m really enjoying this series because writing about it really challenges my understanding of the topic. Right now I don’t feel like a master at AWS but I know more today than I did yesterday (-pats back-)and that’s all that matters. As always, happy coding!