Chapter 02 - Data Model & Query Languages
Talk about what's SQL, NoSQL. How SQL has mismatch and we have to use ORM.
Many to one relationship
In SQL we can do something like
User table
| user_id | country_id | name |
|---|---|---|
| 1234 | 3 | Austin |
Country table
| country_id | name |
|---|---|
| 3 | Australia |
Why dont we just use country_name directly in user table.
- it's for ease of updating and searching. Consistent between all users
- This technique is called normalising in db since it remove duplication (i.e every user have a
country_nameinstead ofcountry_id)
In NoSQL database, we're emulate join in the Application layer code
When to use SQL vs DocumentDB
DocumentDB advantage: closer to application code
- Use if your application data has a tree like (mostly one-to-many) datastructure
- If it's mostly one-to-many relationship, bad support in
joinquery might not be a problem
Relational Database: better support for join, many-to-one and many-to-many - Use if your applicaiton needs a many to many relationship that need join.
- For highly interconnected data, Relational or Graph datamodel is preferred.
[!NOTE]
We specificly only discuss about document db in this section, since now NoSQL has column db, the following only applicable to document db
Schema in DocumentDB
The schema in DocumentDB is called schema on read — the applicaiton itself enforces the schema. Whereas in Relational database is the schema on write — the database enforces the schema.
For example, if your application now instead of storing the full name, we need to store the first name and last name separately:
DocumentDB: we fix in the application code to correct it when reading
if (user && user.name && !user.first_name) {
// legacy behavior:
user.first_name = user.name.split(" ")[0];
}
SQL: you run a migration and fix it once. However it might requires some down time
ALTER TABLE users ADD COLUMN first_name text;
UPDATE users SET first_name = split_part(name, ' ', 1);
Data locality
Data Locality means storing related data close together in the same record/document so that it can fetch everything in one read.
- This is very good in document database where everything is closed together.
- In relational database, we need to do multiple joins since the data is spread out, which becomes not good
However, this is only good when we want to read the whole object at once. It's inefficient if you often update or read only small part of a large document:
- Reading or updating the small part of the document still requires fetching/overwriting to the whole document.
- Because of this, we often try to keep the size of the document small
Nowadays, there are options to allow locality that's not limited to just document db i.e
- Google Spanner: the schema allow the table row should be nested within parent table
- Column family (Cassandra, HBase): also allow locality