Chapter 02 - Data Model & Query Languages

Talk about what's SQL, NoSQL. How SQL has mismatch and we have to use ORM.

Many to one relationship

In SQL we can do something like

User table

user_idcountry_idname
12343Austin

Country table

country_idname
3Australia

Why dont we just use country_name directly in user table.

  • it's for ease of updating and searching. Consistent between all users
  • This technique is called normalising in db since it remove duplication (i.e every user have a country_name instead of country_id)

In NoSQL database, we're emulate join in the Application layer code

When to use SQL vs DocumentDB

DocumentDB advantage: closer to application code

  • Use if your application data has a tree like (mostly one-to-many) datastructure
  • If it's mostly one-to-many relationship, bad support in join query might not be a problem
    Relational Database: better support for join, many-to-one and many-to-many
  • Use if your applicaiton needs a many to many relationship that need join.
  • For highly interconnected data, Relational or Graph datamodel is preferred.

[!NOTE]
We specificly only discuss about document db in this section, since now NoSQL has column db, the following only applicable to document db

Schema in DocumentDB

The schema in DocumentDB is called schema on read — the applicaiton itself enforces the schema. Whereas in Relational database is the schema on write — the database enforces the schema.

For example, if your application now instead of storing the full name, we need to store the first name and last name separately:

DocumentDB: we fix in the application code to correct it when reading

if (user && user.name && !user.first_name) {
// legacy behavior:
    user.first_name = user.name.split(" ")[0];
}

SQL: you run a migration and fix it once. However it might requires some down time

ALTER TABLE users ADD COLUMN first_name text;
UPDATE users SET first_name = split_part(name, ' ', 1);

Data locality

Data Locality means storing related data close together in the same record/document so that it can fetch everything in one read.

  • This is very good in document database where everything is closed together.
  • In relational database, we need to do multiple joins since the data is spread out, which becomes not good

However, this is only good when we want to read the whole object at once. It's inefficient if you often update or read only small part of a large document:

  • Reading or updating the small part of the document still requires fetching/overwriting to the whole document.
  • Because of this, we often try to keep the size of the document small

Nowadays, there are options to allow locality that's not limited to just document db i.e

  1. Google Spanner: the schema allow the table row should be nested within parent table
  2. Column family (Cassandra, HBase): also allow locality