Designing Data-Intensive Applications

1. Reliable, Scalable, and Maintainable Applications

2. Data Models and Query Languages

3. Storage and Retrieval

4. Encoding and Evolution
A note on B-tree lightweight lock for concurrency control: My intuition is that we can use a R/W lock on the nodes we visited.

5. Replication

6. Partitioning
A hashing strategy for easier rebalancing. Used both in data partitioning and request load balancing.
Map nodes and keys into a same space using a same hash function. A key would be stored on the node with a successor hashed value. Concatenate the begin and the end of this hashed space so every key has a node successor.
- Layers of paritioning:
- Partition by key. Lowest level. This is often what we talk about when discussing partitioning.
- Partition by feature. Different service storages could be considered to be partitions of one large system.
- Partition query layer. The level closest to the application. At this level we can add a service to abstracts away the detail of partitioning methods and make life easier for application writers.
- Partitioning Criteria
- Hash (+ mod N round robin)
- Key range
- Compound: hash + key range
- Common problems
- Joins are usually not supported because of inefficiency: may be solved by denormalization (keep redundant information).
- Foreign key constraints not supported: implement in the application
- Rebalancing

7. Transactions

8. The Trouble with Distributed Systems
Last modified 2yr ago