
Lecture 13: Advanced Database Systems
Building upon the fundamentals of Database Management Systems (DBMS) covered earlier, this lecture explores Advanced Database Systems, focusing on modern techniques that ensure efficiency, reliability, and scalability. We will study transactions, indexing, distributed databases, NoSQL systems, Big Data platforms, and database security.
1. Introduction
A database system stores, manages, and retrieves data efficiently. In today’s data-driven world, advanced features are needed to support millions of users, handle large-scale data, and ensure security and consistency.
2. Transactions in Database Systems
A transaction is a sequence of database operations that performs a single logical function. Transactions must follow the ACID properties:
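- Atomicity: Either all operations of the transaction are executed or none are; a failure partway through undoes the work already done.
- Consistency: The transaction takes the database from one valid state to another, preserving all defined constraints.
- Isolation: Concurrent transactions do not see each other's intermediate results; the outcome is as if they had run one after another.
- Durability: Once a transaction is committed, its changes survive crashes and power failures.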
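Example: A bank transfer (debiting one account and crediting another) must be atomic. If the credit fails after the debit has been applied, the debit must be rolled back so that no money is lost.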
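The bank transfer can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production design: the accounts table, the transfer function, and the account names are assumptions made for the example. The "with conn" block commits if its body succeeds and rolls back if an exception escapes, which is what makes the two updates atomic.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one atomic unit (illustrative helper)."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort: the debit above is undone by the rollback.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False

# Hypothetical in-memory database with two accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and commits
failed = transfer(conn, "alice", "bob", 500)  # overdraws, so it is rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, failed, balances)
```

After both calls the balances are 70 and 80: the failed transfer leaves no trace, and the total of 150 is preserved.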
3. Indexing
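Indexes improve the speed of data retrieval. Instead of scanning the entire table, the database uses a data structure (such as a B-tree or a hash table) to quickly locate records. The trade-off is extra storage and slower writes, because every index must be updated whenever the underlying data changes.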
- Primary Index: Built on the primary key of the table.
- Secondary Index: Built on non-primary-key attributes.
- Clustered Index: The table's records are physically stored in the order of the index key, so a table can have at most one.
- Non-Clustered Index: A separate structure whose entries point to the location of the data.
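The effect of a secondary index can be observed with SQLite's EXPLAIN QUERY PLAN statement. The sketch below is illustrative only: the employees table, the idx_emp_dept index, and the plan helper are assumptions for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employees (name, dept) VALUES (?, ?)",
    [(f"emp{i}", f"dept{i % 10}") for i in range(1000)],
)

def plan(sql):
    """Return SQLite's plan description for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

query = "SELECT name FROM employees WHERE dept = 'dept3'"

plan_before = plan(query)  # no index on dept yet: a full table scan
conn.execute("CREATE INDEX idx_emp_dept ON employees (dept)")
plan_after = plan(query)   # the planner can now search via idx_emp_dept

print(plan_before)
print(plan_after)
```

Here idx_emp_dept is a secondary, non-clustered index: it is built on a non-key column and stored separately from the table rows.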
4. Distributed Databases
A distributed database is spread across multiple locations but appears as a single database to users.
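- Horizontal Partitioning (sharding): Rows divided among multiple servers, typically by a range or hash of a key.
- Vertical Partitioning: Columns divided among servers, with the key repeated in each part so that rows can be reassembled.
- Replication: Copies of the same data stored at multiple sites for reliability and availability.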
Challenges include synchronization, fault tolerance, and ensuring consistency across sites.
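Horizontal partitioning and replication can be combined in a small in-memory sketch. Everything here is an assumption made for illustration: the node names, the replication factor of 2, and the put/get helpers. Real systems also need rebalancing, failure detection, and a protocol for keeping replicas consistent.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical server names
REPLICAS = 2                            # number of copies kept of each row

def home_node(key):
    """Map a key to a node index using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % len(NODES)

def placement(key):
    """Primary node followed by the next REPLICAS - 1 nodes in ring order."""
    start = home_node(key)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]

# One dictionary per node stands in for that node's local storage.
storage = {node: {} for node in NODES}

def put(key, row):
    """Write the row to every replica responsible for the key."""
    for node in placement(key):
        storage[node][key] = row

def get(key, down=()):
    """Read from the first responsible replica that is not marked as down."""
    for node in placement(key):
        if node not in down:
            return storage[node].get(key)
    raise RuntimeError("all replicas unavailable")

put("user:1", {"name": "Ada"})
primary = placement("user:1")[0]
row_after_failure = get("user:1", down={primary})  # served by the second replica
```

Because each row lives on two nodes, a read still succeeds when the primary node is unavailable. The sketch writes to all replicas synchronously, which sidesteps the consistency problem that real systems must solve.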
5. NoSQL Databases
NoSQL databases are designed for unstructured or semi-structured data, high scalability, and flexibility. They are widely used in real-time applications, IoT, and Big Data.
- Key-Value Stores (e.g., Redis, DynamoDB): store data as key-value pairs, looked up by key.
- Document Stores (e.g., MongoDB, CouchDB): store JSON-like documents that can be queried by their fields.
- Wide-Column Stores (e.g., Cassandra, HBase): group columns into families within each row; optimized for high write throughput and very large, sparse datasets.
- Graph Databases (e.g., Neo4j): represent data as nodes and relationships, making traversals efficient.
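To make the four data models concrete, the following sketch represents the same hypothetical user in each of them using plain Python structures. It does not use any real database client; the keys, field names, and the followed_by helper are assumptions chosen for illustration.

```python
import json

# Key-value: the store sees only an opaque value addressed by its key.
kv_store = {"user:42": json.dumps({"name": "Ada", "city": "London"})}
kv_value = json.loads(kv_store["user:42"])

# Document: a nested, self-describing record whose fields can be queried.
documents = [
    {"_id": 42, "name": "Ada", "address": {"city": "London"}, "tags": ["admin"]},
    {"_id": 43, "name": "Alan", "address": {"city": "Manchester"}, "tags": []},
]
londoners = [d["name"] for d in documents if d["address"]["city"] == "London"]

# Wide-column: row key -> column family -> column -> value; rows may be sparse.
wide_rows = {
    "user:42": {"profile": {"name": "Ada"}, "activity": {"logins": 7}},
    "user:43": {"profile": {"name": "Alan"}},  # no "activity" family stored
}

# Graph: nodes plus labelled, directed relationships between them.
nodes = {42: {"name": "Ada"}, 43: {"name": "Alan"}}
edges = [(43, "FOLLOWS", 42)]

def followed_by(node_id):
    """Names of the nodes that `node_id` follows (illustrative traversal)."""
    return [nodes[dst]["name"] for src, rel, dst in edges
            if src == node_id and rel == "FOLLOWS"]
```

The difference lies in what the store understands: a key-value store can only fetch by key, a document store can filter on nested fields, a wide-column store addresses individual columns within a row, and a graph store follows relationships.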
6. Big Data and Databases
Traditional databases struggle with Big Data, which is characterized by the 4 V’s:
- Volume: Massive amounts of data (terabytes, petabytes).
- Velocity: Rapid data generation and processing.
- Variety: Structured, semi-structured, and unstructured data.
- Veracity: Ensuring trust and accuracy of data.
Big Data frameworks like Hadoop (disk-based batch processing with MapReduce) and Spark (in-memory processing, including near-real-time stream processing) integrate with modern databases to handle large-scale data analytics.
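The MapReduce model behind Hadoop's batch processing can be sketched on a single machine as a word count. This is a simplified illustration in plain Python, not Hadoop's API: the function names and input lines are assumptions, and a real cluster would run the map and reduce phases in parallel across many nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

lines = ["big data big systems", "data systems scale"]  # hypothetical input

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 2, 'systems': 2, 'scale': 1}
```

Because each map call touches only one line and each reduce call only one key, the work can be split across machines without coordination inside a phase.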
7. Database Security and Privacy
With sensitive data stored in databases, advanced security mechanisms are required:
- Encryption of data at rest and in transit.
- Access control and role-based permissions.
- Auditing and logging of user activities.
- Data masking and anonymization for privacy.
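Role-based permissions and data masking can be combined as in the sketch below. The roles, permission names, record fields, and salt are all hypothetical, and the salted hash is only a simple pseudonymization device for illustration; real deployments need proper key management and a reviewed anonymization scheme.

```python
import hashlib

# Hypothetical mapping from roles to permissions.
ROLE_PERMISSIONS = {
    "doctor": {"read_full", "read_masked"},
    "analyst": {"read_masked"},
}

def mask_email(email):
    """Keep the first character and the domain; hide the rest."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain

def pseudonymize(value, salt):
    """Replace an identifier with a salted hash so records stay linkable."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]

def read_patient(record, role, salt="demo-salt"):
    """Return the view of `record` that `role` is allowed to see."""
    permissions = ROLE_PERMISSIONS.get(role, set())
    if "read_full" in permissions:
        return dict(record)
    if "read_masked" in permissions:
        return {
            "patient_id": pseudonymize(record["patient_id"], salt),
            "email": mask_email(record["email"]),
            "diagnosis": record["diagnosis"],
        }
    raise PermissionError(f"role {role!r} may not read patient records")

record = {"patient_id": "P-1001", "email": "ada@example.com", "diagnosis": "flu"}
masked_view = read_patient(record, "analyst")
full_view = read_patient(record, "doctor")
```

An analyst receives a masked email and a pseudonymous identifier, a doctor receives the full record, and any other role is rejected with a PermissionError.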
8. Applications of Advanced Databases
- E-commerce platforms (Amazon, eBay).
- Social media (Facebook, Twitter) using distributed and NoSQL databases.
- Healthcare systems storing sensitive medical data.
- Banking and financial systems with high transaction requirements.
- Big Data analytics for decision-making in business and government.
9. Summary
- Advanced database systems extend basic DBMS concepts for scalability and reliability.
- Transactions provide the ACID guarantees that keep data correct under failures and concurrent access.
- Indexing enhances query performance.
- Distributed and NoSQL databases support modern large-scale applications.
- Big Data systems integrate with databases to manage massive datasets.
Next Lecture (14): Compiler Design — How High-Level Code Becomes Machine Code