Skip to article frontmatterSkip to article content

Introduction to SQL

SQL (Structured Query Language) is a powerful language used for interacting with databases. It allows you to perform various operations such as querying data, updating records, and managing database structures.

Here is an example SQL query that retrieves the total expenses for each account from a hypothetical journal_entries table, filtering for entries in the “Expense” account type during the third quarter of 2025, and ordering the results by total expense in descending order:

SELECT
    account_name,
    ROUND(SUM(amount), 2) AS total_expense
FROM journal_entries
WHERE account_type = 'Expense'
  AND entry_date BETWEEN '2025-07-01' AND '2025-09-30'
GROUP BY account_name
ORDER BY total_expense DESC;
StepWhat It Does
WHERE account_type = 'Expense'Filters only expense accounts
BETWEEN '2025-07-01' AND '2025-09-30'Focuses on one fiscal quarter
SUM(amount)Sums all amounts per account
ROUND(..., 2)Rounds for presentation
ORDER BY total_expense DESCShows largest expense categories first

🧩 Different Types of Databases

There are several types of databases, each with its own features and use cases. Some of the most popular databases include:

🧱 Relational Databases (SQL-based) - What we use in this course

Concept: Data stored in structured tables with predefined schemas; relationships enforced through keys (primary, foreign). Use case: Transactional systems, financial data, HR systems, enterprise analytics.

Popular Examples:

🧮 NoSQL (Document) Databases

Concept: Schema-less or semi-structured; designed for flexibility, scalability, and unstructured data (text, JSON, IoT, etc.). Use case: Web apps, analytics pipelines, real-time systems.

Popular Examples:

🔑 Key-Value Stores

Concept: Data stored as simple key-value pairs, like a giant dictionary. Use case: Caching, session management, quick lookups, real-time data.

Popular Examples:

📊 Columnar Databases

Concept: Data stored by columns instead of rows, which speeds up aggregation and analytical queries Use case: Business intelligence (BI), OLAP, dashboards, data warehouses.

Popular Examples:

🕸️ Graph Databases

Concept: Data stored as nodes (entities) and edges (relationships); optimized for relationship-heavy data.
Use case: Social networks, fraud detection, recommendation systems.

Popular Examples:


🛡️ Key Features of Relational Databases


💡 Why Use SQL?

Reasons to use SQL over spreadsheets

 

CriteriaSpreadsheetSQL
🧮 PerformancePerformance slows as data growsDesigned to handle massive amounts of data efficiently
🧱 Data IntegrityProne to errors and inconsistenciesEnforces data integrity through data types, constraints, and relationships
👥 Multi-User AccessOften limited to single-user accessDesigned for multiple users to access and modify data simultaneously
🔗 Complex Data RelationshipStruggles with complex relationshipsExcels at representing complex relationships through data models and normalization
📊 Data Analysis and ReportingOffers basic analysis and reporting toolsPowerful data retrieval and analysis using the built-in query language
🔄 Real-Time IntegrationRequires manual data transferDesigned for seamless integration

📜 The History of SQL

YearMilestoneKey Contribution
1970Codd’s Relational ModelFoundation of relational databases
1974IBM develops SEQUEL (System R)Birth of SQL
1979Oracle v2 releasedFirst commercial SQL RDBMS
1986ANSI standardizes SQLSQL becomes official standard
1995MySQL releasedOpen-source SQL revolution*
2000sPostgreSQL, SQL Server, Oracle growthSQL dominates enterprise data
2010sBigQuery, Redshift, SnowflakeCloud-based SQL analytics
2020sDuckDB, SQLite, Polars SQLLocal and embedded analytics SQL

* MySQL was later acquired by Oracle Corporation in 2010, but it remains an open-source project maintained by the community.

1970 - Birth of Relational Databases

Dr. Edgar F. Codd, a computer scientist at IBM, published a groundbreaking paper titled “A Relational Model of Data for Large Shared Data Banks” in 1970. In this paper, Codd introduced the concept of the relational database model, where data is stored in tables (relations) with rows and columns.

Early 1970s - SEQUEL Language

In the early 1970s, IBM researchers Donald Chamberlin and Raymond Boyce developed a language called SEQUEL (Structured English Query Language) to manipulate and retrieve data.

1974 - System R Prototype

In 1974, IBM began developing System R, a prototype relational database management system (RDBMS) that implemented the SEQUEL language. SEQUEL was later renamed SQL.

1979 - Oracle’s Commercial RDBMS

In 1979, Oracle (then known as Relational Software, Inc.) released the first commercially available RDBMS based on SQL. This marked a significant milestone in the adoption of SQL as a standard for database management.

1986 - SQL Standardization

In 1986, the American National Standards Institute (ANSI) adopted SQL as the standard language for relational database management systems. This standardization helped promote the widespread adoption of SQL across different database systems.


💼 Open-Source vs Proprietary Databases

Open-Source means that the database software’s source code is freely available. Anyone can use, modify, and distribute it under a permissive license (e.g., MIT, Apache, GPL).

Proprietary (Commercial) means that the database is owned and licensed by a company. Users must pay for usage, support, or advanced features. The source code is closed.

FactorOpen-SourceProprietary
CostFreeLicense / subscription required
SupportCommunity + paid managed optionsDedicated vendor support
ExtensibilityHighly customizable, pluginsControlled extensions
Analytics IntegrationEasy with Python, R, and BI toolsDeep integration with Microsoft / Oracle stack
Use in Education✅ Ideal (lightweight, free)❌ Usually expensive to license

🧪 Open-Source Software Characteristics

Advantages:

Disadvantages:

📚 Proprietary Software Characteristics

Advantages:

Disadvantages:


✅ Key Concepts

🗄️ Databases

A database is a structured collection of tables that store related data. A database management system (DBMS) is software that allows you to create, manage, and interact with databases.

🧷 Tables (Relations)

A table is a collection of related data organized in rows and columns. Each table has a name and consists of:

🧾 Schemas

A schema defines the structure of a table, including the names and data types of its columns. It acts as a blueprint for how data is organized within the table.

🔒 Constraints

Constraints are rules applied to table columns to enforce data integrity. Common constraints include:

TypePurpose
PRIMARY KEYEnsures uniqueness of each row.
FOREIGN KEYMaintains valid relationships between tables.
NOT NULLPrevents missing values.
UNIQUEPrevents duplicates in a column.
CHECKEnforces valid ranges or conditions (e.g., amount > 0).

📇 Indexes

An index is a special data structure that improves the speed of data retrieval operations on a table. It works like an index in a book, allowing the database to quickly locate rows based on the values in indexed columns.

🔄 Transactions

A transaction is a sequence of one or more SQL operations that are treated as a single unit of work. The single unit must either be fully completed or fully rolled back to maintain data integrity. Transactions follow the ACID properties:

🖼️ Views

A view is a virtual table that is based on the result set of a SQL query. It allows you to simplify complex queries, encapsulate business logic, and provide a layer of abstraction over the underlying tables. Views do not store data themselves; they dynamically retrieve data from the base tables when queried.