Let's write a database (part 2)
2025-08-05
Hello, hello, hello. Welcome to part two of this adventure where we're making a database by hand. In part one we stole a SQL parser, did some table scans, filtered some results, and projected the ones we're interested in. I promised we'd look at joins today, but first we need to do a bit of housekeeping.
If you'd like to look at the code, it's right here.
Storing our rows better
Previously we passed our rows around as Vec<serde_json::Value>
so each row was a JSON value that looked like this:
{
  "id": 1,
  "name": "Mr Horse",
  "age": 100
}
This is poor for two reasons:
- Having the column names in every row is wasteful
- If we want to rename columns (or disambiguate the "name" column from two tables that have been joined), we have to change every row
Instead we have a Row type that contains a Vec with one serde_json::Value per column:
pub struct Row {
    pub items: Vec<serde_json::Value>,
}
And we pass around a single Schema type that holds all the column names:
pub struct Schema {
    pub columns: Vec<Column>,
}

pub struct Column {
    pub name: String,
}
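To make the split concrete, here's a minimal, dependency-free sketch of the pair. I've swapped serde_json::Value for String so it compiles on its own, and the body of get_index_for_column (a helper the post uses later in project_schema) is my guess at an implementation:

```rust
// Sketch of the Row/Schema split. String stands in for serde_json::Value
// so the example compiles without the serde_json crate.
#[derive(Clone, Debug, PartialEq)]
pub struct Column {
    pub name: String,
}

pub struct Schema {
    pub columns: Vec<Column>,
}

impl Schema {
    // Assumed implementation: find a column's position so that rows,
    // which only carry values, can be indexed by column name.
    pub fn get_index_for_column(&self, column: &Column) -> Option<usize> {
        self.columns.iter().position(|c| c.name == column.name)
    }
}

pub struct Row {
    pub items: Vec<String>,
}

fn main() {
    // The column names live in one Schema...
    let schema = Schema {
        columns: ["id", "name", "age"]
            .map(|n| Column { name: n.into() })
            .to_vec(),
    };
    // ...and each Row only carries values, in schema order.
    let row = Row {
        items: vec!["1".into(), "Mr Horse".into(), "100".into()],
    };
    let age = Column { name: "age".into() };
    let index = schema.get_index_for_column(&age).unwrap();
    println!("{}", row.items[index]); // prints 100
}
```

Renaming a column now means touching one Schema entry instead of every row.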
Changes to our query functions
Previously we just returned a big pile of rows from each query function, but now each one returns a QueryStep
type:
pub struct QueryStep {
    pub schema: Schema,
    pub rows: Vec<Row>,
}
Now we have a bunch of rows and a Schema so we know what's in them.
From
When selecting from a table, we grab all its rows and return the table's schema too. The schemas are hardcoded for now:
fn schema(table_name: &TableName) -> Vec<Column> {
    match table_name.0.as_str() {
        "Album" => vec!["AlbumId".into(), "Title".into(), "ArtistId".into()],
        "Artist" => vec!["ArtistId".into(), "Name".into()],
        "Track" => vec![
            "TrackId".into(),
            "Name".into(),
            "AlbumId".into(),
            "MediaTypeId".into(),
            "GenreId".into(),
            "Composer".into(),
            "Milliseconds".into(),
            "Bytes".into(),
            "UnitPrice".into(),
        ],
        _ => todo!("unknown schema"),
    }
}
Filtering
For instance, this means our filtering now looks like this, passing the schema through unchanged from whatever Query it wraps:
Query::Filter(Filter { from, filter }) => {
    let QueryStep { schema, rows } = run_query(from)?;
    let mut filtered_rows = vec![];
    for row in rows {
        if filter::apply_predicate(&row, &schema, filter)? {
            filtered_rows.push(row);
        }
    }
    Ok(QueryStep {
        schema,
        rows: filtered_rows,
    })
}
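The interesting bit is that apply_predicate needs the schema to turn a column name into a position before it can test a row. Here's a hedged sketch of that idea, reduced to a column-equals-constant check over String values (the real function evaluates the parsed filter expression, and the rows hold serde_json::Value):

```rust
// Schema-aware filtering: resolve the column name to an index once,
// then test every row by position. This is a simplification of
// apply_predicate; String stands in for serde_json::Value.
struct Column {
    name: String,
}

struct Schema {
    columns: Vec<Column>,
}

struct Row {
    items: Vec<String>,
}

fn filter_equals(schema: &Schema, rows: Vec<Row>, column: &str, wanted: &str) -> Vec<Row> {
    let index = schema
        .columns
        .iter()
        .position(|c| c.name == column)
        .expect("column not found in schema");
    rows.into_iter()
        .filter(|row| row.items[index] == wanted)
        .collect()
}

fn main() {
    let schema = Schema {
        columns: vec![
            Column { name: "ArtistId".into() },
            Column { name: "Name".into() },
        ],
    };
    let rows = vec![
        Row { items: vec!["1".into(), "AC/DC".into()] },
        Row { items: vec!["2".into(), "Accept".into()] },
    ];
    let kept = filter_equals(&schema, rows, "Name", "Accept");
    println!("{}", kept.len()); // prints 1
}
```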
Projections
Our projections let us drop and reorder fields, so they'll change the schemas as well as the rows:
Query::Project(Project { from, fields }) => {
    let QueryStep { schema, rows } = run_query(from)?;
    let mut projected_rows = vec![];
    for row in &rows {
        projected_rows.push(project::project_fields(row, &schema, fields)?);
    }
    let schema = project::project_schema(&schema, fields)?;
    Ok(QueryStep {
        schema,
        rows: projected_rows,
    })
}
As well as a project_fields function, we have a matching project_schema function that creates the new schema.
pub fn project_schema(schema: &Schema, fields: &[Column]) -> Result<Schema, QueryError> {
    let mut columns = vec![];
    for field in fields {
        let index = schema.get_index_for_column(field).ok_or_else(|| {
            QueryError::ColumnNotFoundInSchema {
                column_name: field.clone(),
            }
        })?;
        let column = schema
            .columns
            .get(index)
            .ok_or(QueryError::IndexNotFoundInSchema { index })?;
        columns.push(column.clone());
    }
    Ok(Schema { columns })
}
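To see what that buys us: projecting fields in a different order reorders the schema too, and a missing field is an error rather than a silent gap. A simplified, self-contained sketch (Option stands in for the post's QueryError, and the two lookups are collapsed into one):

```rust
#[derive(Clone, Debug, PartialEq)]
struct Column {
    name: String,
}

// Simplified project_schema: keep the requested columns, in the
// requested order. Option replaces the post's QueryError here.
fn project_schema(schema: &[Column], fields: &[Column]) -> Option<Vec<Column>> {
    fields
        .iter()
        .map(|field| schema.iter().find(|c| c.name == field.name).cloned())
        .collect()
}

fn main() {
    let col = |name: &str| Column { name: name.into() };
    let artist = vec![col("ArtistId"), col("Name")];
    // Dropping nothing, but swapping the order:
    let projected = project_schema(&artist, &[col("Name"), col("ArtistId")]).unwrap();
    println!("{:?}", projected); // columns now in projected order
    // Asking for a column the schema doesn't have fails the whole projection:
    assert!(project_schema(&artist, &[col("Nope")]).is_none());
}
```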
Outputting everything
We still want to output JSON exactly as before, so our QueryStep type has a to_json function that puts the column names back into each row.
impl QueryStep {
    // reconstruct JSON output
    pub fn to_json(&self) -> serde_json::Value {
        let mut output_rows = vec![];
        for row in &self.rows {
            let mut output_row = serde_json::Map::new();
            for column in &self.schema.columns {
                let value = row.get_column(column, &self.schema).unwrap();
                output_row.insert(column.to_string(), value.clone());
            }
            output_rows.push(serde_json::Value::Object(output_row));
        }
        serde_json::Value::Array(output_rows)
    }
}
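The reassembly step is just zipping the schema back onto each row. Sketched without serde_json (a BTreeMap of Strings stands in for the JSON object):

```rust
use std::collections::BTreeMap;

struct Column {
    name: String,
}

struct Schema {
    columns: Vec<Column>,
}

struct Row {
    items: Vec<String>,
}

// Rebuild one object per row by pairing each value with its column
// name, mirroring what to_json does with serde_json::Map.
fn to_maps(schema: &Schema, rows: &[Row]) -> Vec<BTreeMap<String, String>> {
    rows.iter()
        .map(|row| {
            schema
                .columns
                .iter()
                .zip(&row.items)
                .map(|(col, item)| (col.name.clone(), item.clone()))
                .collect()
        })
        .collect()
}

fn main() {
    let schema = Schema {
        columns: vec![
            Column { name: "id".into() },
            Column { name: "name".into() },
        ],
    };
    let rows = vec![Row {
        items: vec!["1".into(), "Mr Horse".into()],
    }];
    let output = to_maps(&schema, &rows);
    println!("{:?}", output); // one map per row, keyed by column name
}
```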
After this, all the outputs look the same as before and our tests all pass again. Nice!
What's next?
OK, we've got all our ducks in a row; next time we'll do the joins, I promise.
Make sense? If not, get in touch!