PA 2 - FAT Filesystem

Notes:

Instructions

Introduction & Learning Objectives

In this PA, you will build a user-space library (libfs) that implements a simplified FAT (File Allocation Table) file system. Instead of interacting with a physical hard drive, your file system will operate on a virtual disk - a standard file on your Linux environment that has been artificially divided into fixed-size 4096-byte blocks.

By the end of this assignment, you will understand:

Block-based I/O: How operating systems read and write data in fixed chunks rather than byte-by-byte.
File System Metadata: How structs are packed into blocks to track file locations, sizes, and free space.
Linked Allocation: How the File Allocation Table stitches fragmented data blocks together into contiguous files.
File Descriptors: How the OS manages state for multiple open files simultaneously.

The Virtual Disk Architecture

Our file system operates entirely on a virtual disk where every block is exactly 4096 bytes (BLOCK_SIZE). You cannot read or write 10 bytes to the disk; you must read the entire 4096-byte block into memory, modify the specific bytes you care about, and write the entire 4096-byte block back. This is handled via the provided disk.c driver.

The disk is divided into four strict regions, always in this exact order:

Superblock: Always exactly 1 block (Block 0).
FAT Array: Can span 1 or more blocks, depending on the total disk size.
Root Directory: Always exactly 1 block.
Data Region: The remaining blocks, used strictly to hold the actual contents of your files.

The Core Data Structures

The Superblock (Block 0)

The Superblock is the "master record" of the file system. When you mount the disk, this is the very first block you read. It contains the geometry of the entire disk.

typedef struct __attribute__((__packed__)) Superblock {  
    uint8_t  sig[8];           // Must be exactly "CSCE-313"  
    uint16_t numBlocks;        // Total blocks on the virtual disk  
    uint16_t rootIndex;        // Block index of the Root Directory  
    uint16_t dataIndex;        // Block index of the Data Region  
    uint16_t numDataBlocks;    // Total number of data blocks  
    uint8_t  numFAT;           // Number of blocks the FAT occupies  
    uint8_t  padding[4079];    // Pads the struct to exactly 4096 bytes  
} Superblock;

Note: The __attribute__((__packed__)) directive tells the GCC compiler not to add arbitrary padding between variables, ensuring our struct maps perfectly to the raw disk bytes.

The File Allocation Table (FAT)

When a file spans multiple blocks, the FAT functions as a linked-list data structure that maps the logical sequence of data to its physical locations on the disk.

It is an array of 16-bit integers (uint16_t).
Crucial Indexing Rule: The indices of the FAT array correspond only to the blocks in the Data Region, not the absolute blocks of the virtual disk.
FAT Index 0 corresponds to the very first data block. To calculate the absolute disk block number that you must pass into block_read() or block_write(), you must offset the index by the start of the data region using this formula: Absolute_Disk_Block = superblock.dataIndex + FAT_Index
The value stored at fat[N] is the FAT index of the next block in the file's chain.
Special Values: 0xFFFF indicates the End of Chain (EOC). 0x0000 indicates a free, unused block.

The Root Directory

The Root Directory is a flat catalog of all files on the disk. Our file system has a strict limit of 128 maximum files. A 32-byte entry represents each file.

typedef struct __attribute__((__packed__)) RootEntry {
    uint8_t  name[16];         // File name (null-terminated if < 16 chars)
    uint32_t size;             // File size in bytes
    uint16_t indexFirstBlock;  // Index of the first data block (or 0xFFFF if empty)
    uint8_t  padding[10];      // Pads the struct to exactly 32 bytes
} RootEntry;

Because 128 x 32 bytes = 4096 bytes, the Root Directory fits perfectly into a single block.

The File Descriptor Table

To allow user programs to read and write to files, the OS must keep track of active "sessions." If a user opens a file and reads the first 100 bytes, the file system must remember that the cursor is at byte 100.

You must declare a global array of 32 File Descriptors in your fs.c file: FileDescriptor fd_table[32];

You should define the FileDescriptor struct like this:

typedef struct {
  int is_used;       // 1 if this FD is currently open, 0 if it is free
  int root_index;    // The index of this file in your Root Directory array
  size_t offset;     // The current read/write cursor position in bytes
} FileDescriptor;

Why root_index? Do not store the file's name in this struct! By storing the index (0 to 127) of the file's entry in your Root Directory array, fs_read and fs_write can instantly access the file's size and starting FAT block without having to do a slow strcmp search every time the user wants to read a byte.

FAT Chain Example

image-1.png383x639

Note: This file system does not support folders or sub-directories. All files are stored in the Root Directory.

Codebase Organization

The repository is strictly organized to separate your implementation from the testing tools.

📁 CSCE-313-FAT-Lab/
├── 📄 Makefile                 # Top-level Makefile to build everything
├── 📄 autograder.sh            # The final grading script
├── 📁 src/                     # Core file system library
│   ├── 📄 disk.c / disk.h      # Provided block I/O driver (DO NOT MODIFY)
│   ├── 📄 fs.c                 # YOUR IMPLEMENTATION GOES HERE
│   ├── 📄 fs.h                 # API headers (DO NOT MODIFY)
│   └── 📄 Makefile             # Compiles your code into libfs.a
├── 📁 utils/                   # Standalone command-line tools
│   ├── 📄 fs_cli.c             # Source code for the interactive shell
│   ├── 📄 mkfs.x               # Pre-compiled virtual disk formatter
│   ├── 📄 ref_fs.x             # Pre-compiled TA reference implementation
│   └── 📄 Makefile             
└── 📁 tests/                   # C-level Integration Tests
    ├── 📄 test_integration.c   # Tests FD limits and state
    ├── 📄 test_file_io.c       # Tests read/write offset logic
    └── 📄 Makefile

What You Must Implement (The API)

You are required to implement the entire file system API inside src/fs.c. You should tackle this in four distinct steps. Do not modify src/fs.h.

1: Mounting and Unmounting

int fs_mount(const char *diskname): Open the disk using block_disk_open. Read the Superblock (Block 0), verify the CSCE-313 signature, allocate memory for your FAT and Root Directory, and read them from the disk. Return 0 on success, -1 on failure.
int fs_umount(void): Write your in-memory FAT and Root Directory back to the disk to save changes. Free allocated memory and close the disk. Return 0 on success.
int fs_info(void): Print the disk geometry (total blocks, FAT blocks, free space ratio) to the console based on the Superblock data.

2: Directory Operations

int fs_create(const char *filename): Scan the Root Directory for an empty slot. Initialize the file name, set size to 0, and set the first block to 0xFFFF.
int fs_delete(const char *filename): Find the file in the Root Directory. Traverse its FAT chain and set every corresponding FAT entry to 0x0000 to free the blocks. Clear the Root Entry. If a file is currently open, fs_delete should return -1.
int fs_ls(void): Iterate through the Root Directory and print the name, size, and starting block of all active files.

3: File Descriptors

You must maintain the global fd_table array to track the state of up to 32 open files.

fs_open(const char *filename): 1. Search the Root Directory for the requested filename. If it doesn't exist, return -1. 2. Scan your fd_table array for the first available descriptor (is_used == 0). If 32 files are already open, return -1. 3. Claim the descriptor: Set is_used = 1, set root_index to the index where you found the file in the Root Directory, and set offset = 0. 4. Return the index of the file descriptor (0-31) to the user.
fs_close(int fd): Validate that the fd is within bounds (0-31) and is currently open. If valid, simply set its is_used flag to 0.
fs_stat(int fd): Validate the fd. Use its root_index to look up the file in your Root Directory array, and return the size variable from that entry.
fs_lseek(int fd, size_t offset): Validate the fd. Check if the requested offset is larger than the file's current size. If it is, return -1 (you cannot seek past the end of a file). Otherwise, update the FD's offset variable and return 0.

Note on Sparse Files: In this lab, seeking past EOF returns an error. However, in production Linux systems, this is used to create Sparse Files. You can read how the standard lseek handles this (and the SEEK_HOLE flag) in the Linux Programmer's Manual (man 2 lseek).

4: Reading and Writing (The Bounce Buffer)

This is the most complex phase. You must translate byte-level requests into block-level disk I/O.

int fs_read(int fd, void *buf, size_t count): This function reads up to count bytes from the file into the user-provided buf, starting at the current file offset.
1. Iterative Multi-Block I/O: If count is larger than 4,096 bytes (or spans across a block boundary), your function must loop. For each block involved:
  - Identify the physical disk block using the current offset and the FAT chain.
  - Load the entire 4,096-byte disk block into a temporary bounce buffer.
  - Use memcpy to copy the relevant portion from the bounce buffer into the correct position within the user's buf.
2. State Management: After each successful read, increment the file descriptor's offset by the number of bytes actually moved.
3. Boundary Conditions: * If the user asks for more bytes than are remaining in the file (e.g., asking for 100 bytes when only 20 remain before EOF), read only the remaining 20 bytes.
  - Return Value: Return the total number of bytes successfully read. If the offset is already at EOF, return 0.
int fs_write(int fd, void *buf, size_t count): Read the block into a bounce buffer, memcpy the user's new data into it, and block_write it back to the disk. If writing past the end of the file, you must find a free block in the FAT, link it, and update the file's size in the Root Directory. Return the actual number of bytes written.
- Writes occur at the current file descriptor offset.
  - If offset < size → overwrite existing data
  - If offset == size → append

Edge Cases & Expected Error Handling

Your file system should work reliably against these specific edge cases. Unless otherwise specified, your functions should return -1 to indicate an error.

1. Invalid File Descriptors

For fs_read, fs_write, fs_close, fs_stat, and fs_lseek: If the user passes a File Descriptor index that is out of bounds (e.g., < 0 or >= 32), or passes an FD that is not currently open (is_used == 0), you must immediately return -1.

2. File Creation Limits (`fs_create`)

File Already Exists: You cannot have two files with the exact same name. If fs_create is called with a filename that already exists in the Root Directory, return -1.
Root Directory Full: If there are already 128 active files in the Root Directory, return -1.
Filename Too Long: If the provided filename exceeds 15 characters (plus the null terminator), return -1.

3. Open File Limits (`fs_open`)

Too Many Open Files: If the FD table already has 32 active sessions, return -1.
Opening the Same File Twice: For simplicity in this lab, you do not need to support independent read/write offsets for the same file. If a user calls fs_open on a file that is already marked as open in your FD table, return -1.

4. Reading Past EOF (`fs_read`)

A user might ask to read 100 bytes when only 20 bytes remain in the file. This is not a fatal error! You should read the remaining 20 bytes, update the offset, and return 20 (the actual number of bytes read).
If the user calls fs_read when their offset is already at the very end of the file, do not crash. Simply return 0 to indicate that 0 bytes were read.

5. Out of Space / Disk Full (`fs_write`)

Similar to reading, if a user asks to write 5,000 bytes, but the FAT only has one free block left (4,096 bytes), you must perform a partial write.
Write the 4,096 bytes, update the file's size and the FD offset, and return 4096. The user's application will see that the return value is smaller than what it requested and will know the disk is full.
If the disk is completely 100% full when fs_write is called, return 0.

How to Compile and Run Your Code

Build, compile, and test your code iteratively using the provided tools.

1. Compiling the Project

You can build specific parts of the project, or everything at once.

To build everything: Run make from the root directory. This compiles the core library (src/), then the utilities (utils/), and finally the tests (tests/).
To build just the library: Navigate to the source folder (cd src) and run make.

2. Formatting a Virtual Disk

Before your file system can do anything, you need a disk to operate on. Use the provided mkfs.x utility to create an empty virtual disk file.

# Creates a virtual disk named 'disk.fs' with 100 total blocks  
./utils/mkfs.x disk.fs 100

3. Interactive Testing (Command-Line Interface)

Once you have implemented the code, you can use fs_cli.x to manually interact with your virtual disk and test your logic.

# Print the disk info (Tests fs_mount and fs_info)
./utils/fs_cli.x info disk.fs

# Create a file inside the virtual disk (Tests fs_create)
./utils/fs_cli.x add disk.fs my_test_file.txt

# List all files on the virtual disk (Tests fs_ls)
./utils/fs_cli.x ls disk.fs

4. Running the C-Level Tests

Once you complete the implementation, you must test your internal state machine (File Descriptors) and offset math.

# Make sure you have a fresh disk
./utils/mkfs.x disk.fs 100

# Run the integration test (Tests fs_open, fs_close, fd limits)
./tests/test_integration.x

# Run the File I/O test (Tests fs_read, fs_write offsets)
./tests/test_file_io.x

5. The Autograder

When you believe your entire implementation is complete and robust, run the grading script from the root directory. This script stresses your file system by creating 128 files, writing large text files that cross block boundaries, and forcing disk-full edge cases.

6. Understanding the Provided Tests

Use the testing suite to verify specific parts of your architecture before running the final autograder:

test_integration.x (State & File Descriptors): Checks your fd_table logic. It verifies that you enforce the 32-file limit, properly initialize offsets, and successfully recycle File Descriptors after they are closed via fs_close.
test_file_io.x (The Bounce Buffer): Checks your block offset math. It verifies that fs_read and fs_write can correctly locate data within a block, cross block boundaries using the FAT chain, and return the exact requested byte counts without reading garbage memory.
autograder.sh (Stress & Edge Cases): The final gauntlet. It verifies the 128-file Root Directory limit, ensures fs_delete properly reclaims FAT blocks, tests >4000-byte files to force block spanning, and triggers "Disk Full" scenarios to check for safe partial writes.

Key Systems Concepts to Understand

Let’s understand a few core quirks of how C and Operating Systems handle memory and hardware.

1. Struct Padding and `attribute((packed))`

By default, the C compiler optimizes your code for the CPU, not for the hard drive. CPUs prefer to read memory in even chunks (like 4 or 8 bytes at a time, known as "word alignment"). If you create a struct with a 1-byte char followed by a 4-byte int, the compiler will secretly insert 3 bytes of invisible "padding" between them so the int starts on an even memory boundary.

Why this is dangerous: In this lab, your Superblock and RootEntry structs must map flawlessly to the raw bytes on the virtual disk. If the compiler injects secret padding, your file sizes and block indices will read garbage data from the disk.

The Solution: We use __attribute__((__packed__)) at the end of our struct definitions. This is a directive that orders the GCC compiler to aggressively pack the variables together exactly as written, byte-for-byte, disabling all hardware-optimized padding.

2. The "Bounce Buffer" (Block I/O vs. Byte I/O)

The fundamental challenge of writing a file system is bridging the gap between what the user wants and what the hardware can actually do.

The User wants byte-level access: "Read exactly 13 bytes starting at offset 50."
The Hardware only does block-level access: "I can only give you the entire 4,096-byte block."

To solve this, you must implement a Bounce Buffer. This is a temporary 4,096-byte array you allocate inside your fs_read and fs_write functions.

For Reading: You use block_read to load the entire 4KB block from the disk into your bounce buffer. Then, you use pointer arithmetic to locate the exact starting byte inside that buffer, and use memcpy to copy just the 13 bytes the user asked for into their application.
For Writing (Read-Modify-Write): You cannot just write 13 bytes to the disk, or you will overwrite the other 4,083 bytes in that block with zeros. You must first block_read the existing block into your bounce buffer, memcpy the user's 13 new bytes into the bounce buffer to overwrite the old data, and then block_write the entire mixed 4KB buffer back to the disk.

3. Pointer Arithmetic vs. Byte Math

When reading data into memory, you will frequently need to calculate memory offsets. You must be extremely careful with pointer types in C.

If you have an array representing your FAT: uint16_t *fat_array = malloc(FAT_SIZE);

And you want to jump 4,096 bytes forward in memory to read the second block of the FAT, you might be tempted to do this: fat_array + 4096

The Trap: In C, adding a number to a pointer advances it by elements, not by bytes. Because fat_array points to 16-bit integers (2 bytes each), adding 4096 actually advances the memory address by 8,192 bytes, skipping a whole block and causing a segmentation fault!

The Fix: Whenever you are doing raw byte math to find offsets, always cast your pointer to a 1-byte type (like uint8_t * or char *) first: (uint8_t *)fat_array + 4096

4. File Descriptors and "State"

A file on a disk is completely static; it has no idea if anyone is reading it. But an Operating System needs to know exactly where a user left off reading so it can give them the next chunk of text.

A File Descriptor (FD) is how the OS tracks this active "state." It is simply an index to an entry in a table maintained by your library. For every open file, your library must track:

Which file is open (its index in the Root Directory).
The offset: A cursor tracking exactly how many bytes into the file the user has currently read or written.

When a user calls fs_read(fd, buf, 10), you look at your FD table, see that their cursor is at byte 50, fetch bytes 50-59, and then update their cursor to 60. When they call fs_close(fd), you simply mark that table entry as available again.

Implementation

1. Mounting and Unmounting

The key idea is:

fs_mount loads the filesystem metadata from disk into memory.
fs_umount saves your in-memory metadata back to disk.
fs_info just reports what is currently mounted.

Your assignment says the disk layout is always:

block 0 = superblock
blocks 1 .. numFAT = FAT
block rootIndex = root directory
data region after that

Also, the superblock must contain the exact 8-byte signature "CSCE-313", the FAT is an array of uint16_t, and the root directory fits in exactly one block because 128 entries × 32 bytes = 4096 bytes.

What each function should do

fs_mount(const char *diskname)

fail if a disk is already mounted
open the disk
allocate memory for the superblock
read block 0 into it
verify the signature
allocate memory for the FAT
read all FAT blocks into memory
read the root directory block into root
initialize all file descriptors as unused
reset open-file bookkeeping

fs_umount(void)

fail if nothing is mounted
fail if any file is still open
write superblock back to block 0
write FAT blocks back to disk
write root directory back to its block
free allocated memory
reset pointers/state
close the disk

fs_info(void)

fail if nothing is mounted
print geometry using the superblock
count:
- free FAT entries = FAT entries equal to 0x0000
- free root entries = empty filenames
print the ratios

Correct implementation

int fs_mount(const char *diskname) {
    if (diskname == NULL) {
        return -malo
    }

    /* Do not allow mounting twice */
    if (sblk != NULL) {
        return -1;
    }

    /* Open the virtual disk */
    if (block_disk_open(diskname) < 0) {
        return -1;
    }

    /* Allocate and read superblock */
    sblk = malloc(sizeof(Superblock));
    if (sblk == NULL) {
        block_disk_close();
        return -1;
    }

    if (block_read(0, sblk) < 0) {
        free(sblk);
        sblk = NULL;
        block_disk_close();
        return -1;
    }

    /* Validate signature: exactly 8 bytes, no null terminator required */
    if (memcmp(sblk->sig, "CSCE-313", 8) != 0) {
        free(sblk);
        sblk = NULL;
        block_disk_close();
        return -1;
    }

    /* Allocate FAT memory: numFAT blocks, each BLOCK_SIZE bytes */
    fat = malloc(sblk->numFAT * BLOCK_SIZE);
    if (fat == NULL) {
        free(sblk);
        sblk = NULL;
        block_disk_close();
        return -1;
    }

    /* Read FAT blocks from disk: FAT starts at block 1 */
    for (int i = 0; i < sblk->numFAT; i++) {
        if (block_read(1 + i, (uint8_t *)fat + (i * BLOCK_SIZE)) < 0) {
            free(fat);
            fat = NULL;
            free(sblk);
            sblk = NULL;
            block_disk_close();
            return -1;
        }
    }

    /* Read root directory block */
    if (block_read(sblk->rootIndex, root) < 0) {
        free(fat);
        fat = NULL;
        free(sblk);
        sblk = NULL;
        block_disk_close();
        return -1;
    }

    /* Initialize file descriptor table */
    for (int i = 0; i < FS_OPEN_MAX_COUNT; i++) {
        filedes[i].id = -1;
        filedes[i].offset = 0;
        filedes[i].rootIndex = -1;
    }

    numFilesOpen = 0;
    next_id = 0;

    return 0;
}

int fs_umount(void) {
    if (sblk == NULL) {
        return -1;
    }

    /* Cannot unmount if files are still open */
    if (numFilesOpen > 0) {
        return -1;
    }

    /* Write superblock back */
    if (block_write(0, sblk) < 0) {
        return -1;
    }

    /* Write FAT blocks back */
    for (int i = 0; i < sblk->numFAT; i++) {
        if (block_write(1 + i, (uint8_t *)fat + (i * BLOCK_SIZE)) < 0) {
            return -1;
        }
    }

    /* Write root directory back */
    if (block_write(sblk->rootIndex, root) < 0) {
        return -1;
    }

    /* Free memory */
    free(fat);
    fat = NULL;

    free(sblk);
    sblk = NULL;

    /* Close disk */
    if (block_disk_close() < 0) {
        return -1;
    }

    return 0;
}

int fs_info(void) {
    if (sblk == NULL) {
        return -1;
    }

    int fat_free = 0;
    int root_free = 0;

    /* Count free FAT entries */
    for (int i = 0; i < sblk->numDataBlocks; i++) {
        if (fat[i] == get_fat_free()) {
            fat_free++;
        }
    }

    /* Count free root directory entries */
    for (int i = 0; i < FS_FILE_MAX_COUNT; i++) {
        if (root[i].name[0] == '\0') {
            root_free++;
        }
    }

    printf("FS Info:\n");
    printf("total_blk_count=%u\n", sblk->numBlocks);
    printf("fat_blk_count=%u\n", sblk->numFAT);
    printf("rdir_blk=%u\n", sblk->rootIndex);
    printf("data_blk=%u\n", sblk->dataIndex);
    printf("data_blk_count=%u\n", sblk->numDataBlocks);
    printf("fat_free_ratio=%d/%u\n", fat_free, sblk->numDataBlocks);
    printf("rdir_free_ratio=%d/%d\n", root_free, FS_FILE_MAX_COUNT);

    return 0;
}

`fs_mount` questions

1. Why do we use `sblk != NULL` to check if mounted?

Because sblk is a global pointer:

static Superblock* sblk = NULL;

Before mounting, it is NULL.

After successful mounting, it points to allocated memory.

So this:

if (sblk != NULL) return -1;

means:

“A filesystem is already mounted, so do not mount another one on top of it.”

That prevents inconsistent state.

2. Why do we `malloc(sizeof(Superblock))`?

Because the superblock lives on disk, but after mounting we want a copy in RAM.

So:

sblk = malloc(sizeof(Superblock));
block_read(0, sblk);

means:

reserve memory for one superblock
read disk block 0 into that memory

This matches the assignment’s description that block 0 is always the superblock.

3. Why use `memcmp(sblk->sig, "CSCE-313", 8)`?

Because sig is:

uint8_t sig[8];

That is exactly 8 bytes.

And "CSCE-313" is also 8 characters.

Important detail:

this field is not necessarily null-terminated, so strcmp() is not the right choice.

memcmp(..., 8) compares exactly 8 raw bytes, which is what we want.

4. Why is FAT allocation `sblk->numFAT * BLOCK_SIZE`?

The FAT occupies multiple whole disk blocks, and each block is 4096 bytes. The assignment explicitly says the FAT can span 1 or more blocks, and the disk only works in whole blocks.

So if numFAT == 2, then FAT size in bytes is:

2 * 4096

That is why we allocate:

fat = malloc(sblk->numFAT * BLOCK_SIZE);

5. Why cast to `(uint8_t *)fat` when reading FAT blocks?

This is one of the most important C details in the whole assignment.

You are reading the FAT one disk block at a time:

block_read(1 + i, (uint8_t *)fat + (i * BLOCK_SIZE))

Why not just do fat + i * BLOCK_SIZE?

Because fat is a uint16_t *.

In C, pointer arithmetic moves by elements, not bytes.

So:

fat + 1

moves forward by sizeof(uint16_t) bytes, which is 2 bytes.

But here we want to move by exact byte offsets, like 4096 bytes at a time. The assignment itself warns about this pointer-arithmetic trap and says to cast to a 1-byte type first.

So:

(uint8_t *)fat + (i * BLOCK_SIZE)

means:

“Treat FAT as raw bytes, then move forward exactly i * 4096 bytes.”

6. Why do we read root directly into root?

Because your code already has:

static RootEntry root[FS_FILE_MAX_COUNT];

And the assignment says the root directory is exactly one block. Since each entry is 32 bytes and there are 128 entries, that is exactly 4096 bytes total.

So this works perfectly:

block_read(sblk->rootIndex, root);

We are reading one block straight into the array.

7. Why initialize all file descriptors during mount?

Because open-file state should start clean each time you mount.

In your starter code, a descriptor is unused when:

filedes[i].id == -1

So we set all of them to unused.

Also:

numFilesOpen = 0;
next_id = 0;

resets the session bookkeeping.

`fs_umount` questions

8. Why check `numFilesOpen > 0 in fs_umount`?

Because unmounting while files are still open would mean:

user processes still think those files are active
but your metadata is being shut down and freed

That creates invalid state.

Your assignment explicitly says to ensure no files are open before unmounting.

9. Why write metadata back in `fs_umount`?

Because while the disk is mounted, you will modify:

FAT
root directory
maybe even superblock in future logic

Those changes live in memory until you save them.

So unmount is the “flush everything back to disk” step.

`fs_info` questions

10. Why count FAT free entries only up to numDataBlocks?

Because FAT entries correspond to the data region only, not all disk blocks. The assignment is very explicit about that.

So the valid FAT entries are:

fat[0] ... fat[numDataBlocks - 1]

Not the entire disk block count.

11. Count free FAT entries

for (int i = 0; i < sblk->numDataBlocks; i++) {
    if (fat[i] == get_fat_free()) {
        fat_free_count++;
    }
}

This loops through the FAT entries that actually correspond to data blocks.

A FAT entry is free if it contains:

0x0000

which your helper returns through:

get_fat_free()

So we are literally counting:

“How many data blocks are currently unused?”

12. Count fee root entries

for (int i = 0; i < FS_FILE_MAX_COUNT; i++) {
    if (root[i].name[0] == '\0') {
        root_free_count++;
    }
}

Each root directory entry represents one possible file.

If the first character of the filename is '\0', that means the entry is empty.

So this counts:

“How many file slots are still available in the root directory?”

Since the root directory has room for 128 files, this loop checks all 128 entries.

Why root[i].name[0] == '\0' means empty:
A filename is stored in:

uint8_t name[FS_FILENAME_LEN];

If a root entry is unused, its name should be all zeros, or at least start with zero.

So checking only the first byte is enough to tell whether the entry is empty.

That is a very common pattern in C strings.

2. Directory Operations

`fs_create` implementation

Creating a file here does not allocate any data block yet.

That is a really important point.

When you create an empty file:

it has a name
it has size 0
it has no data blocks yet

So the first block is set to the FAT end-of-chain marker:

0xFFFF

This means:

“There is currently no first block allocated.”

What checks should happen first

Before writing anything into the root directory, fs_create should verify:

a filesystem is mounted
filename is not NULL
filename is not empty
filename fits in FS_FILENAME_LEN (16 total, so max 15 visible characters + '\0')
the file does not already exist
there is at least one empty root directory slot

The assignment explicitly requires duplicate-name rejection, root-full rejection, and filename-length rejection.

Correct implementation:

int fs_create(const char *filename) {
    if (sblk == NULL || filename == NULL) {
        return -1;
    }

    /* Reject empty filename */
    if (filename[0] == '\0') {
        return -1;
    }

    /* Max allowed: 15 chars + null terminator */
    if (strlen(filename) >= FS_FILENAME_LEN) {
        return -1;
    }

    int free_index = -1;

    for (int i = 0; i < FS_FILE_MAX_COUNT; i++) {
        /* Track first empty root slot */
        if (root[i].name[0] == '\0') {
            if (free_index == -1) {
                free_index = i;
            }
        }
        /* Reject duplicate filename */
        else if (strcmp((char *)root[i].name, filename) == 0) {
            return -1;
        }
    }

    /* No empty slot found => root directory full */
    if (free_index == -1) {
        return -1;
    }

    /* Initialize the new root entry */
    memset(&root[free_index], 0, sizeof(RootEntry));
    strcpy((char *)root[free_index].name, filename);
    root[free_index].size = 0;
    root[free_index].indexFirstBlock = get_fat_eoc();

    return 0;
}

1. Mounted check

if (sblk == NULL || filename == NULL) {
    return -1;
}

If the filesystem is not mounted, then root[] is not meaningful as active filesystem state.

Also, if filename == NULL, you cannot safely call strlen or strcmp on it.

2. Empty filename check

if (filename[0] == '\0') {
    return -1;
}

A zero-length file name should be rejected.

Remember empty names would conflict with how you detect unused root entries:

root[i].name[0] == '\0'

So allowing "" as a real file name would break your directory logic.

3. Length check

if (strlen(filename) >= FS_FILENAME_LEN) {
    return -1;
}

Your header defines:

#define FS_FILENAME_LEN 16

That means the array can hold at most:

15 actual filename characters
1 null terminator

So length 16 or more is too long.

That matches the assignment’s rule: filenames longer than 15 characters plus null terminator must be rejected.

4. One scan handles both duplicate detection and free-slot detection

This part is efficient and clean:

int free_index = -1;

for (int i = 0; i < FS_FILE_MAX_COUNT; i++) {
    if (root[i].name[0] == '\0') {
        if (free_index == -1) {
            free_index = i;
        }
    }
    else if (strcmp((char *)root[i].name, filename) == 0) {
        return -1;
    }
}

During the loop:

if an entry is empty, remember the first empty slot
if an entry is in use, compare its name to the requested filename
if duplicate found, fail immediately

At the end:

if no empty slot exists, the root directory is full
otherwise use the remembered slot

This satisfies both assignment requirements: no duplicates, no creation when the root directory is full.

5. Why `root[i].name[0] == '\0'` means “empty slot”

Because unused root entries are effectively blank.

A filename stored in a C-style char array starts with '\0' if it is empty.

So this is the standard way to say:

“This root directory entry is unused.”

6. Why `memset` first is a good idea

memset(&root[free_index], 0, sizeof(RootEntry));

This clears the entire entry before filling in fields.

That means:

name[] becomes all zero
size becomes 0
indexFirstBlock becomes 0 initially
padding bytes become clean too

Then you overwrite the fields that matter:

strcpy((char *)root[free_index].name, filename);
root[free_index].size = 0;
root[free_index].indexFirstBlock = get_fat_eoc();

Using memset is not strictly required, but it is very safe and clean.

7. Why `indexFirstBlock = get_fat_eoc()`

This is one of the most important ideas in fs_create.

A newly created file has no data blocks yet.

So its “first block” is not a real FAT index.

Instead, you mark it as:

0xFFFF

which is your end-of-chain marker:

get_fat_eoc()

Later, when fs_write writes the first byte into an empty file, that is when you will allocate the first FAT block.

Why we do not call `find_free_fat()` here

This is a very common confusion.

You might think:

“I am creating a file, so shouldn’t I allocate a block now?”

But for this filesystem, no.

Creating the file only creates its directory entry.

The file is still empty, so it does not need any data block yet.

You only allocate a FAT block when the file actually needs to store data, usually during fs_write.

So fs_create does not touch FAT at all.

Common mistake

Mistake 1: using > instead of >= in length check

Wrong:

if (strlen(filename) > FS_FILENAME_LEN)

That would incorrectly allow a 16-character name, which leaves no space for the null terminator.

Correct:

if (strlen(filename) >= FS_FILENAME_LEN)

`fs_delete` implement

Big picture first

When a file exists, its root directory entry stores:

its name
its size
the FAT index of its first data block

If the file has data, the FAT works like a linked list:

first block -> next block -> next block -> ... -> 0xFFFF

Deleting the file means:

find the first block from the root entry
follow the chain in the FAT
free each FAT entry by setting it to 0x0000
erase the root directory entry so the file no longer exists

One very important edge case

A file may be empty.

For an empty file:

indexFirstBlock == 0xFFFF

That means:

there is no actual FAT chain to free
you should skip the FAT traversal
you should still clear the root entry

This is because fs_create initializes empty files with indexFirstBlock = 0xFFFF.

Implementation

Correct implementation:

int fs_delete(const char *filename) {
    if (sblk == NULL || filename == NULL) {
        return -1;
    }

    int root_index = -1;

    /* Find the file in the root directory */
    for (int i = 0; i < FS_FILE_MAX_COUNT; i++) {
        if (root[i].name[0] != '\0' &&
            strcmp((char *)root[i].name, filename) == 0) {
            root_index = i;
            break;
        }
    }

    /* File not found */
    if (root_index == -1) {
        return -1;
    }

    /* Make sure the file is not currently open */
    for (int i = 0; i < FS_OPEN_MAX_COUNT; i++) {
        if (filedes[i].id != -1 && filedes[i].rootIndex == root_index) {
            return -1;
        }
    }

    /* Free all FAT blocks belonging to the file */
    uint16_t curr = root[root_index].indexFirstBlock;

    while (curr != get_fat_eoc()) {
        uint16_t next = fat[curr];
        fat[curr] = get_fat_free();
        curr = next;
    }

    /* Clear the root entry */
    memset(&root[root_index], 0, sizeof(RootEntry));
    root[root_index].indexFirstBlock = get_fat_eoc();

    return 0;
}

1. Validate basic input

if (sblk == NULL || filename == NULL) {
    return -1;
}

This makes sure:

the filesystem is mounted
the filename pointer is valid

Without this, you could crash or operate on invalid state.

2. Find the file in the root directory

int root_index = -1;

for (int i = 0; i < FS_FILE_MAX_COUNT; i++) {
    if (root[i].name[0] != '\0' &&
        strcmp((char *)root[i].name, filename) == 0) {
        root_index = i;
        break;
    }
}

This loop searches through all 128 root directory entries.

We only compare names for entries that are actually in use:

root[i].name[0] != '\0'

If found, we store the index of the file’s root entry.

That index is important because later functions often refer back to the file through its root directory slot. The assignment specifically explains that the file descriptor table stores root_index for this reason.

3. If the file does not exist, fail

if (root_index == -1) {
    return -1;
}

If you never found a matching filename, there is nothing to delete.

4. Check whether the file is open

for (int i = 0; i < FS_OPEN_MAX_COUNT; i++) {
    if (filedes[i].id != -1 && filedes[i].rootIndex == root_index) {
        return -1;
    }
}

This is very important.

Your assignment explicitly says:

If a file is currently open, fs_delete should return -1.

Since your codebase uses:

filedes[i].id == -1

to mean “unused FD”, this loop checks all currently open descriptors and sees whether any of them refer to this file’s root directory entry.

If yes, deletion is not allowed.

Why check by rootIndex instead of name?

Because your file descriptor table stores: rootIndex
which is the index of the file in the root directory.
That is better than storing the name because:
- it is faster
- it avoids repeated string comparisons
- it matches the assignment’s intended design

5. Start FAT traversal

uint16_t curr = root[root_index].indexFirstBlock;

This gets the FAT index of the first data block of the file.

There are two possibilities:

Case A: file is empty
Then:

curr == 0xFFFF

So the loop below never runs.

Case B: file has data

Then curr is some valid FAT index like 0, 5, 19, etc.

6. Walk the FAT chain and free each block

while (curr != get_fat_eoc()) {
    uint16_t next = fat[curr];
    fat[curr] = get_fat_free();
    curr = next;
}

This is the heart of the delete logic.

Let’s unpack it carefully.

Why do we save next before freeing the current block?

Suppose the chain is:

fat[4] = 7
fat[7] = 10
fat[10] = 0xFFFF

That means the file uses blocks:

4 -> 7 -> 10 -> EOC

If you did this in the wrong order:

fat[curr] = 0x0000;
curr = fat[curr];

you would immediately lose the next pointer, because after setting fat[curr] = 0x0000, the chain is broken.

So the correct order is:

remember next
free current
move to next

That is why we do:

uint16_t next = fat[curr];
fat[curr] = get_fat_free();
curr = next;

Example traversal

Suppose:

indexFirstBlock = 4

and the FAT contains:

fat[4]  = 7
fat[7]  = 10
fat[10] = 0xFFFF

Then the loop does:

Iteration 1

curr = 4
next = fat[4] = 7
fat[4] = 0x0000
curr = 7

Iteration 2

curr = 7
next = fat[7] = 10
fat[7] = 0x0000
curr = 10

Iteration 3

curr = 10
next = fat[10] = 0xFFFF
fat[10] = 0x0000
curr = 0xFFFF

Now loop ends.

All three blocks are freed.

This matches the assignment’s requirement to traverse the FAT chain and mark each block as free.

7. Clear the root entry

memset(&root[root_index], 0, sizeof(RootEntry));
root[root_index].indexFirstBlock = get_fat_eoc();

This removes the file from the directory.

After memset, the root entry becomes blank:

name becomes empty
size becomes 0
all padding becomes 0

Then we explicitly set:

indexFirstBlock = 0xFFFF

This is a nice consistent state for an empty/nonexistent file entry.

Strictly speaking, after memset it would be 0, but setting it back to EOC keeps the semantic meaning consistent with how empty files are represented elsewhere.

Deleting a file is not just about freeing data blocks.

If you only freed the FAT chain but left the root entry intact, the filesystem would still think the file exists.

So deletion must do both:

reclaim the data blocks
remove the directory entry

`fs_ls` implementation

What `fs_ls` should do

It should:

make sure a filesystem is mounted
loop through all 128 root directory entries
print only the entries that are active
for each active file, print:
filename
size
first data block

An entry is active if its name is not empty.

Implementation

Correct implementation:

int fs_ls(void) {
    if (sblk == NULL) {
        return -1;
    }
	
    printf("FS Ls:\n");
	
    for (int i = 0; i < FS_FILE_MAX_COUNT; i++) {
        if (root[i].name[0] != '\0') {
            printf("file: %s, size: %u, data_blk: %u\n",
                   root[i].name,
                   root[i].size,
                   root[i].indexFirstBlock);
        }
    }
	
    return 0;
}

1. Check whether the filesystem is mounted

if (sblk == NULL) {
    return -1;
}

If no filesystem is mounted, there is no valid root directory to list.

So just like your other functions, fs_ls should fail in that case.

2. Print a header

printf("FS Ls:\n");

This is just a label before the file list.

Many versions of this assignment use exactly this style.

3. Loop through the whole root directory

for (int i = 0; i < FS_FILE_MAX_COUNT; i++)

Your root directory has room for 128 entries, so this checks all possible file slots.

4. Skip empty entries

if (root[i].name[0] != '\0')

This means:

if the first character is not null, the file exists
if it is null, that slot is unused

So fs_ls only prints real files.

5. Print file information

printf("file: %s, size: %u, data_blk: %u\n",
       root[i].name,
       root[i].size,
       root[i].indexFirstBlock);

This prints exactly the three things the assignment asks for:

name
size
starting block

Important note about data_block

root[i].indexFirstBlock

This is the FAT index of the first block, not necessarily the absolute disk block number.

That is an important distinction.

For example:

if indexFirstBlock = 0, that means the first data block in the data region
the actual disk block would be:

sblk->dataIndex + root[i].indexFirstBlock

But for this function, most versions of the assignment want the stored starting block value from the root entry, which is the FAT index.

Also, for an empty file, this value will be:

0xFFFF

because empty files do not yet have any allocated data block.

3. File Descriptors

`fs_open()` implementation

What `fs_open` must do

It should:

make sure the filesystem is mounted
validate filename
find the file in the root directory
fail if the file does not exist
fail if that file is already open
find the first free file descriptor slot
fail if all 32 are in use
initialize that FD:
- assign an ID
- set offset = 0
- set rootIndex to the root directory entry index
increment open-file bookkeeping
return the FD index

Implementation

Correct implementation:

int fs_open(const char *filename) {
    if (sblk == NULL || filename == NULL) {
        return -1;
    }

    int root_index = -1;

    /* Find the file in the root directory */
    for (int i = 0; i < FS_FILE_MAX_COUNT; i++) {
        if (root[i].name[0] != '\0' &&
            strcmp((char *)root[i].name, filename) == 0) {
            root_index = i;
            break;
        }
    }

    /* File does not exist */
    if (root_index == -1) {
        return -1;
    }

    /* Do not allow opening the same file twice */
    for (int i = 0; i < FS_OPEN_MAX_COUNT; i++) {
        if (filedes[i].id != -1 && filedes[i].rootIndex == root_index) {
            return -1;
        }
    }

    /* Find first free file descriptor slot */
    for (int i = 0; i < FS_OPEN_MAX_COUNT; i++) {
        if (filedes[i].id == -1) {
            filedes[i].id = next_id++;
            filedes[i].offset = 0;
            filedes[i].rootIndex = root_index;
            numFilesOpen++;
            return i;
        }
    }

    /* No free FD slot */
    return -1;
}

Search the root directory for the file

int root_index = -1;

for (int i = 0; i < FS_FILE_MAX_COUNT; i++) {
    if (root[i].name[0] != '\0' &&
        strcmp((char *)root[i].name, filename) == 0) {
        root_index = i;
        break;
    }
}

This loop looks through all 128 root entries.

We only compare non-empty entries:

root[i].name[0] != '\0'

If we find a matching name, we save that root directory index.

That is important because the file descriptor table does not store the filename itself. It stores the index of the file’s root directory entry, which is exactly the design the assignment recommends.

3. Fail if the file does not exist

if (root_index == -1) {
    return -1;
}

If you did not find a matching file in the root directory, then fs_open must fail.

The assignment explicitly says that if the file does not exist, return -1.

4. Reject opening the same file twice

for (int i = 0; i < FS_OPEN_MAX_COUNT; i++) {
    if (filedes[i].id != -1 && filedes[i].rootIndex == root_index) {
        return -1;
    }
}

This is an important lab-specific rule.

Your assignment says:

For simplicity in this lab, you do not need to support independent read/write offsets for the same file. If a user calls fs_open on a file that is already marked as open in your FD table, return -1.

So we scan the FD table and check whether any currently open descriptor already refers to that same root directory entry.

If yes, fail.

5. Find the first free file descriptor slot

for (int i = 0; i < FS_OPEN_MAX_COUNT; i++) {
    if (filedes[i].id == -1) {
        ...
    }
}

Your code uses:

id == -1

to mean “unused FD slot”.

So this loop is looking for the first available descriptor index from 0 to 31.

6. Initialize the file descriptor

filedes[i].id = next_id++;
filedes[i].offset = 0;
filedes[i].rootIndex = root_index;
numFilesOpen++;
return i;

This claims the FD slot.

filedes[i].id = next_id++;
- This gives the session a unique ID and marks the slot as in use.
- Even though the user sees the FD index i, your internal id is what your starter code uses to distinguish free from occupied entries.
filedes[i].offset = 0;
- This means when the file is opened, the current read/write cursor starts at the beginning of the file.
- That exactly matches the assignment: newly opened files should start with offset 0.
filedes[i].rootIndex = root_index;
- This links the FD slot to the file’s entry in the root directory.
- Later, fs_stat, fs_read, fs_write, and fs_lseek will use this to quickly find:
  - file size
  - first block
  - file metadata
- without doing another filename search.
numFilesOpen++;
- This keeps your open-file count accurate.
- That matters because fs_umount checks whether any files are still open before unmounting.
return i;
- This returns the file descriptor index, which is what the assignment expects.
- The assignment specifically says fs_open should return the descriptor index from 0 to 31.

Why return `i` instead of `filedes[i].id`?

This is a very important distinction.

There are two numbers here:

filedes[i].id = internal session marker
i = the actual file descriptor index the user uses

All later API calls like:

fs_close(fd)
fs_read(fd, ...)
fs_write(fd, ...)

expect the array index fd, not the internal session ID.

So the correct return value is:

return i;

not the unique ID

Think of the FD table like this:

FD index	id	rootIndex
0	-1	-1
1	-1	-1
2	-1	-1

Suppose notes.txt is in root entry 7.

fd = fs_open("notes.txt");

might produce:

FD index	id	rootIndex
0	0	7
1	-1	-1
2	-1	-1

and return:

meaning the user’s file descriptor is 0.

Later, if the user reads 100 bytes, offset may become 100.

`fs_close` implementation

What `fs_close` must do

From the instructions:

check that fd is valid (0–31)
check that it is currently open
mark it as unused

In your implementation, “unused” means:

filedes[i].id == -1

Implementation

Correct implementation:

int fs_close(int fd) {
    /* Validate fd */
    if (!is_valid_fd(fd)) {
        return -1;
    }

    /* Mark descriptor as unused */
    filedes[fd].id = -1;
    filedes[fd].offset = 0;
    filedes[fd].rootIndex = -1;

    numFilesOpen--;

    return 0;
}

1. Validate the file descriptor

You already have this helper:

static inline int is_valid_fd(int fd) {
    return (fd >= 0 && fd < FS_OPEN_MAX_COUNT && filedes[fd].id != -1);
}

So:

if (!is_valid_fd(fd)) {
    return -1;
}

means:

fd must be between 0 and 31
AND the FD must currently be in use

If either fails → return -1

2. Mark the FD as unused

filedes[fd].id = -1;

This is the most important line.

Because in your system:

id == -1  → unused

That is what makes the FD slot available again.

Mental model

Before closing:

FD	id	offset	rootIndex
0	5	120	3

After:

FD	id	offset	rootIndex
0	-1	0	-1

`fs_stat` implementation

What `fs_stat` should do

Given a file descriptor fd, it should:

validate the FD
use filedes[fd].rootIndex to find the file in the root directory
return the file’s size

Implementation

Correct implementation:

int fs_stat(int fd) {
    /* Validate fd */
    if (!is_valid_fd(fd)) {
        return -1;
    }

    int root_index = filedes[fd].rootIndex;

    return root[root_index].size;
}

Get the root directory index

int root_index = filedes[fd].rootIndex;

This is the key design idea of your filesystem:

The FD does not store the filename
It stores the index of the file in the root directory

So instead of searching by name again, you directly jump to:

root[root_index]

`fs_lseek` implementation

What `fs_lseek` must do

Given:

a file descriptor fd
a desired offset

It should:

validate fd
get the file size from the root directory
check:
- if offset > size → invalid → return -1
otherwise:
- update filedes[fd].offset = offset
- return 0

Implementation

Correct implementation:

int fs_lseek(int fd, size_t offset) {
    /* Validate fd */
    if (!is_valid_fd(fd)) {
        return -1;
    }

    int root_index = filedes[fd].rootIndex;
    size_t file_size = root[root_index].size;

    /* Cannot seek past end of file */
    if (offset > file_size) {
        return -1;
    }

    /* Update offset */
    filedes[fd].offset = offset;

    return 0;
}

Enforce the key rule

if (offset > file_size) {
    return -1;
}

This is the most important rule.

The assignment explicitly says:

you cannot seek past the end of a file

So:

offset	size	allowed?
< size	✅	valid
= size	✅	valid
> size	❌	invalid

Why offset == size is allowed

This is subtle but important.

If:

offset == size

That means:

cursor is at end of file (EOF)

This is valid, because:

next read → returns 0
next write → appends

Mental model

Think of the file like a byte array:

[0][1][2][3][4][5][6][7][8][9]

If size = 10:

valid offsets: 0 → 10
invalid offsets: > 10

Example

fd = fs_open("file.txt");  // size = 10

fs_lseek(fd, 5);   // OK
fs_lseek(fd, 10);  // OK (EOF)
fs_lseek(fd, 11);  // FAIL

4. Reading and Writing (The Bounce Buffer)

`fs_read` implementation

What fs_read does

Given:

fd
user buffer buf
count

it:

checks how many bytes are actually available before EOF
loops block by block
reads each full 4096-byte disk block into a bounce buffer
copies only the needed part into the user buffer
advances the file descriptor offset
returns the number of bytes actually read

Implementation

Correct implementation:

int fs_read(int fd, void *buf, size_t count) {
    if (!is_valid_fd(fd) || buf == NULL) {
        return -1;
    }

    if (count == 0) {
        return 0;
    }

    int r_idx = filedes[fd].rootIndex;
    size_t file_size = root[r_idx].size;
    size_t cur_offset = filedes[fd].offset;

    /* Already at EOF */
    if (cur_offset >= file_size) {
        return 0;
    }

    /* Do not read past EOF */
    size_t bytes_left_in_file = file_size - cur_offset;
    size_t bytes_to_read = (count < bytes_left_in_file) ? count : bytes_left_in_file;

    size_t total_read = 0;
    uint8_t bounce[BLOCK_SIZE];

    while (total_read < bytes_to_read) {
        size_t abs_offset = cur_offset + total_read;

        int disk_block = get_block_for_offset(fd, abs_offset, 0);
        if (disk_block < 0) {
            return -1;
        }

        if (block_read(disk_block, bounce) < 0) {
            return -1;
        }

        size_t block_offset = abs_offset % BLOCK_SIZE;
        size_t bytes_left_in_block = BLOCK_SIZE - block_offset;
        size_t bytes_remaining = bytes_to_read - total_read;

        size_t chunk = (bytes_remaining < bytes_left_in_block)
                     ? bytes_remaining
                     : bytes_left_in_block;

        memcpy((uint8_t *)buf + total_read, bounce + block_offset, chunk);
        total_read += chunk;
    }

    filedes[fd].offset += total_read;
    return (int)total_read;
}

1. Validate inputs

if (!is_valid_fd(fd) || buf == NULL) {
    return -1;
}

Your assignment says invalid FDs must return -1.

And buf == NULL is also invalid because there is nowhere to copy data.

2. Handle count == 0

if (count == 0) {
    return 0;
}

Reading zero bytes is valid. It just moves nothing.

3. Check EOF first

if (cur_offset >= file_size) {
    return 0;
}

The assignment explicitly says:

if the offset is already at EOF, return 0
do not crash

So if the cursor is already at the end, there is nothing to read.

4. Clamp the read to EOF

size_t bytes_left_in_file = file_size - cur_offset;
size_t bytes_to_read = (count < bytes_left_in_file) ? count : bytes_left_in_file;

This is the line that enforces:

“Read only what remains before EOF.”

Example:

file size = 1000
current offset = 990
user asks for 50 bytes

Then:

only 10 bytes remain
so bytes_to_read = 10

5. Why we need a bounce buffer

The assignment explains that the disk only reads whole 4096-byte blocks, even if the user wants only a few bytes. So you must:

read the entire block into a temporary array
copy just the needed bytes from that array into the user buffer

That is why we do:

uint8_t bounce[BLOCK_SIZE];

and later:

block_read(disk_block, bounce);

6. Compute which block to read

size_t abs_offset = cur_offset + total_read;
int disk_block = get_block_for_offset(fd, abs_offset, 0);

As the loop progresses, abs_offset moves through the file.

Examples:

if current file offset is 3000 and you already read 0 bytes, abs_offset = 3000
after reading 1096 bytes, abs_offset = 4096
now you need the next file block

That is exactly why the helper exists.

7. Find the position inside the block

size_t block_offset = abs_offset % BLOCK_SIZE;

This tells you where inside the 4096-byte block your read starts.

Examples:

abs_offset = 3000 → block_offset = 3000
abs_offset = 4096 → block_offset = 0
abs_offset = 5000 → block_offset = 904

So this is the byte position inside the bounce buffer.

8. Decide how many bytes to copy from this block

size_t bytes_left_in_block = BLOCK_SIZE - block_offset;
size_t bytes_remaining = bytes_to_read - total_read;

size_t chunk = (bytes_remaining < bytes_left_in_block)
             ? bytes_remaining
             : bytes_left_in_block;

This chooses the smaller of:

how many bytes are still needed overall
how many bytes remain in the current block

So if a read spans multiple blocks, each loop copies only the chunk that belongs to the current block.

9. Copy from bounce buffer into user buffer

memcpy((uint8_t *)buf + total_read, bounce + block_offset, chunk);

This says:

destination = current place inside the user buffer
source = current place inside the bounce buffer
number of bytes = chunk

This is exactly the assignment’s bounce-buffer design.

10. Update state

total_read += chunk;
...
filedes[fd].offset += total_read;

The assignment says to update the FD offset by the number of bytes actually moved.

So when the read is done, the cursor advances.

Example walk-through

Suppose:

block size = 4096
file size = 5000
FD offset = 4000
user asks for count = 300

Then:
Step 1: clamp to EOF

Bytes left in file = 5000 - 4000 = 1000
So read request is fine: bytes_to_read = 300

Step 2: first iteration

abs_offset = 4000
block_offset = 4000 % 4096 = 4000
bytes left in current block = 4096 - 4000 = 96
So first chunk = 96 bytes

Step 3: second iteration

Now total_read = 96
abs_offset = 4096
block_offset = 0
remaining bytes = 300 - 96 = 204
So second chunk = 204 bytes

Done.

That is how one logical read request turns into two block-level operations.

`fs_read` implementation

What `fs_write` must do

Given:

fd
user buffer buf
count

it should:

validate input
write starting at the current FD offset
overwrite if offset is inside the file
append if offset is at EOF
allocate FAT blocks if the write extends the file
use a bounce buffer for each affected disk block
stop early if disk becomes full
update:
- FD offset
- file size
return actual bytes written

The key difference from `fs_read`

In fs_read, the flow was:

disk block -> bounce buffer -> user buffer

In fs_write, it is:

disk block -> bounce buffer -> modify part of it -> write bounce buffer back to disk

That is called read-modify-write.

Why do we do that?

Because the disk only reads and writes whole 4096-byte blocks. If the user wants to write 20 bytes into the middle of a block, you cannot just write those 20 bytes directly to disk or you would destroy the rest of the block.

So the pattern is:

read full block into bounce buffer
overwrite the needed bytes in memory
write full block back

One subtle but important rule

When extending a file, a block might be newly allocated and never written before.

In that case, there is no old block content that matters.

So for newly allocated blocks, it is safest to initialize the bounce buffer to zeros before copying user data into it.

For existing blocks, do a normal block_read() first.

Implementation

Correct implementation:

int fs_write(int fd, void *buf, size_t count) {
    if (!is_valid_fd(fd) || buf == NULL) {
        return -1;
    }

    if (count == 0) {
        return 0;
    }

    int r_idx = filedes[fd].rootIndex;
    size_t cur_offset = filedes[fd].offset;
    size_t old_size = root[r_idx].size;
    size_t total_written = 0;
    uint8_t bounce[BLOCK_SIZE];

    while (total_written < count) {
        size_t abs_offset = cur_offset + total_written;
        size_t block_offset = abs_offset % BLOCK_SIZE;
        size_t bytes_left_in_block = BLOCK_SIZE - block_offset;
        size_t bytes_remaining = count - total_written;
        size_t chunk = (bytes_remaining < bytes_left_in_block)
                     ? bytes_remaining
                     : bytes_left_in_block;

        int disk_block = get_block_for_offset(fd, abs_offset, 1);

        /* Disk full: if nothing written yet, return 0; otherwise partial write */
        if (disk_block < 0) {
            break;
        }

        /*
         * Determine whether this block already existed in the file before
         * this write began. If yes, read old contents. If no, start with zeros.
         *
        if (abs_offset < old_size) {
            if (block_read(disk_block, bounce) < 0) {
                return -1;
            }
        } else {
            memset(bounce, 0, BLOCK_SIZE);
        }

        memcpy(bounce + block_offset, (uint8_t *)buf + total_written, chunk);

        if (block_write(disk_block, bounce) < 0) {
            return -1;
        }

        total_written += chunk;
    }

    /* Update FD offset by actual bytes written */
    filedes[fd].offset += total_written;

    /* Grow file size if needed */
    if (filedes[fd].offset > root[r_idx].size) {
        root[r_idx].size = filedes[fd].offset;
    }

    return (int)total_written;
}

1. Validate input

if (!is_valid_fd(fd) || buf == NULL) {
    return -1;
}

This matches the general FD error-handling rule:

invalid FD → -1
null buffer → invalid

2. Handle zero-byte write

if (count == 0) {
    return 0;
}

Writing zero bytes is valid and should do nothing.

3. Save starting state

int r_idx = filedes[fd].rootIndex;
size_t cur_offset = filedes[fd].offset;
size_t old_size = root[r_idx].size;
size_t total_written = 0;
uint8_t bounce[BLOCK_SIZE];

These variables matter a lot.

cur_offset
- This is where the write starts.
old_size
- This is the file size before the write begins.
- We save it because during the write, the file may grow, and we still want to know whether a block existed before or is newly allocated.
total_written
- Tracks progress through the user’s buffer.

4. Loop block by block

while (total_written < count)

This is necessary because:

the write may span multiple blocks
each disk I/O operation only handles one block at a time

5. Compute where we are in the file

size_t abs_offset = cur_offset + total_written;
size_t block_offset = abs_offset % BLOCK_SIZE;
size_t bytes_left_in_block = BLOCK_SIZE - block_offset;
size_t bytes_remaining = count - total_written;
size_t chunk = (bytes_remaining < bytes_left_in_block)
             ? bytes_remaining
             : bytes_left_in_block;

This is almost exactly the same structure as fs_read.

abs_offset
- The current byte position in the file.
block_offset
- Where inside the current block the write starts.
chunk
- How many bytes to write in this iteration.
- It is the smaller of:
  - bytes still needed overall
  - space remaining in this block

6. Ask helper for the correct disk block

int disk_block = get_block_for_offset(fd, abs_offset, 1);

The important part is the 1.

That means:

“If this offset needs a block that doesn’t exist yet, allocate one.”

So unlike fs_read, fs_write can grow the FAT chain.

7. Handle disk-full / partial-write case

if (disk_block < 0) {
    break;
}

If allocation fails, that usually means there are no free FAT blocks left.

Then we stop writing.

This is exactly what you want for partial writes:

if some bytes were already written, return that partial amount
if none were written, return 0

Since total_written starts at 0 and only increases after successful writes, breaking here naturally gives the right result.

Why this handles the edge case correctly

Suppose:

user wants 5000 bytes
only one free block remains

Then:

first iteration writes 4096 bytes successfully
next call to get_block_for_offset(..., 1) fails
loop breaks
function returns 4096

And if the disk is full before the very first byte:

first allocation fails immediately
total_written stays 0
function returns 0

Also exactly correct.

8. Decide whether to read old block contents or start with zeros

if (abs_offset < old_size) {
    if (block_read(disk_block, bounce) < 0) {
        return -1;
    }
} else {
    memset(bounce, 0, BLOCK_SIZE);
}

This is one of the trickiest parts.

Case A: writing into a block that already existed in the file

Then we must preserve the untouched bytes in that block.

So we:

block_read(disk_block, bounce);

Then modify only the needed bytes.

Case B: writing into a newly allocated block beyond old EOF

There is no previous file data to preserve there.

So we can safely do:

memset(bounce, 0, BLOCK_SIZE);

and then copy user data into the needed region.
This prevents garbage bytes from being written into the unwritten portions of a newly allocated block.

Why abs_offset < old_size is the right test

Because old_size is the file size before this write started.

If the current absolute offset is still inside that old file:

the block already existed
read old contents

If the current absolute offset is past or at old EOF:

this region did not exist before
zero-fill bounce buffer first

That is a clean way to distinguish overwrite vs growth.

9. Copy user bytes into bounce buffer

memcpy(bounce + block_offset, (uint8_t *)buf + total_written, chunk);

This writes the new bytes into the correct part of the block image.

So if the user writes 20 bytes starting at offset 100:

the first 100 bytes of the block stay as they were
bytes 100–119 are replaced
rest of the block stays as it was

That is exactly the read-modify-write behavior you need.

10. Write whole block back to disk

if (block_write(disk_block, bounce) < 0) {
    return -1;
}

After the bounce buffer has been updated, we write the entire block back.

11. Advance progress

total_written += chunk;

Now the next iteration continues at the next byte to write.

12. Update FD offset

filedes[fd].offset += total_written;

This moves the file cursor forward by the number of bytes actually written.

That matches the assignment’s state-management rules.

13. Update file size if the file grew

if (filedes[fd].offset > root[r_idx].size) {
    root[r_idx].size = filedes[fd].offset;
}

This is very important.

If the write extends past old EOF, the file size must grow.

If it was only an overwrite inside the file, the size stays the same.

So new file size is:

max(old size, new offset)

and that is exactly what this if does.

Example 1: overwrite inside existing file

Suppose:

file size = 1000
FD offset = 100
user writes 20 bytes

Then:

write stays inside existing file
no new blocks are allocated
size remains 1000
offset becomes 120

Example 2: append to end of file

Suppose:

file size = 1000
FD offset = 1000
user writes 50 bytes

Then:

write starts at EOF
maybe same last block, maybe a new block depending on position
file grows to 1050
offset becomes 1050

Example 3: multi-block write

Suppose:

offset = 4000
count = 300

Then:
Iteration 1

block offset = 4000
space left in block = 96
write 96 bytes

Iteration 2

next offset = 4096
block offset = 0
remaining = 204
write 204 bytes into next block

Done.

Example 4: disk full partial write

Suppose:

write needs 3 blocks
only 2 are available

Then:

first two blocks succeed
third allocation fails
loop breaks
function returns bytes written so far

Helper functions

`find_free_fat()` implementation

Correct implementation:

static int find_free_fat() {
    if (fat == NULL || sblk == NULL) {
        return -1;
    }

    for (int i = 0; i < sblk->numDataBlocks; i++) {
        if (fat[i] == get_fat_free()) {
            return i;  // Found a free block
        }
    }

    return -1;  // No free blocks → disk is full
}

1. Safety check

if (fat == NULL || sblk == NULL)

This ensures:

filesystem is mounted
FAT is loaded into memory

2. Loop through FAT entries

for (int i = 0; i < sblk->numDataBlocks; i++)

We only check valid data-region entries, not raw memory.

Important detail:
The FAT size in memory is:

sblk->numFAT * BLOCK_SIZE   // in bytes

BUT the number of valid FAT entries is:

sblk->numDataBlocks

Why? Because:

FAT entries correspond only to data blocks, not all disk blocks

So you MUST loop like this:

for (int i = 0; i < sblk->numDataBlocks; i++)

NOT over the full allocated memory.

Takeaway:

numFAT = how many blocks the FAT occupies on disk
numDataBlocks = how many entries exist in the FAT (what you actually use)
- The FAT is conceptually: uint16_t fat[numDataBlocks];
- Each entry corresponds to one data block.
- So:
  - fat[0] → first data block
  - fat[1] → second data block
  - ...
  - fat[numDataBlocks - 1]

3. Check for free entry

if (fat[i] == get_fat_free())

Your helper:

static inline uint16_t get_fat_free() {
    return 0x0000;
}

So this is equivalent to:

if (fat[i] == 0x0000)

4. Return index

return i;

This index is:

a FAT index, NOT a disk block

To convert it to a disk block later:

disk_block = sblk->dataIndex + i;

5. If nothing found

return -1;

Meaning:

Disk is full — no free blocks available

`get_block_for_offset` implementation

What `get_block_for_offset` does

Given:

an open file descriptor
a byte offset inside that file

it answers:

“Which absolute disk block contains that byte?”

To do that, it:
1. starts at the file’s indexFirstBlock
2. follows the FAT chain
3. advances one FAT link per 4096 bytes
4. converts the final FAT index into an absolute disk block: sblk->dataIndex + fat_index

Implementation

Correct implementation:

static int get_block_for_offset(int fd, size_t offset, int allocate) {
    if (!is_valid_fd(fd) || sblk == NULL) {
        return -1;
    }

    int r_idx = filedes[fd].rootIndex;
    uint16_t fat_idx = root[r_idx].indexFirstBlock;

    /* Which file block do we need?
       offset 0..4095   -> block 0 of file
       offset 4096..8191 -> block 1 of file
       etc.
    */
    size_t blocks_to_advance = offset / BLOCK_SIZE;

    /* Empty file / no first block yet */
    if (fat_idx == get_fat_eoc()) {
        if (!allocate) {
            return -1;
        }

        int new_idx = find_free_fat();
        if (new_idx < 0) {
            return -1;
        }

        root[r_idx].indexFirstBlock = (uint16_t)new_idx;
        fat[new_idx] = get_fat_eoc();
        fat_idx = (uint16_t)new_idx;
    }

    /* Walk forward through the FAT chain */
    for (size_t i = 0; i < blocks_to_advance; i++) {
        uint16_t next = fat[fat_idx];

        if (next == get_fat_eoc()) {
            if (!allocate) {
                return -1;
            }

            int new_idx = find_free_fat();
            if (new_idx < 0) {
                return -1;
            }

            fat[fat_idx] = (uint16_t)new_idx;
            fat[new_idx] = get_fat_eoc();
            fat_idx = (uint16_t)new_idx;
        } else {
            fat_idx = next;
        }
    }

    /* Convert FAT index -> absolute disk block index */
    return sblk->dataIndex + fat_idx;
}

`offset / BLOCK_SIZE`

This line is the key:

size_t blocks_to_advance = offset / BLOCK_SIZE;

Examples:

offset = 0 → 0 / 4096 = 0 → still in first file block
offset = 3000 → 0 → still in first file block
offset = 4096 → 1 → second file block
offset = 9000 → 2 → third file block

So this tells us how many FAT links to follow.

Why start from `indexFirstBlock`

The root directory stores:

root[r_idx].indexFirstBlock

That is the FAT index of the file’s first data block, or 0xFFFF if the file is empty.

So the FAT chain always begins there.

Why return `sblk->dataIndex + fat_idx`

This is one of the most important rules in the assignment:

FAT indices are not absolute disk block numbers
FAT index 0 means “the first block in the data region”
to read the real disk block, you must offset by dataIndex

Why the allocate parameter exists

For fs_read, you will call:

get_block_for_offset(fd, offset, 0)

because reading should never allocate blocks.

For fs_write, you will later call it with allocate = 1, so if the file grows past its current chain, the helper can create new FAT blocks.

So even though allocate is not really used by fs_read, implementing it now will help you later with fs_write.

Navigating the linked list

fat[fat_idx] = (uint16_t)new_idx;
fat[new_idx] = get_fat_eoc();
fat_idx = (uint16_t)new_idx;

We are extending a linked list.

Remember:

The FAT is a linked list structure
Each entry points to the next block
0xFFFF = end of chain (EOC)

So a file looks like:

Block A → Block B → Block C → 0xFFFF

When do we execute this code?
This happens when:

next == get_fat_eoc()

Meaning:

“We reached the end of the file’s current block chain”
But we still need to go further (because offset requires more blocks), so:
we must allocate a new block and link it

Line-by-line explanation

Extend the chain

fat[fat_idx] = (uint16_t)new_idx;

Before: fat[fat_idx] = 0xFFFF (end of chain)
After: fat[fat_idx] = new_idx
So we change: ... → fat_idx → 0xFFFF into ... → fat_idx → new_idx
This extends the chain

Mark the new block as end of chain

fat[new_idx] = get_fat_eoc();

fat[new_idx] = get_fat_eoc();
Now the new block becomes the last block: new_idx → 0xFFFF
So the chain is now: ... → fat_idx → new_idx → 0xFFFF
This keeps the chain valid

Move forward in the chain

fat_idx = (uint16_t)new_idx;

This updates your traversal pointer so you continue walking the chain.
Without this, you would still be pointing at the previous block.

Visual example

Before allocation
Suppose the FAT looks like:

fat[4] = 7
fat[7] = 0xFFFF

File chain:

4 → 7 → EOC

You need one more block
Let’s say:

new_idx = 10

After these three lines

fat[7] = 10;
fat[10] = 0xFFFF;
fat_idx = 10;

Now FAT:

fat[4] = 7
fat[7] = 10
fat[10] = 0xFFFF

Chain becomes:

4 → 7 → 10 → EOC

Instructions

Introduction & Learning Objectives

The Virtual Disk Architecture

The Core Data Structures

The Superblock (Block 0)

The File Allocation Table (FAT)

The Root Directory

The File Descriptor Table

FAT Chain Example

Codebase Organization

What You Must Implement (The API)

1: Mounting and Unmounting

2: Directory Operations

3: File Descriptors

4: Reading and Writing (The Bounce Buffer)

Edge Cases & Expected Error Handling

1. Invalid File Descriptors

2. File Creation Limits (fs_create)

3. Open File Limits (fs_open)

4. Reading Past EOF (fs_read)

5. Out of Space / Disk Full (fs_write)

How to Compile and Run Your Code

1. Compiling the Project

2. Formatting a Virtual Disk

3. Interactive Testing (Command-Line Interface)

4. Running the C-Level Tests

5. The Autograder

6. Understanding the Provided Tests

Key Systems Concepts to Understand

1. Struct Padding and __attribute__((__packed__))

2. The "Bounce Buffer" (Block I/O vs. Byte I/O)

3. Pointer Arithmetic vs. Byte Math

4. File Descriptors and "State"

Implementation

1. Mounting and Unmounting

What each function should do

Correct implementation

fs_mount questions

1. Why do we use sblk != NULL to check if mounted?

2. Why do we malloc(sizeof(Superblock))?

3. Why use memcmp(sblk->sig, "CSCE-313", 8)?

4. Why is FAT allocation sblk->numFAT * BLOCK_SIZE?

5. Why cast to (uint8_t *)fat when reading FAT blocks?

6. Why do we read root directly into root?

7. Why initialize all file descriptors during mount?

fs_umount questions

8. Why check numFilesOpen > 0 in fs_umount?

9. Why write metadata back in fs_umount?

fs_info questions

10. Why count FAT free entries only up to numDataBlocks?

11. Count free FAT entries

12. Count fee root entries

2. Directory Operations

fs_create implementation

1. Mounted check

2. Empty filename check

3. Length check

4. One scan handles both duplicate detection and free-slot detection

5. Why root[i].name[0] == '\0' means “empty slot”

6. Why memset first is a good idea

7. Why indexFirstBlock = get_fat_eoc()

Why we do not call find_free_fat() here

Common mistake

fs_delete implement

Big picture first

One very important edge case

Implementation

1. Validate basic input

2. Find the file in the root directory

3. If the file does not exist, fail

4. Check whether the file is open

5. Start FAT traversal

6. Walk the FAT chain and free each block

Example traversal

7. Clear the root entry

fs_ls implementation

What fs_ls should do

Implementation

1. Check whether the filesystem is mounted

2. Print a header

2. File Creation Limits (`fs_create`)

3. Open File Limits (`fs_open`)

4. Reading Past EOF (`fs_read`)

5. Out of Space / Disk Full (`fs_write`)

1. Struct Padding and `attribute((packed))`

`fs_mount` questions

1. Why do we use `sblk != NULL` to check if mounted?

2. Why do we `malloc(sizeof(Superblock))`?

3. Why use `memcmp(sblk->sig, "CSCE-313", 8)`?

4. Why is FAT allocation `sblk->numFAT * BLOCK_SIZE`?

5. Why cast to `(uint8_t *)fat` when reading FAT blocks?

`fs_umount` questions

8. Why check `numFilesOpen > 0 in fs_umount`?

9. Why write metadata back in `fs_umount`?

`fs_info` questions

`fs_create` implementation

5. Why `root[i].name[0] == '\0'` means “empty slot”

6. Why `memset` first is a good idea

7. Why `indexFirstBlock = get_fat_eoc()`

Why we do not call `find_free_fat()` here

`fs_delete` implement

`fs_ls` implementation

What `fs_ls` should do

`fs_open()` implementation

What `fs_open` must do

Why return `i` instead of `filedes[i].id`?

`fs_close` implementation

What `fs_close` must do

`fs_stat` implementation

What `fs_stat` should do

`fs_lseek` implementation

What `fs_lseek` must do

`fs_read` implementation

`fs_read` implementation

What `fs_write` must do

The key difference from `fs_read`