10 - File System Organization

#filesystem #files

Class: CSCE-313

Notes:

Outline

File System Abstraction
File System Organization
UNIX File API
Hard links vs. soft links

The file system abstraction

Named hierarchical data: Human-readable names (e.g., filenames) given to data
Reliability: Unexpected power-cycles should not corrupt data
E.g., to create a new file, you need to do operations atomically
- Allocate an inode for the file
- Allocate data blocks to the file
- Write the block ids into the file inode
- Read data for directory, modify, write back data for dir
Controlled sharing: Determine who can read/write/execute certain files sequentially/simultaneously w/o corrupting

Notes:

It gives you main hierarchical data
All that you have on a disk are tracks and sectors
In some sense you have a linear sequence of sectors
- Think of it as counting from the inside to the outside
Somehow your filesystem is going to construct a hierarchical namespace out of this linear structure
- This gives you abstraction to be able to organized your files well
- But under the hood it is just a linear structure mapping

00 - TAMU Brain/6th Semester (Spring 26)/CSCE-313/Lecture/Visual Aids/image-7.png465

On disk, the green blobs represent inodes
The red and blue blobs represent data blocks for the two inodes corresponding to a & b.

Example:

Imagine that you are trying to create a new file:
- consider what you filesystem needs to do
- The entire metadata for a file goes into an inode
- The entire sequence of bytes will go to that inode
- Once an inode is allocated, it needs to mark an inode as unavailable
- It has to go search for and find free data blocks to assign to that inode, and then it needs to write the entry for that file into a directory
  - An entry is nothing more than inode-name pairs
  - It better not fail anything in between, it better do this atomically, because otherwise you will need to use fseeking to recover your filesystem which takes a lot of time
Your filesystem not only is building this hierarchical abstraction, but it is doing it very reliable and giving you the illusion of all or nothing meaning either this things all happen or none of it happen.
Somewhere in the file metadata is our permission bits that allow someone to read/write, etc.
Somehow if you try to reach a particular file, you need to be able to go from the directories on top of it (from / to foo and then from foo to a)
Your filesystem looks up that inode, and in that inode tells you the exact sequence of disk blocks corresponding to that file, and now you can read the exact blocks in order to read the file.
- For you it is just a byte offset within the file, but underneath, the system is reading blocks and returning bytes to you

An example data layout on disk

Consider the disk to be a linear sequence of blocks.

Say that block size is 4 KB . Gives us a 256 KB disk.
The blocks are addressed from $[0 \dots n)$ .

00 - TAMU Brain/6th Semester (Spring 26)/CSCE-313/Lecture/Visual Aids/image-8.png500

Notes:

For the point of view of the filesystem, think of this space as a linear array
Enture filesystem is 256 KB
Filesystem will layout metadata in different places which will let it build a hierarchical space
This is actually very close to how the original UNIX filesystem was layed out

Data region in a file system

Reserve data region to store user data
File system has to track which data blocks belong to a file, and which blocks are free.
Track metadata for a file such as its size, mode bits, owner, etc. (inode)

How do we store these inodes in the file system?

Notes:

The number of inodes that your filesystem allocates at creation time is the maximum number of inodes
The filesystem has figured out somehow that it is going to take 58 blocks and allocate data for files and other objects to be layed out on those 58 blocks, and then keep 8 blocks for metadata
Now your filesystem needs to split this up into metadata (inodes) and other things.

Inode table in a file system

Reserve some space for an inode table (array)
- This holds an array of on-disk inodes.
- Ex) inode tables: blocks 3-7, inode size: 256 bytes
  - A 4-KB block can hold 16 inodes.
  - This file system contains $16 \times 5 = 80$ inodes. (maximum number of files)

00 - TAMU Brain/6th Semester (Spring 26)/CSCE-313/Lecture/Visual Aids/image-10.png500

Notes:

If you have a lot of inodes and you have alrge video files etc. then you are wasting some blocks
If you have too few blocks then...
The filesystem, since it knows what is going to layout on disk, it nodes that the size of an inode is of certain byte size
The largest number of inodes is 80

Allocation Structures

To track whether inodes or data blocks are free or allocated.
Use a bitmap, each bit indicates free(0) or in-use(1)
- data bitmap: for data region (per block)
- inode bitmap: for inode table (per inode)

00 - TAMU Brain/6th Semester (Spring 26)/CSCE-313/Lecture/Visual Aids/image-11.png500

Notes:

Sometimes sit is easier to allocate at block boundaries than distributing among all blocks
You could built a link list of free data blocks that have just one pointer that points to the first free data block and so on
- The only problem with this is that if you wanted to grow a file, (add a 6th block to the file) -> spatial locality is important, you want the 6th block to be close to the 5th one so that your disk head does not have to move too much
Now that you have a bitmap, you can find adjacent blocks easily
In much the same way you need an inode bitmap that tells you which inodes are currently free and which ones are not

Super Block

The Super block contains metadata about this file system
- Ex) The number of inodes, beginning location of inode table etc.

00 - TAMU Brain/6th Semester (Spring 26)/CSCE-313/Lecture/Visual Aids/image-12.png500

Thus, when mounting a file system, the OS/FS reads the superblock first to initialize various information.

Notes:

Somehow you are going to get to the inode of the root directory, once you have the inode of the root directory, now you can traverse the subtree
- Now you have attached it to some particular place in your name space

File Organization: The inode

Each inode is referred to by an inode number.
- From an inode #, the FS can compute the location of inode on disk.
- Ex) inode number: 32
  - Calculate the offset into the inode region ( $32 \times$ sizeof(inode) (256 bytes) $= 8192$
  - Add start address of the inode table $(12 KB) +$ inode region $(8 KB) = 20 KB$

Notes:

How to find a particular inode?
- You basically integer divide the inode number by 16, this gives you the block in which the inode resides
- If you get the remainder of this division, you will get the offset of the element

File System Layout

How do you build a hierarchy (tree) of files from the flat model of data blocks?

00 - TAMU Brain/6th Semester (Spring 26)/CSCE-313/Lecture/Visual Aids/image-14.png449

Notes:

Your superblock is pointing to this first directory in some sense that corresponds to this filesystem
This is the root directory, essentially a file with a special structure, that structure is basically associating names with inode numbers
- It has info about inode names and numbers
In this example
- a has two names and inode numbers:
  - b -> object
  - c -> directory
    - c is just an enumeration of names and inode numbers
      - d -> object
      - e -> object
To us it looks like a long array of inodes
In order to find the data for e you need to get to c, in order to get to c, you need to look at a for its inode. This is how you get a hierarchy
- e does not exist out of c, and c does nto exist outside of a
- You start at /, this is the root inode, which corresponds to a directory
- Every object must refer to an inode because the data for that object can only be located through the inode
In fact the name of the object is not stored within an object, the name of a file is found in these special files called parent directories

Characteristics of a magnetic disk

00 - TAMU Brain/6th Semester (Spring 26)/CSCE-313/Lecture/Visual Aids/image-15.png331x256

Why are magnetic disks slow?

Platters rotate at a constant speed.
Each surface has data in the form of several tracks.
Each track is separated into sectors/blocks.

How a block is read from disk

The arm moves the disk head to the right track. This is a "seek". Extremely slow $(≅ 10 ms)$ .
Then, the head waits for the correct sector to come below. This is relatively faster because of constant motion (<<10 ms), determined by disk RPM.
Read the sector and send to the CPU. Much faster (SATA rate: $\sim 500 MB / s$ ).

Notes:

If you want to read something off a particular sector, then the head first needs to reach that track. You will spend 10ms on this.
The blocks of a file needs to be layed out in such a way that you are able to quickly reach each block without watining too much for the disk to move
- Look at The Memory Hierarchy for disk r/w time calculations
It is just slow to move a mechanical thing
Once you get to a position in the disk, then you can read very rapidly

Building a hierarchy

Files: "contain" metadata and actual data
Directories: Special files that contain names and "pointers" to files

00 - TAMU Brain/6th Semester (Spring 26)/CSCE-313/Lecture/Visual Aids/image-16.png327

In some sence you are force to go from a known thing to the target file

Filesystems terms

path: string identifying a file (e.g., /home/davidkebo/labs/main.cpp)
- Think of it as having a "/"
root directory: top-most directory in the filesystem, denoted by "/"
- Unlike Windows
absolute path: path starting from the root "/"
relative path: a path relative to the current working directory. Does not start with a "/"
- The kernel keeps track of your current working directory
index node (inode): structure that stores the file/directory attributes such as the object's metadata and the data block locations on disk.
- Read it using "stat"
- The entire sequence of blocks to read that file is somehow packed into that inode
hard link: a directory entry that refers to the inode of a file.
- Basically makes a new name for the same inode
soft/symbolic link: a shortcut that names a path to the file
- Really just a path name embedded in this special object called symlink
- Does not have to be a ptah (it can be any string), but during resolution, it starts resolving using the path that the symlink corresponds

hard link vs. soft link

00 - TAMU Brain/6th Semester (Spring 26)/CSCE-313/Lecture/Visual Aids/image-17.png348

Notes:

A hard link is associating a name with an inode number
- When you create a hard link this is all you are doing
- Since there is a reference counting in the inode (how many things are counting at me)
- When you remove the file, you are just decrementing this reference count by 1
- It is when this reference count reaches 0, the inode cease to exist
A symlink is basically just a path
- Lets say you have a symlink to some path, and then you remove the file the path points to
- At this point if this path does not exist, your symlink resolution will fail
- It is storing a path, not a reference to an inode
- It is a special file whose contents are some string
- When you resolve the symlink is when the resolution process starts lookin at its path
- If that path name is too long, then the filesystem will actually allocate disk blocks to that inode and store it

Hard link example:

> ln foo.md bar.md
> ls -li foo.md bar.md
69312997 -rw-r--r--  2 macc  staff  0 Mar 18 14:37 bar.md
69312997 -rw-r--r--  2 macc  staff  0 Mar 18 14:37 foo.md

Note the two arguments to give is the existing name and the new name you want to give
They will both point to the same filesystem object
Note this is very different than doing a deep/shallow copy of an object, here we are talking about the same inode!

Soft link example:

> ln -s foo.md fib.md
> ls -li foo.md fib.md
69313223 lrwxr-xr-x  1 macc  staff  6 Mar 18 14:38 fib.md -> foo.md
69312997 -rw-r--r--  2 macc  staff  0 Mar 18 14:37 foo.md

Hard Links

00 - TAMU Brain/6th Semester (Spring 26)/CSCE-313/Lecture/Visual Aids/image-22.png400

shell command

ln /dirA/name1 /dirB/name2

is typically implemented using the link system call:

#include <stdio.h>
#include <unistd.h>

if (link(“/dirA/name1”, “/dirB/name2”) == -1)
perror(“failed to make new link in /dirB”);

Cannot hard link to a directory.
Cannot hard link to a sym link.
- The symlink is automatically traversed

00 - TAMU Brain/6th Semester (Spring 26)/CSCE-313/Lecture/Visual Aids/image-23.png429

inode #
links

Hard Links: `refcounts`

00 - TAMU Brain/6th Semester (Spring 26)/CSCE-313/Lecture/Visual Aids/image-24.png400

#include <stdio.h>
#include <unistd.h>

if (unlink(“/dirA/name1”) == -1)
perror(“failed to delete link in /dirA”);

if (unlink(“/dirB/name2”) == -1)
perror(“failed to delete link in /dirB”);