10 - File System Organization
Class: CSCE-313
Notes:
Outline
- File System Abstraction
- File System Organization
- UNIX File API
- Hard links vs. soft links
The file system abstraction
- Named hierarchical data: Human-readable names (e.g., filenames) given to data
- Reliability: Unexpected power-cycles should not corrupt data
- E.g., to create a new file, you need to do operations atomically
- Allocate an inode for the file
- Allocate data blocks to the file
- Write the block ids into the file inode
- Read data for directory, modify, write back data for dir
- Controlled sharing: Determine who can read/write/execute certain files sequentially/simultaneously w/o corrupting
Notes:
- It gives you main hierarchical data
- All that you have on a disk are tracks and sectors
- In some sense you have a linear sequence of sectors
- Think of it as counting from the inside to the outside
- Somehow your filesystem is going to construct a hierarchical namespace out of this linear structure
- This gives you abstraction to be able to organized your files well
- But under the hood it is just a linear structure mapping
/CSCE-313/Lecture/Visual%20Aids/image-7.png)
- On disk, the green blobs represent inodes
- The red and blue blobs represent data blocks for the two inodes corresponding to a & b.
Example:
- Imagine that you are trying to create a new file:
- consider what you filesystem needs to do
- The entire metadata for a file goes into an inode
- The entire sequence of bytes will go to that inode
- Once an inode is allocated, it needs to mark an inode as unavailable
- It has to go search for and find free data blocks to assign to that inode, and then it needs to write the entry for that file into a directory
- An entry is nothing more than inode-name pairs
- It better not fail anything in between, it better do this atomically, because otherwise you will need to use
fseekingto recover your filesystem which takes a lot of time
- Your filesystem not only is building this hierarchical abstraction, but it is doing it very reliable and giving you the illusion of all or nothing meaning either this things all happen or none of it happen.
- Somewhere in the file metadata is our permission bits that allow someone to read/write, etc.
- Somehow if you try to reach a particular file, you need to be able to go from the directories on top of it (from
/tofooand then fromfootoa) - Your filesystem looks up that inode, and in that inode tells you the exact sequence of disk blocks corresponding to that file, and now you can read the exact blocks in order to read the file.
- For you it is just a byte offset within the file, but underneath, the system is reading blocks and returning bytes to you
An example data layout on disk
Consider the disk to be a linear sequence of blocks.
- Say that block size is 4 KB . Gives us a 256 KB disk.
- The blocks are addressed from
.
/CSCE-313/Lecture/Visual%20Aids/image-8.png)
Notes:
- For the point of view of the filesystem, think of this space as a linear array
- Enture filesystem is 256 KB
- Filesystem will layout metadata in different places which will let it build a hierarchical space
- This is actually very close to how the original UNIX filesystem was layed out
Data region in a file system
-
Reserve data region to store user data
/CSCE-313/Lecture/Visual%20Aids/image-9.png)
-
File system has to track which data blocks belong to a file, and which blocks are free.
-
Track metadata for a file such as its size, mode bits, owner, etc. (inode)
How do we store these inodes in the file system?
Notes:
- The number of inodes that your filesystem allocates at creation time is the maximum number of inodes
- The filesystem has figured out somehow that it is going to take 58 blocks and allocate data for files and other objects to be layed out on those 58 blocks, and then keep 8 blocks for metadata
- Now your filesystem needs to split this up into metadata (inodes) and other things.
Inode table in a file system
- Reserve some space for an inode table (array)
- This holds an array of on-disk inodes.
- Ex) inode tables: blocks 3-7, inode size: 256 bytes
- A 4-KB block can hold 16 inodes.
- This file system contains
inodes. (maximum number of files)
/CSCE-313/Lecture/Visual%20Aids/image-10.png)
Notes:
- If you have a lot of inodes and you have alrge video files etc. then you are wasting some blocks
- If you have too few blocks then...
- The filesystem, since it knows what is going to layout on disk, it nodes that the size of an inode is of certain byte size
- The largest number of inodes is 80
Allocation Structures
- To track whether inodes or data blocks are free or allocated.
- Use a bitmap, each bit indicates free(0) or in-use(1)
- data bitmap: for data region (per block)
- inode bitmap: for inode table (per inode)
/CSCE-313/Lecture/Visual%20Aids/image-11.png)
Notes:
- Sometimes sit is easier to allocate at block boundaries than distributing among all blocks
- You could built a link list of free data blocks that have just one pointer that points to the first free data block and so on
- The only problem with this is that if you wanted to grow a file, (add a 6th block to the file) -> spatial locality is important, you want the 6th block to be close to the 5th one so that your disk head does not have to move too much
- Now that you have a bitmap, you can find adjacent blocks easily
- In much the same way you need an inode bitmap that tells you which inodes are currently free and which ones are not
Super Block
- The Super block contains metadata about this file system
- Ex) The number of inodes, beginning location of inode table etc.
/CSCE-313/Lecture/Visual%20Aids/image-12.png)
- Thus, when mounting a file system, the OS/FS reads the superblock first to initialize various information.
Notes:
- Somehow you are going to get to the inode of the root directory, once you have the inode of the root directory, now you can traverse the subtree
- Now you have attached it to some particular place in your name space
File Organization: The inode
- Each inode is referred to by an inode number.
- From an inode #, the FS can compute the location of inode on disk.
- Ex) inode number: 32
- Calculate the offset into the inode region (
sizeof(inode) (256 bytes) - Add start address of the inode table
inode region
- Calculate the offset into the inode region (
/CSCE-313/Lecture/Visual%20Aids/2026-03-18_14-09-24.png)
Notes:
- How to find a particular inode?
- You basically integer divide the inode number by 16, this gives you the block in which the inode resides
- If you get the remainder of this division, you will get the offset of the element
File System Layout
How do you build a hierarchy (tree) of files from the flat model of data blocks?
/CSCE-313/Lecture/Visual%20Aids/image-14.png)
Notes:
- Your superblock is pointing to this first directory in some sense that corresponds to this filesystem
- This is the root directory, essentially a file with a special structure, that structure is basically associating names with inode numbers
- It has info about inode names and numbers
- In this example
- a has two names and inode numbers:
- b -> object
- c -> directory
- c is just an enumeration of names and inode numbers
- d -> object
- e -> object
- c is just an enumeration of names and inode numbers
- a has two names and inode numbers:
- To us it looks like a long array of inodes
- In order to find the data for e you need to get to c, in order to get to c, you need to look at a for its inode. This is how you get a hierarchy
- e does not exist out of c, and c does nto exist outside of a
- You start at
/, this is the root inode, which corresponds to a directory - Every object must refer to an inode because the data for that object can only be located through the inode
- In fact the name of the object is not stored within an object, the name of a file is found in these special files called parent directories
Characteristics of a magnetic disk
/CSCE-313/Lecture/Visual%20Aids/image-15.png)
Why are magnetic disks slow?
- Platters rotate at a constant speed.
- Each surface has data in the form of several tracks.
- Each track is separated into sectors/blocks.
How a block is read from disk
- The arm moves the disk head to the right track. This is a "seek". Extremely slow
. - Then, the head waits for the correct sector to come below. This is relatively faster because of constant motion (<<10 ms), determined by disk RPM.
- Read the sector and send to the CPU. Much faster (SATA rate:
).
Notes:
- If you want to read something off a particular sector, then the head first needs to reach that track. You will spend 10ms on this.
- The blocks of a file needs to be layed out in such a way that you are able to quickly reach each block without watining too much for the disk to move
- Look at The Memory Hierarchy for disk r/w time calculations
- It is just slow to move a mechanical thing
- Once you get to a position in the disk, then you can read very rapidly
Building a hierarchy
- Files: "contain" metadata and actual data
- Directories: Special files that contain names and "pointers" to files
/CSCE-313/Lecture/Visual%20Aids/image-16.png)
- In some sence you are force to go from a known thing to the target file
Filesystems terms
- path: string identifying a file (e.g.,
/home/davidkebo/labs/main.cpp)- Think of it as having a "
/"
- Think of it as having a "
- root directory: top-most directory in the filesystem, denoted by "
/"- Unlike Windows
- absolute path: path starting from the root "
/" - relative path: a path relative to the current working directory. Does not start with a "
/"- The kernel keeps track of your current working directory
- index node (inode): structure that stores the file/directory attributes such as the object's metadata and the data block locations on disk.
- Read it using "stat"
- The entire sequence of blocks to read that file is somehow packed into that inode
- hard link: a directory entry that refers to the inode of a file.
- Basically makes a new name for the same inode
- soft/symbolic link: a shortcut that names a path to the file
- Really just a path name embedded in this special object called symlink
- Does not have to be a ptah (it can be any string), but during resolution, it starts resolving using the path that the symlink corresponds
hard link vs. soft link
/CSCE-313/Lecture/Visual%20Aids/image-17.png)
Notes:
- A hard link is associating a name with an inode number
- When you create a hard link this is all you are doing
- Since there is a reference counting in the inode (how many things are counting at me)
- When you remove the file, you are just decrementing this reference count by 1
- It is when this reference count reaches 0, the inode cease to exist
- A symlink is basically just a path
- Lets say you have a symlink to some path, and then you remove the file the path points to
- At this point if this path does not exist, your symlink resolution will fail
- It is storing a path, not a reference to an inode
- It is a special file whose contents are some string
- When you resolve the symlink is when the resolution process starts lookin at its path
- If that path name is too long, then the filesystem will actually allocate disk blocks to that inode and store it
Hard link example:
> ln foo.md bar.md
> ls -li foo.md bar.md
69312997 -rw-r--r-- 2 macc staff 0 Mar 18 14:37 bar.md
69312997 -rw-r--r-- 2 macc staff 0 Mar 18 14:37 foo.md
- Note the two arguments to give is the existing name and the new name you want to give
- They will both point to the same filesystem object
- Note this is very different than doing a deep/shallow copy of an object, here we are talking about the same inode!
Soft link example:
> ln -s foo.md fib.md
> ls -li foo.md fib.md
69313223 lrwxr-xr-x 1 macc staff 6 Mar 18 14:38 fib.md -> foo.md
69312997 -rw-r--r-- 2 macc staff 0 Mar 18 14:37 foo.md
Links
Hard link
- Directory entry contains the inode number
- Creates another name (path) for the file
- Each is "first class"
Soft link or Symbolic link
- Directory entry contains the inode number
- The data "logically" contained in the file corresponding to this inode is a pathname
- Either the pathname fits entirely in the inode, or
- It's in the data block(s) for the inode
Hard Links
/CSCE-313/Lecture/Visual%20Aids/image-22.png)
shell command
ln /dirA/name1 /dirB/name2
is typically implemented using the link system call:
#include <stdio.h>
#include <unistd.h>
if (link(“/dirA/name1”, “/dirB/name2”) == -1)
perror(“failed to make new link in /dirB”);
- Cannot hard link to a directory.
- Cannot hard link to a sym link.
- The symlink is automatically traversed
/CSCE-313/Lecture/Visual%20Aids/image-23.png)
- inode #
-
links
Hard Links: refcounts
/CSCE-313/Lecture/Visual%20Aids/image-24.png)
#include <stdio.h>
#include <unistd.h>
if (unlink(“/dirA/name1”) == -1)
perror(“failed to delete link in /dirA”);
if (unlink(“/dirB/name2”) == -1)
perror(“failed to delete link in /dirB”);
File System Organization
/CSCE-313/Lecture/Visual%20Aids/image-25.png)