08 - UNIX I-O file descriptors, redirection and standard I-O

#files #filesystem #Linux

Class: CSCE-313

Notes:

Outline:

File Descriptors
Redirection
Standard I/O (stdio)

File Descriptors

Why File Descriptors?

[Figure: image-20.png]

Why do we need to open and close sessions when we read-from or write-to files?

Because the system can internally store (cache) information about the file as we access it.

Where is the file stored, and on which storage device?
Where are we as we sequentially traverse a file?
Which parts of the file are cached? And where?
Who else is accessing the file right now?
etc.
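
One piece of this per-session state can be observed directly: the kernel remembers the current position for each open file. The sketch below is a minimal illustration, not part of the lecture; the helper name `offset_after_two_reads` and the scratch file it creates are assumptions made for the example.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: writes "abcde" to `path`, opens it for reading,
 * reads two bytes, and returns the position the kernel has recorded for
 * this open session. Returns -1 on error. */
long offset_after_two_reads(const char *path) {
    char buf[2];

    /* Create a small scratch file to read from. */
    int wfd = open(path, O_CREAT | O_TRUNC | O_WRONLY, S_IRUSR | S_IWUSR);
    if (wfd < 0) return -1;
    if (write(wfd, "abcde", 5) != 5) { close(wfd); unlink(path); return -1; }
    close(wfd);

    int fd = open(path, O_RDONLY);            /* the session starts here */
    if (fd < 0) { unlink(path); return -1; }
    if (read(fd, buf, 2) != 2) { close(fd); unlink(path); return -1; }

    /* Seeking 0 bytes from SEEK_CUR moves nothing and reports the position. */
    off_t pos = lseek(fd, 0, SEEK_CUR);

    close(fd);                                /* the session ends here */
    unlink(path);                             /* remove the scratch file */
    return (long)pos;
}
```

Calling `offset_after_two_reads("fd_offset_demo.tmp")` from `main` reports the position after the two reads, which the program itself never stored anywhere.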

File Descriptors practice - I

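#include <fcntl.h>   /* open, O_RDONLY */
#include <stdio.h>   /* printf */
#include <unistd.h>  /* read, dup2 */

int main() {
    char c1, c2, c3;
    const char *fname = "file.txt";
    
    int fd1 = open(fname, O_RDONLY);
    int fd2 = open(fname, O_RDONLY);
    int fd3 = open(fname, O_RDONLY);
    
    dup2(fd2, fd3);  /* make fd3 a duplicate of fd2 */
    read(fd1, &c1, 1);
    read(fd2, &c2, 1);
    read(fd3, &c3, 1);
    printf("c1 = %c, c2 = %c, c3 = %c\n", c1, c2, c3);
    return 0;
}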

What would this program print for a file containing “abcde”?

Try online: https://csce313.schnekli.top/gform/fd-1

Answer: c1 = a, c2 = a, c3 = b

fd1 reads 1 byte ('a'). fd2 was opened separately, so it has its own file position, which is still at the beginning of the file; it also reads 'a'.
The read on fd2 advances that file position by 1 byte. After dup2(fd2, fd3), fd3 refers to the same open file as fd2 and shares its position, so fd3 reads 'b'.

File Descriptors practice - II

#include <fcntl.h>     /* open, O_* flags */
#include <sys/stat.h>  /* S_IRUSR, S_IWUSR */
#include <unistd.h>    /* write, dup, lseek */

int main(int argc, char *argv[]) {
    int fd1, fd2, fd3;
    if (argc < 2) return 1;  /* expects the file name as argv[1] */
    char *fname = argv[1];
    fd1 = open(fname, O_CREAT|O_TRUNC|O_RDWR, S_IRUSR|S_IWUSR);
    write(fd1, "pqrs", 4);
    fd2 = open(fname, O_APPEND|O_WRONLY, 0);
    write(fd2, "jklmn", 5);
    fd3 = dup(fd1); /* Allocates descriptor */
    write(fd3, "wxyz", 4);
    write(fd2, "ef", 2);
    // set fd1 to point to the beginning of the file
    lseek(fd1, 0, SEEK_SET);
    write(fd1, "ab", 2);
    return 0;
}

Note:
- S_IRUSR gives the owner (user) read permission
- S_IWUSR gives the owner (user) write permission
fd1
- O_CREAT creates the file if it does not exist, so you start with an empty file
- If the file already existed, O_TRUNC truncates it to length 0
- fd1 writes pqrs into the file
fd2
- Opened with O_APPEND, so each write first seeks to the end of the file
- Earlier writes are kept; the new data goes after them
- After this write the file contains pqrsjklmn
fd3
- fd1's offset is at the end of pqrs (offset 4)
- fd2's offset is at the end of pqrsjklmn (offset 9)
- fd3 is a duplicate of fd1 and shares its offset, so wxyz overwrites jklm: pqrswxyzn
fd2
- Because of O_APPEND, ef goes to the end of the file: pqrswxyznef
fd1
- lseek sets fd1's offset to 0
- Writing ab then overwrites pq: abrswxyznef
Note: fd1 and fd3 do not insert or delete anything; they overwrite bytes at the current offset (wherever the previous operation left it).
Note how the flags for open are combined with bitwise OR (each flag is an independent bit); this is a common way to pass several options in a single argument.

Answer: abrswxyznef
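
The trace can be checked mechanically. The sketch below replays the same sequence of writes on a scratch file and reads the result back; the helper names `put` and `replay_practice_two` are hypothetical, and the error handling is an addition for the example rather than part of the exercise.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: writes the string s to fd.
 * Returns 1 on success, 0 on a short or failed write. */
static int put(int fd, const char *s) {
    size_t len = strlen(s);
    return write(fd, s, len) == (ssize_t)len;
}

/* Replays the writes from the exercise on `path`, then copies the final
 * file contents into `out` (NUL-terminated, at most cap - 1 bytes).
 * Returns the number of bytes read back, or -1 on error. */
int replay_practice_two(const char *path, char *out, size_t cap) {
    int fd1 = open(path, O_CREAT | O_TRUNC | O_RDWR, S_IRUSR | S_IWUSR);
    if (fd1 < 0) return -1;
    int ok = put(fd1, "pqrs");               /* file: pqrs */

    int fd2 = open(path, O_APPEND | O_WRONLY);
    if (fd2 < 0) { close(fd1); unlink(path); return -1; }
    ok &= put(fd2, "jklmn");                 /* file: pqrsjklmn */

    int fd3 = dup(fd1);                      /* shares fd1's offset (4) */
    ok &= put(fd3, "wxyz");                  /* file: pqrswxyzn */
    ok &= put(fd2, "ef");                    /* appended: pqrswxyznef */

    lseek(fd1, 0, SEEK_SET);                 /* back to the beginning */
    ok &= put(fd1, "ab");                    /* file: abrswxyznef */

    lseek(fd1, 0, SEEK_SET);                 /* rewind to read the result */
    ssize_t n = read(fd1, out, cap - 1);

    close(fd1);
    close(fd2);
    close(fd3);
    unlink(path);                            /* remove the scratch file */
    if (!ok || n < 0) return -1;
    out[n] = '\0';
    return (int)n;
}
```

Called with a 32-byte buffer, the function returns 11 and fills the buffer with the string given in the answer.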

File Descriptors and fork()

With fork(), the child gets a copy of the parent's address space and of its file descriptor table, so descriptors opened before the fork refer to the same open files in both processes.

[Figure: image-21.png]

File Descriptors and fork() - I

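#include <fcntl.h>   /* open, O_RDONLY */
#include <stdio.h>   /* printf */
#include <unistd.h>  /* fork, read */

int main(void) {
    char c;
    int myfd=open("myf.txt",O_RDONLY);  /* opened before the fork */
    fork();
    read(myfd, &c, 1);
    printf("Got %c\n", c);
}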

[Figure: image-2.png]

File Descriptors and fork() - II

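#include <fcntl.h>   /* open, O_RDONLY */
#include <stdio.h>   /* printf */
#include <unistd.h>  /* fork, read */

int main(void) {
    char c;
    fork();
    int myfd=open("myf.txt",O_RDONLY);  /* opened after the fork */
    read(myfd, &c, 1);
    printf("Got %c\n", c);
}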

What would this program print for a file containing "abcde"?

[Figure: image-1.png]

Answer: aa (each process prints "Got a")

The trick is that open is called after fork(), so two independent processes each call open and each gets its own file position.
The first character of the file is read twice!
If you fork after the open instead (as in example I), the two processes share one file position and the output is ab, though we do not know which process reads a and which reads b.
Remember that reads and writes are atomic
- "Operations will happen indivisibly"
- The kernel will guarantee atomicity

Example of duplicated FD's

Suppose the disk file foobar.txt consists of the six ASCII characters “foobar”. What is the output of the following program?

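#include <iostream>
#include <fcntl.h>     // open, O_RDONLY
#include <sys/wait.h>  // wait
#include <unistd.h>    // fork, read
using namespace std;

int main () {
	char c;
	int fd = open("foobar.txt", O_RDONLY);
	
	if (fork() == 0) {
		read(fd, &c, 1);  // child reads first
		return 0;
	} else {
		wait(nullptr);    // parent waits for the child to finish
		read(fd, &c, 1);
		cout << "c=" << c << endl;
	}
}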

Answer: The child inherits a copy of the parent's descriptor table, and both processes share the same open file table. Thus the descriptor fd in both the parent and the child points to the same open file table entry.

When the child reads the first byte of the file, the file position increments by 1. Thus the parent reads the second byte and the output is "c=o".

Example - Descriptor Recycling

What is the output of the following program?

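#include <iostream>
#include <fcntl.h>   // open, O_RDONLY
#include <unistd.h>  // close
using namespace std;

int main (){
    int fd1 = open("foo.txt",O_RDONLY);
    close(fd1);
    int fd2 = open("baz.txt", O_RDONLY);
    cout << "fd2 = " << fd2 << endl;
}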

Unix processes begin life with open descriptors assigned to stdin ( $fd = 0$ ), stdout ( $fd = 1$ ), and stderr ( $fd = 2$ ).

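The open function always returns the lowest-numbered descriptor that is not currently open. The first open returns 3, close(fd1) frees it, and the second open gets 3 again, so the output will be "fd2 = 3" (assuming baz.txt exists and can be opened).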

Notes:

Unix semantics guarantee that open returns the lowest-numbered available file descriptor, so a descriptor number is reused as soon as it has been closed.
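
A minimal sketch of descriptor recycling that does not depend on foo.txt or baz.txt: it opens /dev/null instead (an assumption made so the example runs anywhere on a Unix system) and assumes a single-threaded program. The function name `descriptor_is_recycled` is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if a descriptor that was just closed is handed out again by
 * the next open, 0 if not, and -1 if /dev/null cannot be opened. */
int descriptor_is_recycled(void) {
    int fd1 = open("/dev/null", O_RDONLY);   /* lowest free descriptor */
    if (fd1 < 0) return -1;
    close(fd1);                              /* that number is free again */

    int fd2 = open("/dev/null", O_RDONLY);   /* lowest free descriptor again */
    if (fd2 < 0) return -1;
    close(fd2);

    return fd1 == fd2;
}
```

The exact number depends on what the process already has open; only the equality of the two numbers is what the sketch demonstrates.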

Redirection
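
Redirection builds on dup2, which was used in the first practice problem: after dup2(fd, 1), descriptor 1 (stdout) refers to the same open file as fd, which is the mechanism a shell uses for `cmd > file`. The sketch below is an illustration under assumptions, not material from the lecture; the helper name `write_redirected` and its scratch file are hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: temporarily points descriptor 1 at `path`, writes
 * `msg` to descriptor 1, restores the original stdout, and copies what
 * landed in the file into `out` (NUL-terminated).
 * Returns the number of bytes captured, or -1 on error. */
int write_redirected(const char *path, const char *msg, char *out, size_t cap) {
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, S_IRUSR | S_IWUSR);
    if (fd < 0) return -1;

    fflush(stdout);                          /* drain stdio's buffer first */
    int saved = dup(STDOUT_FILENO);          /* remember the real stdout */
    if (saved < 0) { close(fd); unlink(path); return -1; }

    ssize_t w = -1;
    if (dup2(fd, STDOUT_FILENO) >= 0) {      /* descriptor 1 is now the file */
        w = write(STDOUT_FILENO, msg, strlen(msg));
        dup2(saved, STDOUT_FILENO);          /* put the real stdout back */
    }
    close(saved);

    /* fd and the redirected descriptor 1 shared one offset, so rewind. */
    lseek(fd, 0, SEEK_SET);
    ssize_t n = read(fd, out, cap - 1);
    close(fd);
    unlink(path);                            /* remove the scratch file */
    if (w < 0 || n < 0) return -1;
    out[n] = '\0';
    return (int)n;
}
```

The message never reaches the terminal: it was written to descriptor 1 while that descriptor referred to the file.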

Standard I/O Functions

The C standard library (libc.so) contains a collection of higher-level standard I/O functions

Documented in Appendix B of K&R.

Examples of standard I/O functions:

Opening and closing files (fopen and fclose)
Reading and writing bytes (fread and fwrite)
Reading and writing text lines (fgets and fputs)
Formatted reading and writing (fscanf and fprintf)

Notes:

The big advantage of using standard I/O is that it is buffered
Small writes are accumulated in memory and handed to the kernel in larger chunks
For many small reads or writes this is a lot more efficient than invoking a system call each time
fprintf -> you pass a format string containing conversion specifiers (such as %d or %s) for convenience
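
A short sketch of the functions listed above working together: it writes two lines with fprintf and fputs, then reads them back with fgets. The helper name `second_line` and the scratch file are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes two text lines to `path` through a stream,
 * then reads them back and leaves the second line in `line`.
 * Returns 0 on success, -1 on error. */
int second_line(const char *path, char *line, int cap) {
    FILE *fp = fopen(path, "w");
    if (fp == NULL) return -1;
    fprintf(fp, "count = %d\n", 42);         /* formatted write */
    fputs("second line\n", fp);              /* text-line write */
    if (fclose(fp) != 0) return -1;          /* fclose flushes the buffer */

    fp = fopen(path, "r");
    if (fp == NULL) return -1;
    /* Each fgets call reads up to and including one '\n'. */
    int ok = fgets(line, cap, fp) != NULL && fgets(line, cap, fp) != NULL;
    fclose(fp);
    remove(path);                            /* remove the scratch file */
    return ok ? 0 : -1;
}
```

Note that fgets keeps the trailing newline, so the buffer ends with '\n' before the terminating NUL.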

Standard I/O models open files as streams

Abstraction for a file descriptor and a buffer in memory.

C programs begin life with three open streams (defined in stdio.h)

stdin (standard input)
stdout (standard output)
stderr (standard error)

#include <stdio.h>
extern FILE *stdin; /* standard input (descriptor 0) */
extern FILE *stdout; /* standard output (descriptor 1) */
extern FILE *stderr; /* standard error (descriptor 2) */
int main() {
    fprintf(stdout, "Hello, world\n");
}
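
The link between a stream and its descriptor can be queried with fileno, a POSIX function declared in stdio.h. A minimal sketch, with the hypothetical function name `standard_streams_match`:

```c
#include <stdio.h>

/* Returns 1 if the three standard streams wrap descriptors 0, 1 and 2. */
int standard_streams_match(void) {
    return fileno(stdin) == 0 &&
           fileno(stdout) == 1 &&
           fileno(stderr) == 2;
}
```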

Motivations

Portability: The same source code compiles to Windows-native or Linux-native code; it is not tied to one platform's I/O interface.
Buffering: For efficiency, considering device type
- fprintf to stdout is flushed on each '\n' character when stdout is a terminal (but not when stdout is redirected to a file)
- fprintf to disk files is flushed once the stream's buffer is full

[Figure: image-3.png]

Notes:

At some point the library says "ok, that's enough characters, let's flush!"
It eventually has to flush (it will never buffer forever)
fflush flushes the buffer (the accumulated data) for you, actually writing it to the output destination
By default, your standard library flushes stdout on \n characters when it is connected to a terminal
- This matters for prompts: a prompt has to be flushed so the user sees it before the program waits for input
If you use printf, your output to the console is first saved in a buffer, and later writes are appended to that buffer. Only when the buffer is flushed do the written characters appear on the output device.
Generally, buffered I/O will be more efficient.
- Buffering does not change what a program does; it is transparent to the caller
- It is implemented in the C standard library, not in the compiler itself
Standard I/O is generally what you have been using in all of your writes!

Buffering in Standard I/O

Standard I/O functions use buffered I/O
[Figure: image-4.png]

The buffer is flushed to the output fd on "\n" (for a terminal) or on an fflush() call

You are free to flush on your own (using fflush) whenever you want!
Buffer size is chosen to suit your filesystem (e.g., working in 4K units when your page size is 4K)
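
The effect of the buffer can be made visible by checking the file size before and after a flush. The sketch below is an illustration under stated assumptions: it requests full buffering explicitly with setvbuf so the behavior does not depend on library defaults, and the helper names `file_size` and `buffered_write_sizes` are hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns the on-disk size of `path`, or -1 on error. */
static long file_size(const char *path) {
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Writes 6 bytes through a fully buffered stream and reports the file size
 * before and after fflush. Returns 0 on success, -1 on error. */
int buffered_write_sizes(const char *path, long *before, long *after) {
    FILE *fp = fopen(path, "w");
    if (fp == NULL) return -1;

    /* Full buffering: data leaves the buffer only when it fills up or on
     * fflush/fclose, even if the data contains '\n'. */
    setvbuf(fp, NULL, _IOFBF, BUFSIZ);

    fputs("hello\n", fp);                    /* stays in the user-space buffer */
    *before = file_size(path);               /* nothing has reached the file */

    fflush(fp);                              /* hand the bytes to the kernel */
    *after = file_size(path);

    fclose(fp);
    remove(path);                            /* remove the scratch file */
    return 0;
}
```

With these settings the size is 0 before the flush and 6 after it, even though the data contains a newline: line-based flushing applies to terminals, not to regular files.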

Unix I/O vs. Standard I/O

Standard I/O is implemented using low-level Unix I/O

[Figure: image-5.png]
Which ones should you use in your programs?

Pros and Cons of Unix I/O

Pros of Unix I/O over standard I/O

It is the most general and lowest overhead form of I/O
- All other I/O packages are implemented using Unix I/O functions on a Unix system
It provides functions for accessing file metadata, e.g., stat()
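
A minimal sketch of reading file metadata with stat(). The helper name `size_via_stat` and the scratch file it creates are assumptions made for the example.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates `path` with six bytes, then asks the kernel
 * for its metadata. Returns the size reported by stat() if the path is a
 * regular file, or -1 on error. */
long size_via_stat(const char *path) {
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd < 0) return -1;
    if (write(fd, "foobar", 6) != 6) { close(fd); unlink(path); return -1; }
    close(fd);

    struct stat st;
    long size = -1;
    /* st_mode holds the file type and permissions, st_size the length. */
    if (stat(path, &st) == 0 && S_ISREG(st.st_mode))
        size = (long)st.st_size;

    unlink(path);                            /* remove the scratch file */
    return size;
}
```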

Cons of Unix I/O compared with standard I/O

Efficient data access requires some form of buffering, which can be
- Device dependent (e.g., a whole track of a disk can be read at once)
- Tricky and error prone
Both these issues are addressed by standard I/O packages

Notes:

The one big thing about standard I/O is that it is standardized, so it is available across different implementations

Choosing I/O Functions

General rule: use the highest-level I/O functions you can

Many C programmers are able to do all of their work using the standard I/O functions

Use standard I/O

When working with disk or terminal files

Use raw Unix I/O

Inside signal handlers, because Unix I/O is async-signal-safe
In rare cases when you need absolute highest performance
You do your own buffering depending on the application nature and knowledge about the underlying hardware
On a RAID system with 24 hard drives (48 TB capacity), we needed to exploit Windows-native I/O to get nearly $3\ \text{GB/s}$ read/write speed
Standard I/O would only give us $< 1\ \text{GB/s}$ read/write speed