08 - UNIX I-O file descriptors, redirection and standard I-O
Class: CSCE-313
Notes:
Outline:
- File Descriptors
- Redirection
- Standard I/O (stdio)
File Descriptors
Why File Descriptors?
Why do we need to open and close sessions when we read-from or write-to files?
Because opening a file lets the system set up, and internally store (cache), information about the file that it reuses on every later access, such as:
- Where is file stored on which storage device?
- Where are we as we sequentially traverse a file?
- Which parts of the file are cached? And where?
- Who else is accessing the file right now?
- etc.
File Descriptors practice - I
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main() {
char c1, c2, c3;
char *fname = "file.txt";
int fd1 = open(fname, O_RDONLY);
int fd2 = open(fname, O_RDONLY);
int fd3 = open(fname, O_RDONLY);
dup2(fd2, fd3);
read(fd1, &c1, 1);
read(fd2, &c2, 1);
read(fd3, &c3, 1);
printf("c1 = %c, c2 = %c, c3 = %c\n", c1, c2, c3);
}
What would this program print for a file containing “abcde”?
Try online: https://csce313.schnekli.top/gform/fd-1
Answer: c1 = a, c2 = a, c3 = b
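- fd1 and fd2 come from separate open() calls, so each has its own open file table entry with its own offset. fd1 reads 'a', and since fd2 is still pointing to the beginning of the file it also reads 'a'
- dup2(fd2, fd3) makes fd3 refer to the same open file table entry as fd2, so they share one offset. Reading 1 byte through fd2 advanced that shared offset, therefore fd3 reads 'b'.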
File Descriptors practice - II
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
int fd1, fd2, fd3;
char *fname = argv[1];
fd1 = open(fname, O_CREAT|O_TRUNC|O_RDWR, S_IRUSR|S_IWUSR);
write(fd1, "pqrs", 4);
fd2 = open(fname, O_APPEND|O_WRONLY, 0);
write(fd2, "jklmn", 5);
fd3 = dup(fd1); /* Allocates descriptor */
write(fd3, "wxyz", 4);
write(fd2, "ef", 2);
// set fd1 to point to the beginning of the file
lseek (fd1, 0, SEEK_SET);
write (fd1, "ab", 2);
return 0;
}
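- Note:
- S_IRUSR gives the owner (user) read permission
- S_IWUSR gives the owner (user) write permission
- These mode bits only matter when O_CREAT actually creates the file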
- fd1
- You start with an empty file
- If the file already existed, O_TRUNC truncates it to length 0
- fd1 ends up writing pqrs into the file, leaving fd1's offset at 4
- fd2
- fd2 comes from a second open() call, so it has its own open file table entry and its own offset
- Because of O_APPEND, every write through fd2 first moves the offset to the end of the file
- There is no O_TRUNC this time, so the previous writes are still there
- At the end of that you get:
pqrsjklmn
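- fd3
- fd1 is pointing to the end of pqrs (offset 4)
- fd2 is pointing to the end of pqrsjklmn (offset 9)
- dup(fd1) makes fd3 share fd1's open file table entry, and therefore its offset, so fd3 writes wxyz starting at offset 4, overwriting jklm to get:
pqrswxyzn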
- fd2
- O_APPEND moves fd2's offset to the end of the file (offset 9) before writing, giving:
pqrswxyznef
- fd1
- lseek(fd1, 0, SEEK_SET) sets fd1's offset back to 0 (fd3 shares this offset, so it moves too)
- Writing ab then overwrites the first two bytes to get:
abrswxyznef
- Note: fd1 and fd3 never insert or delete bytes; they overwrite whatever is at their current offset (wherever they left off).
- Note how we bitwise-OR the flags passed to open: each flag is a distinct bit, so several independent options fit into a single argument. This is a common way of passing multiple options through one parameter.
Answer: abrswxyznef
File Descriptors and fork()
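With fork(), the child gets a copy of the parent's address space and of its file descriptor table. Each copied descriptor points to the same open file table entry as in the parent, so the two processes share file offsets.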
File Descriptors and fork() - I
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main(void) {
char c;
int myfd=open("myf.txt",O_RDONLY);
fork();
read(myfd, &c, 1);
printf("Got %c\n", c);
}
File Descriptors and fork() - II
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main(void) {
char c;
fork();
int myfd=open("myf.txt",O_RDONLY);
read(myfd, &c, 1);
printf("Got %c\n", c);
}
What would this program print for a file containing "abcde"?
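Answer: "Got a" is printed twice, once by each process
- The trick is that the open happens after the fork(): there are now two independent processes, each calling open() and getting its own open file table entry with its own offset
- You will read the first character of the file two times!
- If you had forked after the open, the two processes would share one offset, so one would print "Got a" and the other "Got b". We do not know which process reads a and which reads b.
- Remember that reads and writes are atomic
- "Operations will happen indivisibly"
- The kernel guarantees atomicity, so two reads on a shared offset never return the same byte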
Example of duplicated FD's
Suppose the disk file foobar.txt consists of the six ASCII characters “foobar”. What is the output of the following program?
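#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>
#include <iostream>
using namespace std;
int main() {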
char c;
int fd = open("foobar.txt", O_RDONLY);
if (fork() == 0) {
read(fd, &c, 1);
return 0;
} else {
wait(0);
read(fd, &c, 1);
cout << "c=" << c << endl;
}
}
Answer: The child inherits the parent’s descriptor table and both processes share the same file table. Thus the descriptor fd in both the parent and child points to the same open file table entry.
When the child reads the first byte of the file, the file position increments by 1. Thus the parent reads the second byte and output is “c=o”
Example - Descriptor Recycling
What is the output of the following program?
#include <fcntl.h>
#include <unistd.h>
#include <iostream>
using namespace std;
int main() {
int fd1 = open("foo.txt",O_RDONLY);
close(fd1);
int fd2 = open("baz.txt", O_RDONLY);
cout << "fd2 = " << fd2 << endl;
}
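Unix processes begin life with open descriptors assigned to stdin (descriptor 0), stdout (descriptor 1), and stderr (descriptor 2).
The open function always returns the lowest-numbered descriptor that is not currently open, so foo.txt gets descriptor 3. Once close(fd1) frees descriptor 3, baz.txt gets it again, so the output will be "fd2 = 3"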
Notes:
- POSIX guarantees that open() returns the lowest-numbered unused descriptor, so closed descriptor numbers are recycled
Redirection
Standard I/O Functions
The C standard library (libc.so) contains a collection of higher-level standard I/O functions
- Documented in Appendix B of K&R.
Examples of standard I/O functions:
- Opening and closing files (fopen and fclose)
- Reading and writing bytes (fread and fwrite)
- Reading and writing text lines (fgets and fputs)
- Formatted reading and writing (fscanf and fprintf)
Notes:
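- The big advantage of using standard I/O is that it is buffered
- Small writes are accumulated in a user-space buffer and handed to the kernel in one larger write
- This is a lot more efficient than invoking a system call for every small write
- fprintf takes a format string containing conversion specifiers (e.g., %d, %s) that are filled in from the remaining arguments, for convenience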
Standard I/O models open files as streams
- Abstraction for a file descriptor and a buffer in memory.
C programs begin life with three open streams (defined in stdio.h)
- stdin (standard input)
- stdout (standard output)
- stderr (standard error)
#include <stdio.h>
extern FILE *stdin; /* standard input (descriptor 0) */
extern FILE *stdout; /* standard output (descriptor 1) */
extern FILE *stderr; /* standard error (descriptor 2) */
int main() {
fprintf(stdout, "Hello, world\n");
}
Motivations
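- Portability: standard I/O is part of the C standard library, so the same code compiles to Windows-native or Linux-native code and is not tied to a particular operating system's system calls.
- Buffering: for efficiency, with a buffering policy that depends on the device type
- fprintf to stdout is flushed on a '\n' character when stdout is a terminal (line buffered), but not when it is redirected to a file
- fprintf to disk files is flushed once the file buffer fills up (fully buffered)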
Notes:
- At some point it says "OK, that's enough characters, let's flush!"
- It eventually has to flush (it will never buffer forever)
- fflush writes the buffered (accumulated) data to the underlying file descriptor for you
- By default, your standard library flushes terminal output on \n characters
- You need to be able to flush whenever you want, e.g., to show the user a prompt before reading their input from the terminal
- If you are using printf, your write to the console is saved in a buffer, and later writes are appended to the same buffer. The characters only appear on the output device when the buffer is flushed.
- Generally, buffered I/O will be more efficient.
- Buffering is transparent: it does not change the functionality of a program, only when the bytes actually reach the device
- Standard I/O is implemented in the C standard library, not in the compiler or the kernel
- Standard I/O is generally what you have been using in all of your writes!
Buffering in Standard I/O
Standard I/O functions use buffered I/O
Buffer flushed to output fd on "\n" (for line-buffered streams such as a terminal), when the buffer fills, or on an fflush() call
- You are free to flush on your own (using fflush) whenever you want!
- Buffer size is tuned to the underlying system (e.g., 4K units, matching a 4K page size)
Unix I/O vs. Standard I/O
Standard I/O is implemented using low-level Unix I/O
Which ones should you use in your programs?
Pros and Cons of Standard I/O
Pros of Unix I/O over standard I/O
- It is the most general and lowest overhead form of I/O
- All other I/O packages are implemented using Unix I/O functions on a Unix system
- It provides functions for accessing file metadata, e.g., stat()
Cons of Unix I/O over standard I/O
- Efficient data access requires some form of buffering, which can be
- Device dependent (e.g., a whole track of a disk can be read at once)
- Tricky and error prone
- Both these issues are addressed by standard I/O packages
Notes:
- The one big thing about standard I/O is that it is part of the C standard, so it is required across different implementations and your code stays portable
Choosing I/O Functions
General rule: use the highest-level I/O functions you can
- Many C programmers are able to do all of their work using the standard I/O functions
Use standard I/O
- When working with disk or terminal files
Use raw Unix I/O
- Inside signal handlers, because Unix I/O is async-signal-safe
- In rare cases when you need absolute highest performance
- You do your own buffering depending on the application nature and knowledge about the underlying hardware
- On a 24-hard-drive (i.e., 48 TB capacity) RAID system, we needed to exploit Windows-native I/O to get nearly … read/write speed
- Standard I/O would only give us … read/write speed