06 - Databases
Class: CSCE-331
Notes:
Chapter 1: Before the Advent of Database Systems
The way in which computers manage data has come a long way over the last few decades. Today's users take for granted the many benefits found in a database system. However, it wasn't that long ago that computers relied on a much less elegant and costly approach to data management called the file-based system.
File-based System
One way to keep information on a computer is to store it in permanent files. A company system has a number of application programs; each of them is designed to manipulate data files. These application programs have been written at the request of the users in the organization. New applications are added to the system as the need arises. The system just described is called the file-based system.
Consider a traditional banking system that uses the file-based system to manage the organization's. There are different departments in the bank. Each has its own applications that manage and manipulate different data files. For banking systems, the programs may be used to debit or credit an account, find the balance of an account, add a new mortgage loan and generate monthly statements.
Disadvantages of the file-based approach
Using the file-based system to keep organizational information has a number of disadvantages. Listed below are five examples.
Data redundancy
Often, within an organization, files and applications are created by different programmers from various departments over long periods of time. This can lead to data redundancy, a situation that occurs in a database when a field needs to be updated in more than one table. This practice can lead to several problems such as:
- Inconsistency in data format
- The same information being kept in several different places (files)
- Data inconsistency, a situation where various copies of the same data are conflicting, wastes storage space and duplicates effort
Data isolation
Data isolation is a property that determines when and how changes made by one operation become visible to other concurrent users and systems. This issue occurs in a concurrency situation. This is a problem because:
- It is difficult for new applications to retrieve the appropriate data, which might be stored in various files.
Integrity problems
Problems with data integrity is another disadvantage of using a file-based system. It refers to the maintenance and assurance that the data in a database are correct and consistent. Factors to consider when addressing this issue are:
- Data values must satisfy certain consistency constraints that are specified in the application programs.
- It is difficult to make changes to the application programs in order to enforce new constraints.
Security problems
Security can be a problem with a file-based approach because:
- There are constraints regarding accessing privileges.
- Application requirements are added to the system in an ad-hoc manner so it is difficult to enforce constraints.
Concurrency access
Concurrency is the ability of the database to allow multiple users access to the same record without adversely affecting transaction processing. A file-based system must manage, or prevent, concurrency by the application programs. Typically, in a file-based system, when an application opens a file, that file is locked. This means that no one else has access to the file at the same time.
In database systems, concurrency is managed thus allowing multiple users access to the same record. This is an important difference between database and file-based systems.
Database Approach
The difficulties that arise from using the file-based system have prompted the development of a new approach in managing large amounts of organizational information called the database approach.
Databases and database technology play an important role in most areas where computers are used, including business, education and medicine. To understand the fundamentals of database systems, we will start by introducing some basic concepts in this area.
Role of databases in business
Everybody uses a database in some way, even if it is just to store information about their friends and family. That data might be written down or stored in a computer by using a word-processing program or it could be saved in a spreadsheet. However, the best way to store data is by using database management software. This is a powerful software tool that allows you to store, manipulate and retrieve data in a variety of different ways.
Most companies keep track of customer information by storing it in a database. This data may include customers, employees, products, orders or anything else that assists the business with its operations.
The meaning of data
Data are factual information such as measurements or statistics about objects and concepts. We use data for discussions or as part of a calculation. Data can be a person, a place, an event, an action or any one of a number of things. A single fact is an element of data, or a data element.
If data are information and information is what we are in the business of working with, you can start to see where you might be storing it. Data can be stored in:
- Filing cabinets
- Spreadsheets
- Folders
- Ledgers
- Lists
- Piles of papers on your desk
All of these items store information, and so too does a database. Because of the mechanical nature of databases, they have terrific power to manage and process the information they hold. This can make the information they house much more useful for your work.
With this understanding of data, we can start to see how a tool with the capacity to store a collection of data and organize it, conduct a rapid search, retrieve and process, might make a difference to how we can use data. This book and the chapters that follow are all about managing information.
Key Terms
concurrency: the ability of the database to allow multiple users access to the same record without adversely affecting transaction processing
data element: a single fact or piece of information
data inconsistency: a situation where various copies of the same data are conflicting
data isolation: a property that determines when and how changes made by one operation become visible to other concurrent users and systems
data integrity: refers to the maintenance and assurance that the data in a database are correct and consistent data redundancy: a situation that occurs in a database when a field needs to be updated in more than one table
database approach: allows the management of large amounts of organizational information database management software: a powerful software tool that allows you to store, manipulate and retrieve data in a variety of ways
file-based system: an application program designed to manipulate data files
Chapter 2: Fundamental Concepts
What Is a Database?
A database is a shared collection of related data used to support the activities of a particular organization. A database can be viewed as a repository of data that is defined once and then accessed by various users as shown in Figure 2.1.
A database is a repository of data
Database Properties
A database has the following properties:
- It is a representation of some aspect of the real world or a collection of data elements (facts) representing real-world information.
- A database is logical, coherent and internally consistent.
- A database is designed, built and populated with data for a specific purpose.
- Each data item is stored in a field.
- A combination of fields makes up a table. For example, each field in an employee table contains data about an individual employee.
A database can contain many tables. For example, a membership system may contain an address table and an individual member table as shown in Figure 2.2. Members of Science World are individuals, group homes, businesses and corporations who have an active membership to Science World. Memberships can be purchased for a one- or two-year period, and then renewed for another one- or two-year period.
/CSCE-331/Readings/Visual%20Aids/Pasted%20image%2020260201142004.png)
Figure 2.2. Membership system at Science World by N. Eng.
Database Management System
A database management system (DBMS) is a collection of programs that enables users to create and maintain databases and control all access to them. The primary goal of a DBMS is to provide an environment that is both convenient and efficient for users to retrieve and store information.
/CSCE-331/Readings/Visual%20Aids/Pasted%20image%2020260201142342.png)
Key Terms
data elements: facts that represent real-world information
database: a shared collection of related data used to support the activities of a particular organization
database management system (DBMS): a collection of programs that enables users to create and maintain databases and control all access to them
table: a combination of fields
Chapter 3: Characteristics and Benefits of a Database
Managing information means taking care of it so that it works for us and is useful for the tasks we perform. By using a DBMS, the information we collect and add to its database is no longer subject to accidental disorganization. It becomes more accessible and integrated with the rest of our work. Managing information using a database allows us to become strategic users of the data we have.
We often need to access and re-sort data for various uses. These may include:
- Creating mailing lists
- Writing management reports
- Generating lists of selected news stories
- Identifying various client needs
The processing power of a database allows it to manipulate the data it houses, so it can:
- Sort
- Match
- Link
- Aggregate
- Skip fields
- Calculate
- Arrange
Because of the versatility of databases, we find them powering all sorts of projects. A database can be linked to:
- A website that is capturing registered users
- A client-tracking application for social service organizations
- A medical record system for a health care facility
- Your personal address book in your email client
- A collection of word-processed documents
- A system that issues airline reservations
Characteristics and Benefits of a Database
There are a number of characteristics that distinguish the database approach from the file-based system or approach. This chapter describes the benefits (and features) of the database system.
Self-describing nature of a database system
A database system is referred to as self-describing because it not only contains the database itself, but also metadata which defines and describes the data and relationships between tables in the database. This information is used by the DBMS software or database users if needed. This separation of data and information about the data makes a database system totally different from the traditional file-based system in which the data definition is part of the application programs.
Insulation between program and data
In the file-based system, the structure of the data files is defined in the application programs so if a user wants to change the structure of a file, all the programs that access that file might need to be changed as well.
On the other hand, in the database approach, the data structure is stored in the system catalogue and not in the programs. Therefore, one change is all that is needed to change the structure of a file. This insulation between the programs and data is also called program-data independence.
Support for multiple views of data
A database supports multiple views of data. A view is a subset of the database, which is defined and dedicated for particular users of the system. Multiple users in the system might have different views of the system. Each view might contain only the data of interest to a user or group of users.
Sharing of data and multiuser system
Current database systems are designed for multiple users. That is, they allow many users to access the same database at the same time. This access is achieved through features called concurrency control strategies. These strategies ensure that the data accessed are always correct and that data integrity is maintained.
The design of modern multiuser database systems is a great improvement from those in the past which restricted usage to one person at a time.
Control of data redundancy
In the database approach, ideally, each data item is stored in only one place in the database. In some cases, data redundancy still exists to improve system performance, but such redundancy is controlled by application programming and kept to minimum by introducing as little redundancy as possible when designing the database.
Data sharing
The integration of all the data, for an organization, within a database system has many advantages. First, it allows for data sharing among employees and others who have access to the system. Second, it gives users the ability to generate more information from a given amount of data than would be possible without the integration.
Enforcement of integrity constraints
Database management systems must provide the ability to define and enforce certain constraints to ensure that users enter valid information and maintain data integrity. A database constraint is a restriction or rule that dictates what can be entered or edited in a table such as a postal code using a certain format or adding a valid city in the City field.
There are many types of database constraints. Data type, for example, determines the sort of data permitted in a field, for example numbers only. Data uniqueness such as the primary key ensures that no duplicates are entered. Constraints can be simple (field based) or complex (programming).
Restriction of unauthorized access
Not all users of a database system will have the same accessing privileges. For example, one user might have read-only access (i.e., the ability to read a file but not make changes), while another might have read and write privileges, which is the ability to both read and modify a file. For this reason, a database management system should provide a security subsystem to create and control different types of user accounts and restrict unauthorized access.
Data independence
Another advantage of a database management system is how it allows for data independence. In other words, the system data descriptions or data describing data (metadata) are separated from the application programs. This is possible because changes to the data structure are handled by the database management system and are not embedded in the program itself.
Transaction processing
A database management system must include concurrency control subsystems. This feature ensures that data remains consistent and valid during transaction processing even if several users update the same information.
Provision for multiple views of data
By its very nature, a DBMS permits many users to have access to its database either individually or simultaneously. It is not important for users to be aware of how and where the data they access is stored
Backup and recovery facilities
Backup and recovery are methods that allow you to protect your data from loss. The database system provides a separate process, from that of a network backup, for backing up and recovering data. If a hard drive fails and the database stored on the hard drive is not accessible, the only way to recover the database is from a backup. If a computer system fails in the middle of a complex update process, the recovery subsystem is responsible for making sure that the database is restored to its original state. These are two more benefits of a database management system.
Key Terms
concurrency control strategies: features of a database that allow several users access to the same data item at the same time
data type: determines the sort of data permitted in a field, for example numbers only
data uniqueness: ensures that no duplicates are entered
database constraint: a restriction that determines what is allowed to be entered or edited in a table
metadata: defines and describes the data and relationshipsbetween tables in the database
read and write privileges: the ability to both read and modify a file
read-only access: the ability to read a file but not make changes
self-describing: a database system is referred to as self-describing because it not only contains the database itself, but also metadata which defines and describes the data and relationships between tables in the database
view: a subset of the database
Database Introduction (Video 1)
What do databases do?
- Store and organize data so that the information you want is easy to find
- You use them all the time, possibly without knowing
- YouTube
- Amazon
- ebay, etc.
Why database systems?
- To manage (and query) very large amounts of data
- Most of these data do not fit in RAM
- Key considerations:
- Efficiency
- How fast it is to query that database
- Safe concurrency
- Multiple transactions happen at the same time
- Reliability
- They are not useful if they store the data one day and when later you come to check it out it does not exist anymore
- Ease of access
- Access process should be very straight forward
- Efficiency
Why should you care about databases?
"Data is the new oil. It's valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value."
Clive Humby, UK Mathematician and architect of Tesco's Clubcard
Not everyone agrees, but most agree that without it, our economies and lifestyles would change dramatically
What is SQL? (Video 2)
SQL = Structured Query Language
- SQL is a Language that Communicates with Databases
How does SQL work?
- Think of a database like a warehouse with data tables like files cabinets and data like files
- This warehouse stores data
- The warehouse database was built using coding languages like C, C++, or Java.
How do people access the files?
- Build a "Storefront" App
- This takes a lot of time and money
- This is like building a whole app, just to access the files of a database
- Hire a Delivery Translator (SQL)
- It knows how to talk to the database
- You can ask it to get files for you
Example:
SELECT name FROM lego_people;
- Tabled Called = lego_people
- Will return the name column from this table
SELECT name FROM lego_height WHERE cm > 3;
- Will return the name column of the entries in the lego_height that comply with that condition
Example with two separate tables:
SELECT name, age FROM people LEFT JOIN lego_height USING (name);
- What both two tables have in common (the names column) was used to join both the two by matching by the names on the table
Add data?
Example to add data:
INSERT INTO lego_people(name, age) VALUES ('Joe', 12);
Update data:
UPDATE lego_people SET name = 'Joe', age = 13);
Delete data:
DELETE FROM lego_people WHERE name = 'Joe';
Remember
- SQL is a language that you should learn regardless of domain
- Most companies have an online presence and keep records of their data
- Knowing how to get data Means you can use more data in your job
- The way SQL Commands are Written Varies Slightly
- There are different SQL syntaxes
- SQL is a language that speaks only yo Relational Databases (for the most part)
- A database that has a tabular scheme (rows and columns)
- Can use one column to relate to different tables
Entity Relationship Diagrams (Video 3)
Database = collection of information that is organized so data can be easily stored, managed, updated, and retrieved.
EDRs
- Engineers need a visual way to understand how all the separate elements are related to each other and how they are working together, to show this they build Entity Relationship Diagrams (EDRs)
How to make EDRs
- You can draw them out with pan and paper, but it it's going to be way easier for you to use a diagramming tool.
- Lucidchart (diagramming tool)
- Each entity will have attributes, which are various properties or traits
- Example: Customer
- Customer_ID
- Name
- City
- Phone
- Example: Customer
- The entities in your database will be the rows, and the attributes will be the columns
- You draw lines between entities, and these lines can have different cardinality
/CSCE-331/Readings/Visual%20Aids/Pasted%20image%2020260201152120.png)
Example (customers and orders)
- Relationship between a customer and an order
- What is the minimum number of orders that a customer could have?
- 0
- What is the maximum number of orders that a customer could have?
- infinite
- This requires a zero or many relationship
- What is the minimum number of orders that a customer could have?
- Relationship between orders and customers
- What is the minimum amount of customers an order could have?
- 1
- What is the maximum amount of customers an order could have?
- 1
- This requires a one (and only one) relationship
- What is the minimum amount of customers an order could have?
Example (orders and products)
- Relationship between orders and products
- A certain order can have how many products?
- For an order to exist it has to have one product, but a lot of different products can be comprised of that order
- To show that we use the one or many relationship
- A certain order can have how many products?
- Relationship between products and orders?
- A product can be part of how many orders?
- A product could be a part of no orders, but it also could be a product of many orders
- So we use the zero or many relationship
- A product can be part of how many orders?
Example (ERD)
/CSCE-331/Readings/Visual%20Aids/Pasted%20image%2020260201152932.png)