CS2040S Notes - Chapter 1: Preliminaries
Java Essentials
In CS2040S, the programming language used for teaching is Java. This is because of several factors:
Java has good and comprehensive support for Object-Oriented Programming (OOP);
Java has an extensive built-in utility library with ready-to-use implementations of most of the data structures and algorithms covered in this course;
Java is a popular language with relatively easy syntax.
To better prepare for the practical aspect of the course, we first briefly go through several essential features in Java.
Class and Object
A class is a template or “blueprint” with pre-defined information and data to specify the attributes (i.e., fields) and functionalities (i.e., methods) of a data type. It defines a thing by prescribing what it is and what it can do.
An object is a concrete instance of a class, i.e., a “thing” created according to the specifications defined by the class. A class can have an arbitrary number of instances.
A subclass or subtype is a class which builds on the structure of another class (known as the parent class or base class). Usually, a subclass will extend or modify the behaviour of the parent class.
A generic type is a class which can be parametrised with another type. This allows us to substitute any type of objects into the data types of fields or method arguments. For example, a List<T>
allows us to define functionalities that work for a type parameter T
which can be substituted with concrete types at run-time.
The this
keyword is a special keyword in Java used to refer to the current instance. You can use it to differentiate between method arguments and class fields with identical names. For example:
1
2
3
4
5
6
7
8
class MyObject {
int value = 5;
public void test(int value) {
System.out.println(value) // Refers to the argument
System.out.println(this.value) // Refers to the "value" field of "this" object
}
}
Accessibility
Fields and methods have accessibility. Under most of the situations, we only need to use the public
, protected
and private
accessibilities. public
fields and methods can be accessed from anywhere; protected
fields and methods can be accessed only in a subclass of the class where the fields or methods are defined; private
fields and methods can be accessed only in the same class as where they are defined.
Class Fields and Class Methods
The static
keyword in Java allows you to define class fields and class methods.
A class field is an attribute shared by all instances of the class. For example, this is used for:
- frequently used constants such as
Math.PI
orInteger.MAX_VALUE
; - number of instances of a class when you need to track reference count.
A class method is a functionality which does not involve the use of data from any instance of the class.
Abstract Classes and Interfaces
An abstract class is a class which cannot be instantiated. It is used to provide a common template (i.e., a set of common base information) for many concrete subclasses which should be grouped together in order to avoid code duplication.
An interface is a set of abstract methods supplied to an implementing class. Classes implementing the same interface will have methods with the same name but varying behaviours.
Call by Reference
In Java, the types byte
, short
, int
, long
, float
, double
, boolean
, char
are known as primitive types. When a primitive type argument is passed from one function to another function, a local copy will be created. This means that the following will happen:
- Create a primitive integer
x
in a function calledf
and initialise it to 5. - Call a function
g(x)
which changes the value ofx
to 10. - In
f
, the value ofx
will not change because only a copied value is passed tog
.
Arrays and all object-derived types are reference types, which are passed by reference (i.e., the memory address of the instance is passed around between functions). This means that the following will happen:
- Create an object wrapper
Integer
calledx
in a function calledf
and initialise it to 5. - Call a function
g(x)
which changes the value ofx
to 10. - In
f
, the value ofx
will change to 10 because both functions share the same memory address of the objectx
.
However, a memory address is passed by value. This means that if in the above example, we create a new instance in g
and assign it to x
, it will not affect the value of x
in f
because the memory address is replaced only in g
.
Big-O Notation
One core consideration when designing a data structure or an algorithm is efficiency. However, we realise that absolute running time is not a fair metric for efficiency due to
- hardware constraints;
- difference between programming languages;
- random fluctuations in running time;
- different test cases used to measure running time.
Therefore, instead of absolute duration, we will estimate an upper bound for the number of steps required to complete an algorithm to measure its efficiency. Usually, for an input test case of size $n$, we denote this upper bound by $T(n)$.
To make everything rigorous, we will describe the efficiency of an algorithm using the following statement:
For any input of size $n$, the number of steps required for the algorithm to complete is at most a constant multiple of $T(n)$.
The efficiency defined as such is known as time complexity.
Algorithms with the same time complexity might still have significant difference in running time in practice! For example, even if the running time is linear in test case size for both algorithms, the actual durations could be drastically different for large test case sizes due to different gradients.
In this sense, time complexity measures the growth rate in terms of its order of magnitude and it does not care about the absolute running time nor the absolute value of the running time’s gradient.
The formal mathematical definition for this is as follows:
Let $f$ be a real- or complex-valued function and let $g$ be a real-valued function. We say that $f(x)$ is $\mathcal{O}\bigl(g(x)\bigr)$ if there exists some $x_0 \in \mathbb{R}$ and $M > 0$ such that \(\left\lvert f(x) \right\rvert \leq Mg(x)\) for all $x > x_0$.
Usually we write $f(x) = \mathcal{O}\bigl(g(x)\bigr)$ if the above holds.
This is actually an abuse of notation, because this “equality” is not symmetric.
The “correct” notation should be $f \in \mathcal{O}(g)$ where $\mathcal{O}(g)$ refers to the set of all functions whose growth rates are not faster than $g$.
Sometimes we also encounter the term “tightest upper bound” for time complexity. If $g$ is the tightest upper bound for $f$, this just means that $f(x) = \mathcal{O}\bigl(g(x)\bigr)$ and $g(x) = \mathcal{O}\bigl(f(x)\bigr)$, i.e., they have the same order of growth.