Post

CS2040S Notes - Chapter 1: Preliminaries

CS2040S Notes - Chapter 1: Preliminaries

Java Essentials

In CS2040S, the programming language used for teaching is Java. This is because of several factors:

  • Java has good and comprehensive support for Object-Oriented Programming (OOP);

  • Java has an extensive built-in utility library with ready-to-use implementations of most of the data structures and algorithms covered in this course;

  • Java is a popular language with relatively easy syntax.

To better prepare for the practical aspect of the course, we first briefly go through several essential features in Java.

Class and Object

A class is a template or “blueprint” with pre-defined information and data to specify the attributes (i.e., fields) and functionalities (i.e., methods) of a data type. It defines a thing by prescribing what it is and what it can do.

An object is a concrete instance of a class, i.e., a “thing” created according to the specifications defined by the class. A class can have an arbitrary number of instances.

A subclass or subtype is a class which builds on the structure of another class (known as the parent class or base class). Usually, a subclass will extend or modify the behaviour of the parent class.

A generic type is a class which can be parametrised with another type. This allows us to substitute any type of objects into the data types of fields or method arguments. For example, a List<T> allows us to define functionalities that work for a type parameter T which can be substituted with concrete types at run-time.

The this keyword is a special keyword in Java used to refer to the current instance. You can use it to differentiate between method arguments and class fields with identical names. For example:

1
2
3
4
5
6
7
8
class MyObject {
    int value = 5;

    public void test(int value) {
        System.out.println(value) // Refers to the argument
        System.out.println(this.value) // Refers to the "value" field of "this" object
    }
}

Accessibility

Fields and methods have accessibility. Under most of the situations, we only need to use the public, protected and private accessibilities. public fields and methods can be accessed from anywhere; protected fields and methods can be accessed only in a subclass of the class where the fields or methods are defined; private fields and methods can be accessed only in the same class as where they are defined.

Class Fields and Class Methods

The static keyword in Java allows you to define class fields and class methods.

A class field is an attribute shared by all instances of the class. For example, this is used for:

  • frequently used constants such as Math.PI or Integer.MAX_VALUE;
  • number of instances of a class when you need to track reference count.

A class method is a functionality which does not involve the use of data from any instance of the class.

Abstract Classes and Interfaces

An abstract class is a class which cannot be instantiated. It is used to provide a common template (i.e., a set of common base information) for many concrete subclasses which should be grouped together in order to avoid code duplication.

An interface is a set of abstract methods supplied to an implementing class. Classes implementing the same interface will have methods with the same name but varying behaviours.

Call by Reference

In Java, the types byte, short, int, long, float, double, boolean, char are known as primitive types. When a primitive type argument is passed from one function to another function, a local copy will be created. This means that the following will happen:

  • Create a primitive integer x in a function called f and initialise it to 5.
  • Call a function g(x) which changes the value of x to 10.
  • In f, the value of x will not change because only a copied value is passed to g.

Arrays and all object-derived types are reference types, which are passed by reference (i.e., the memory address of the instance is passed around between functions). This means that the following will happen:

  • Create an object wrapper Integer called x in a function called f and initialise it to 5.
  • Call a function g(x) which changes the value of x to 10.
  • In f, the value of x will change to 10 because both functions share the same memory address of the object x.

However, a memory address is passed by value. This means that if in the above example, we create a new instance in g and assign it to x, it will not affect the value of x in f because the memory address is replaced only in g.

Big-O Notation

One core consideration when designing a data structure or an algorithm is efficiency. However, we realise that absolute running time is not a fair metric for efficiency due to

  • hardware constraints;
  • difference between programming languages;
  • random fluctuations in running time;
  • different test cases used to measure running time.

Therefore, instead of absolute duration, we will estimate an upper bound for the number of steps required to complete an algorithm to measure its efficiency. Usually, for an input test case of size $n$, we denote this upper bound by $T(n)$.

To make everything rigorous, we will describe the efficiency of an algorithm using the following statement:

For any input of size $n$, the number of steps required for the algorithm to complete is at most a constant multiple of $T(n)$.

The efficiency defined as such is known as time complexity.

Algorithms with the same time complexity might still have significant difference in running time in practice! For example, even if the running time is linear in test case size for both algorithms, the actual durations could be drastically different for large test case sizes due to different gradients.

In this sense, time complexity measures the growth rate in terms of its order of magnitude and it does not care about the absolute running time nor the absolute value of the running time’s gradient.

The formal mathematical definition for this is as follows:

Let $f$ be a real- or complex-valued function and let $g$ be a real-valued function. We say that $f(x)$ is $\mathcal{O}\bigl(g(x)\bigr)$ if there exists some $x_0 \in \mathbb{R}$ and $M > 0$ such that \(\left\lvert f(x) \right\rvert \leq Mg(x)\) for all $x > x_0$.

Usually we write $f(x) = \mathcal{O}\bigl(g(x)\bigr)$ if the above holds.

This is actually an abuse of notation, because this “equality” is not symmetric.

The “correct” notation should be $f \in \mathcal{O}(g)$ where $\mathcal{O}(g)$ refers to the set of all functions whose growth rates are not faster than $g$.

Sometimes we also encounter the term “tightest upper bound” for time complexity. If $g$ is the tightest upper bound for $f$, this just means that $f(x) = \mathcal{O}\bigl(g(x)\bigr)$ and $g(x) = \mathcal{O}\bigl(f(x)\bigr)$, i.e., they have the same order of growth.

This post is licensed under CC BY 4.0 by the author.