Java Collections

Java has been my primary programming language for more than seven years: four years building college projects and over three years building enterprise-grade applications at my full-time job. Java might be a verbose language, but it is one of the most mature programming languages around. One of the main libraries built into the JDK is the Java Collections Framework. As an experienced software engineer, I cannot imagine a working day without the Java Collections API.

What is a Collection?

In layman's terms, a collection represents a group of objects, for example a stamp collection or a music playlist. Similarly, in Java you can think of a Collection as a means to store your data: a container that holds data in some format, lets you retrieve it in different ways, and lets you perform operations on it depending on the type of container.
The figure below gives a high-level overview of the Collections Framework implementation in Java.

Java Collections Framework - Interfaces and Implementations

It starts with the Collection interface, which declares a set of utility methods common to the data structures that implement it, for example ArrayList, HashSet and others. Because these types share the interface, a group of objects can be passed around as a generic Collection where needed. By convention, each general-purpose implementation also provides a constructor that accepts any Collection, which makes conversions between data structures straightforward.

Assume we define a generic collection that holds String objects.
Ex. Collection<String> newCollection

Now let's assume we need it in ArrayList form to perform List-specific operations on it. We do the conversion by passing newCollection to the ArrayList constructor:
List<String> newList = new ArrayList<>(newCollection)
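
To make the conversion above concrete, here is a minimal sketch. The class name CollectionConversionDemo and the helper toList are hypothetical names I made up for illustration; the only framework feature used is the ArrayList constructor that accepts a Collection.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

// Hypothetical demo class, not part of the JDK.
class CollectionConversionDemo {

    // Copies any Collection<String> into a new ArrayList via the copy constructor.
    static List<String> toList(Collection<String> source) {
        return new ArrayList<>(source);
    }

    public static void main(String[] args) {
        Collection<String> newCollection = new HashSet<>();
        newCollection.add("stamp");
        newCollection.add("playlist");

        List<String> newList = toList(newCollection);
        newList.add("stamp"); // the List accepts a duplicate; the copy is independent

        System.out.println(newCollection.size()); // 2
        System.out.println(newList.size());       // 3
    }
}
```

Note that the new list is a copy: changing newList afterwards does not affect newCollection.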

Some other common operations possible with Collection are:

  • int size() : Get the size of the Collection

  • boolean isEmpty() : Returns true if the Collection has no elements

  • boolean contains(Object element) : Check if the Collection contains an element

  • boolean add(E element) : Insert or add an element to the collection

  • boolean remove(Object element) : Remove or delete the element from the collection

  • Iterator<E> iterator() : Returns an Iterator that can iterate over the elements of the collection.
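
The operations listed above can be sketched in a few lines. This is only an illustration: CollectionBasicsDemo and removeEmpty are hypothetical names, and ArrayList is just one of several implementations that would work here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;

// Hypothetical demo class, not part of the JDK.
class CollectionBasicsDemo {

    // Removes every empty string through the collection's Iterator
    // and returns how many elements were removed.
    static int removeEmpty(Collection<String> items) {
        int removed = 0;
        Iterator<String> it = items.iterator();
        while (it.hasNext()) {
            if (it.next().isEmpty()) {
                it.remove(); // safe way to remove while iterating
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Collection<String> playlist = new ArrayList<>();
        System.out.println(playlist.isEmpty());          // true

        playlist.add("Intro");
        playlist.add("");
        playlist.add("Outro");
        System.out.println(playlist.size());             // 3
        System.out.println(playlist.contains("Intro"));  // true
        System.out.println(playlist.remove("Intro"));    // true
        System.out.println(removeEmpty(playlist));       // 1
        System.out.println(playlist);                    // [Outro]
    }
}
```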

The framework is built around the following interfaces. List, Queue and Set extend Collection; Map is part of the framework but does not extend Collection, because it stores key-value pairs rather than single elements.
1. List Interface
2. Queue Interface
3. Set Interface
4. Map Interface

This means List, Queue and Set not only inherit functionality from the Collection interface, but each of them also adds methods of its own, while Map defines its own separate API. Usually, the best way to explore these methods is to open the interface in an IDE or to refer to the official Java documentation.

1. List Interface

We will look at two implementations: `ArrayList` and `LinkedList`. A List allows duplicate elements and iterates in index order, which matches insertion order when elements are only appended.

ArrayList

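  • It is backed by an array, so reads by index are very fast.

  • Appending to the end is usually cheap, but adding or removing elements elsewhere is expensive because the elements after that position must be shifted, and the backing array is reallocated and copied whenever it runs out of capacity.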

LinkedList

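  • Add and remove operations at either end, or at a position already reached with an iterator, are efficient compared to ArrayList because no elements need to be shifted.

  • However, reading from an arbitrary position requires walking the LinkedList node by node to find it. This is expensive.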
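
Here is a small sketch of the trade-off described above. ListChoiceDemo and insertBefore are hypothetical names chosen for this post. The helper inserts through a ListIterator, which is the access pattern where a LinkedList can splice in place while an ArrayList has to shift elements; both produce the same result, only the cost differs.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical demo class, not part of the JDK.
class ListChoiceDemo {

    // Inserts `marker` before every element equal to `target`.
    static void insertBefore(List<String> list, String target, String marker) {
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next().equals(target)) {
                it.previous();  // step back so the cursor sits before target
                it.add(marker); // marker lands before target
                it.next();      // move past target again
            }
        }
    }

    public static void main(String[] args) {
        List<String> arrayList = new ArrayList<>(List.of("a", "b", "c"));
        List<String> linkedList = new LinkedList<>(List.of("a", "b", "c"));

        System.out.println(arrayList.get(1));  // b (direct index into the array)
        System.out.println(linkedList.get(1)); // b (found by walking the nodes)

        insertBefore(arrayList, "b", "x");
        insertBefore(linkedList, "b", "x");
        System.out.println(arrayList);                    // [a, x, b, c]
        System.out.println(linkedList.equals(arrayList)); // true
    }
}
```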

2. Queue Interface

A Queue typically follows FIFO (First-In-First-Out) order, although some implementations such as PriorityQueue order their elements differently. It supports two main operations: .offer() (add) and .poll() (remove).

There are two implementations we can use: PriorityQueue and LinkedList

PriorityQueue (Min Heap & Max Heap)

  • Priority Queue in Java is mainly a representation of the Heap Data-Structure.

  • The elements inserted in the PriorityQueue(PQ) are ordered based on a prioritization parameter.

  • For example, if we are inserting String objects in the PQ, we can use the String length as the prioritization parameter. This means we can configure the queue so that poll() always returns the shortest String.
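
The String-length example can be sketched as follows. ShortestFirstDemo and drainByLength are hypothetical names; the longestFirst flag is my own addition to show how reversing the Comparator turns the min-heap behaviour into max-heap behaviour. The sample words have distinct lengths on purpose, since the order of equal-priority elements is not guaranteed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical demo class, not part of the JDK.
class ShortestFirstDemo {

    // Offers all words to a PriorityQueue ordered by length, then polls until empty.
    static List<String> drainByLength(List<String> words, boolean longestFirst) {
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        PriorityQueue<String> pq =
                new PriorityQueue<>(longestFirst ? byLength.reversed() : byLength);
        for (String word : words) {
            pq.offer(word);
        }
        List<String> polled = new ArrayList<>();
        while (!pq.isEmpty()) {
            polled.add(pq.poll()); // always the current head of the heap
        }
        return polled;
    }

    public static void main(String[] args) {
        List<String> words = List.of("kiwi", "fig", "banana", "apple");
        System.out.println(drainByLength(words, false)); // [fig, kiwi, apple, banana]
        System.out.println(drainByLength(words, true));  // [banana, apple, kiwi, fig]
    }
}
```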

LinkedList

  • As discussed above in the List Interface section.

  • LinkedList basically implements both List and Queue Interfaces.
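
As a rough sketch of the point above, the same LinkedList object can be used through both the Queue and the List interface. FifoQueueDemo and serveInOrder are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

// Hypothetical demo class, not part of the JDK.
class FifoQueueDemo {

    // Offers every arrival to a queue, then polls until the queue is empty.
    // poll() returns null on an empty queue, which ends the loop.
    static List<String> serveInOrder(List<String> arrivals) {
        Queue<String> queue = new LinkedList<>();
        for (String arrival : arrivals) {
            queue.offer(arrival);
        }
        List<String> served = new ArrayList<>();
        String next;
        while ((next = queue.poll()) != null) {
            served.add(next);
        }
        return served;
    }

    public static void main(String[] args) {
        LinkedList<String> both = new LinkedList<>();
        Queue<String> asQueue = both; // same object, Queue API
        List<String> asList = both;   // same object, List API

        asQueue.offer("a");
        asQueue.offer("b");
        asQueue.offer("c");
        System.out.println(asList.get(1));  // b
        System.out.println(asQueue.poll()); // a (first in, first out)
        System.out.println(asList);         // [b, c]

        System.out.println(serveInOrder(List.of("x", "y"))); // [x, y]
    }
}
```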

3. Set Interface

Unlike List, a Set cannot hold duplicate objects or elements. There are three implementations of a Set: HashSet, LinkedHashSet and TreeSet

HashSet

  • In most cases this is the default choice of Set implementation.

  • It offers fast look-up of any element, typically in constant time.

  • When iterating over a HashSet, the order of the elements is unspecified and will most likely not match the original insertion order.

LinkedHashSet (Ordered Set)

  • When iterating it will maintain and return elements in their original order of insertion.

  • Performance is likely to be slightly below that of HashSet because of the added cost of maintaining the linked list that tracks insertion order.

TreeSet (Sorted Set)

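  • It constantly keeps the elements sorted, either by their natural ordering or by a Comparator supplied at construction time.

  • Again, look-up is a little more expensive than in HashSet, because it searches a tree (logarithmic time) instead of hashing.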
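
To compare the three Set implementations side by side, here is a minimal sketch. SetOrderDemo and its two helpers are hypothetical names. I deliberately do not print the HashSet contents, since its iteration order is unspecified.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of the JDK.
class SetOrderDemo {

    // Drops duplicates and keeps the order in which elements were first seen.
    static List<String> insertionOrder(List<String> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    // Drops duplicates and returns the elements in natural (alphabetical) order.
    static List<String> sortedOrder(List<String> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }

    public static void main(String[] args) {
        List<String> input = List.of("pear", "apple", "pear", "fig");

        Set<String> hashSet = new HashSet<>(input);
        System.out.println(hashSet.size());          // 3 (the duplicate is dropped)
        System.out.println(hashSet.contains("fig")); // true

        System.out.println(insertionOrder(input));   // [pear, apple, fig]
        System.out.println(sortedOrder(input));      // [apple, fig, pear]
    }
}
```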

4. Map Interface

Unlike the interfaces above, a Map operates on two entities and does not extend Collection. The Map interface maintains a set of Key:Value pairs. There are three map implementations covered here: HashMap, LinkedHashMap and TreeMap.

HashMap

  • It is the simplest form of using a Map Data-Structure in Java.

  • HashMap doesn't maintain the insertion order of the key-value pairs in the map.

LinkedHashMap (Ordered Map)

  • Unlike HashMap, this maintains the insertion order of the key-value pair when iterating.

TreeMap (Sorted Map)

  • Similar to TreeSet, the Map entries are kept sorted by key, either in natural order or in the order defined by a Comparator.
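
The difference between the three Map implementations shows up when iterating. Below is a small word-count sketch; MapOrderDemo and countInto are hypothetical names, and the caller chooses which Map implementation to fill.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not part of the JDK.
class MapOrderDemo {

    // Counts word occurrences into the supplied map and returns that same map.
    static Map<String, Integer> countInto(Map<String, Integer> counts, List<String> words) {
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = List.of("pear", "apple", "pear", "fig");

        // LinkedHashMap iterates in insertion order of the keys.
        System.out.println(countInto(new LinkedHashMap<>(), words)); // {pear=2, apple=1, fig=1}

        // TreeMap iterates in sorted key order.
        System.out.println(countInto(new TreeMap<>(), words));       // {apple=1, fig=1, pear=2}

        // HashMap gives fast look-up, but its iteration order is unspecified.
        System.out.println(countInto(new HashMap<>(), words).get("pear")); // 2
    }
}
```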

Conclusion

As we saw above, the Java Collections Framework gives us the most common data structures that a developer might need. Once you understand the components in this framework, you can easily compare and use the most appropriate data structure for the problem at hand.

Today, there are two more Open Source libraries available from Google and Apache that build upon the Java Collections to provide additional data structures and utility methods.

Guava: Google Core Libraries for Java

"Guava is a set of core Java libraries from Google that includes new collection types (such as multimap and multiset), immutable collections, a graph library, and utilities for concurrency, I/O, hashing, caching, primitives, strings, and more! It is widely used on most Java projects within Google, and widely used by many other companies as well." This project was originally called "Google Collections".

Apache Commons Collections

"The Java Collections Framework was a major addition in JDK 1.2. It added many powerful data structures that accelerate development of most significant Java applications. Since that time it has become the recognised standard for collection handling in Java. Commons-Collections seek to build upon the JDK classes by providing new interfaces, implementations and utilities."

The Commons Collections and Guava libraries are widely used in enterprise-grade applications today, as the enhancements they offer come in handy for many software engineers building Java applications and services.

If Java is your primary language, then the Java Collections Framework is a prerequisite for the Coding Interview Patterns - Data Structures & Algorithms Course that is currently under development. If you want to stay updated about more informative posts like this, sign up to TechBum's Newsletter.

You can follow me on Twitter and email me at Techbum.Labs@gmail.com for any questions and feedback.

Happy Learning! :)