An Object is a Data Structure
Objects are data structures. We all know programs have data, in fact in OOP an object is data and related program logic packaged into this thing called an object. In Java, we have primitives byte, short, int, long, float, double, boolean, char. A String is an object made of the primitive char. But in general, all objects can be broken down into these primitives. Of course, all primitives are made of bytes. So you can say the most basic data type is the byte. And the most basic data structure is a series of bytes. Or an array of bytes. Therefor the most basic collection is the array.
Primitives are the most basic data structures made from bits and bytes.
- The bit (on or off, 0 or 1, true or false)
- boolean actually its stored as around 1 byte in the JVM, but its JVM dependent.
- BitSet is a class that stores and handles bits in bit size memory allocations.
- The Byte (8 bits)
- byte 1 byte signed whole -128 to 127 byte times 0xff will convert it to 0 to 255
- short 2 bytes signed whole -32,768 to 32,767
- int 4 bytes signed whole -2,147,483,648 to 2,147,483,647 (2.1Billion)
- long 8 bytes signed whole -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (9Million-Trillion)
- float 4 bytes signed real 6-7 significant digits ±3.40282347E+38F
- double 8 bytes signed real 15 significant digits ±1.79769313486231570E+308
- char 2 bytes whole unsigned 0 to 65535 (for unicode characters) ASCII still fits in first 256 or 0 to 255 space.
- String a most common object that holds any number of char’s.
- reference 4 bytes on 32 bit JVM, 8 bytes on 64 bit JVM so 8 bytes for most of us.
The above article also talks about overflow or underflow where you assign values larger than the primitive can contain or smaller than they can contain. Autoboxing is where a primitive can be treated like an object and have methods called on it. i.e. ‘Z’.charValue();
If you want to see the size of something in Java, just use the Runtime object methods to determine free ram, make a bunch of that one thing or object, then subtract. Divide by the number of things or objects and it will give a rough estimate of its size. Java Language probably didn’t provide a sizeof operator because it wouldn’t be pretty. If you want to check the execution time of something you may do a similar feat. Just get the System.currentTimeMillis(); At the start of a method call, then again at the end of it. Subtract. You might even run it a hundred times then divide by 100 to get an average execution time.
The above article has an interesting chart showing types of data structures. It fails to show Vector and Sequences and some Java primitives.
There are several methods used to determine the efficiency of algorithms. Big-O notation being the main one that describes worst case time or space complexity. I will not explain this in this article. I will simply tell you about them and you can research it for yourself. There are the best case, average case, and worse case analysis. Also, IDE’s come with profiler tools. These tools can give you information on execution times for various parts of your programs. This can help to reveal “Bottlenecks” in performance. As a newbie, you may not need to be too worried about such issues. Java language Collections API will already be optimized for most of your needs. Just remember the following above list. Performance is not usually the first consideration when developing apps.
Objects and Arrays
So an object is made up of a list or collection of these values be it primitive or reference. This makes an object something like a data record. A data record is a list of name:value pairs in a given order. The data in an object is called fields and methods are basically a reference to a function which we learned was an 8 byte data type. The constructors are “constructor methods”. So yes I suppose they are references too. An inner class is simply a type and would not add to the data for the given class. They are a class (object) with their own data. I sometimes say class and sometimes say object. Technically an object is an instance or copy (clone) of the class.
A class is a blueprint for the objects. Any static fields and methods or other members belong to the class. Constructors cannot be static and are implicitly final. So the class has its set of name:value pairs also. But the distinction is that the class (static) members are not copied. You can make as many objects as you want from the class definition. In short this is what makes an object like a record in a database table. Make sense?
And an Array is a linear contiguous list of whatever data type you make an array of. Each item in this list is an element and has a position number or rank. An Array has elements numbered 0 to 99. If you recall computers count starting at 0. An array of a primitive is simply a list or sequence of numbers. An Array of an object of some kind is like a database table. Again each object is then the record in the table. This is to say that the two most basic data structures are arrays and objects.
Files and Streams
- local storage
- remote storage
- archive storage
- from input or to output
- from memory to memory
- storage to memory or memory to storage
A file on disk might also be considered to be a data structure. It is in its most basic form a series of bytes. As the bytes are read or written from memory, they are converted to and from primitives or objects. A line of text from a file is the object String. A file can be read as a series of bytes, or characters. And with utility classes other primitives and objects as well. IO Streams can be read and written as bytes, primitives, objects as well.
But a stream is not a data structure. A stream is just a flow of data. You might say that data structures are stored in memory, passed through streams and stored in files on storage devices. The streams connect files with memory or computers with computers. So computer to computer is like a memory to memory stream. Streams may also come from input devices or go to output devices etc.
More complex data structures
- storing and retrieving
- basic structures
- Maps (Hash Table)
- Heaps (Priority Queue)
- Dictionaries (Hash Table)
- basic methods
- random access
- basic structures
Find used Textbooks
Data structures become quite a bit more complex than what we have talked about thus far. A typical semester college course will cover them. And if you look you can find used textbooks everywhere about data structures. Look for Java Collections and Data Structures books. So what do these books cover? Basically, storage and retrieval, sorting and searching. As it turns out there are simple and quick to implement methods and more complex methods and difficult to implement methods. These methods have been pretty well standardized and are called algorithms. The name of the algorithm is sometimes named after the guy or gal who invented it. Or in the case of Bubble Sort, named based on how it works. All of these algorithms are used in varied situations throughout the computer world and even as a common computer user you see them in practical use every day.
Much can be said about Data Structures and Collections. I am going to be as brief as possible in this article, mostly listing and defining things for you. You can almost think of data structures as patterns for the way data has been kept for particular uses. These patterns sometimes seem like things we see in nature, a stream (List), branches (Tree), a web (Graph? Should have been Network) for example. So structures can have similar sounding names. Sometimes developers seem to misname things. Once the name is stuck, its there for good as if written in stone. For example, the Graph structure, in my opinion, would have been better called a Webb or a Net. Heck, a 2-dimensional array is more like a graph. I hereby and from now on declare 2D arrays to be called Graphs and 3D arrays Cubes!
A 4D array a MultiVerse! A 5D Array is a Twilight Zone! Naw… Let’s not overdo it.
Synchronized vs Not Synchronized
One thing I want to note is that Java had collection classes originally that were synchronized. This means thread safe. There was a certain inefficiency in having any class to be synchronized so they decided to make new classes that were not synchronized. This is why we have Vector class vs ArrayList class. Vector is synchronized.
- Multi-Dimensional Arrays (In Java these are Arrays of Arrays)
- One Dimensional
- Two Dimensional
- Three Dimensional
- Arrays of primitives.
- Arrays of Objects.
- The Arrays Class
- The ArrayList Class
- Treating a Random Access File like a storage-based array.
- Stacks and Queues using Arrays
- Multi-Dimensional Arrays (In Java these are Arrays of Arrays)
Using Arrays and Array-based objects is where you will start in programming data structures generally. One thing you will note is that you can use Arrays for stacks and queues, or linked list for stacks and queues. This is data structures within data structures more or less. You can also form trees within arrays. For example, you can do quicksort using trees in arrays or a dynamic tree structure. Same goes for other searching and sorting techniques. The Arrays class has some nice methods for search and sorting arrays. The ArrayList class is a wrapper for an array. The ArrayList class has some handy methods for things you might normally have to code yourself, such as insert and delete. It will also give you a ListIterator for processing the array items sequentially.
- Comparable interface
- compareTo() method
Iterator vs Loops
Iterator is something you will need to look into, its used much with List. It allows you to step through a list with hasMore() and next() methods. Otherwise, you would use a for loop to do the same. Also for sorting you will need to look into Comparable Interface and compareTo() method. Objects you want to sort will need to implement this interface.
- ArrayList class
- Vector class
- LinkedList class
- Double Linked List
- Circularly Linked List
- Sorted Linked List
- Stack class
- Double Ended Queues
- Priority Queues
Lists, Queue, Stack
Lists are a series of some kind of data. The ArrayList and Vector above wrap an Array ( Static Data Structure) Whereas, Linked data structures are dynamic meaning they grow and shrink as needed. Sequences are ordered queues. A queue is a first in first out list (FIFO). Examples of a queue are keyboard buffer queue. A production queue worklist in a video game. A priority queue is like multiple queues where each priority level has its own queue, higher priority queues are worked first.
A stack is a last in first out list (LIFO). Imagine a stack of papers on a desk where you routinely add new things to the top of it. But you only work whats on top first. So you may work the stack down to half its size, or one day get busy and work it down to the bottom. I used a stack for a type of random surface generator a few years back on the Java Games and Graphics project, actually I did this many years ago with Pascal too. It generates a 3D surface that looks like a volcanic island.
Another example of a stack is the call trace stack that Java generates when exceptions are thrown. Heaps are randomly stored data of random sizes. Heaps are not like memory heaps. Heaps are balanced trees actually with largest value at top descending to least values at bottom or vice versa. I also think of heaps as piles or mounds.
- Hash Tables
- HashMap class
- HashSet class
- Hashtable class
- LinkedHashMap class
- LinkedHashSet class
- WeakHashMap class
Hash Tables and Sets
Hash tables are lookup tables that store using key:value pairs. The way they work makes them very fast and efficient at storing and looking something up. You will see more data structures on data structures in the above. Maps, List, Sets, Linked etc.
Map, Dictionary, Hashtable all almost the same thing. Store using key:value pairs. Keys generally have to be unique.
- TreeSet class
- HashSet class
- LinkedHashSet class
- EnumSet class
Sets contain no duplicate elements. No objects who are equal when using the equals() method and only one null value at most. A set is really not as much of a data structure as a data requirement or restriction.
- General Trees
- Binary Trees
- Threaded Trees
- Balanced and AVL Trees
- NonBinary Trees
- Binary Search Trees
- Multi-Way Search Trees
- (2,4) Trees
- Red-Black Trees
- Splay Trees
- TreeMap Class
- TreeSet Class
Trees are a common structure in nature. It’s talking about a branching structure. Other similar structures might be a river, stream system for water flow. A ridge structure branches a lot as well. The lungs have a branching structure whereas the circulatory system is more of a graph structure. Graphs or Networks can be broken down into a tree structure. Of course, a file system is a well-known tree structure that everyone uses and knows. Document object models are tree structures, such as XML or HTML DOM. This is a structure that developers should learn and know well.
- Log files
- Hash Tables
- Ordered Dictionaries
- Lookup Tables
- Skip List
Dictionaries are more or less like Hash tables or Maps
- Shortest Paths
- negative weighted path
- positive weighted path
- acyclic path
- critical path
- Shortest Paths
- Weighted Graphs
- Directed Graphs
- Undirected Graphs
- Spanning Trees
As I said before I think Graphs data structure was misnamed. Net, Network, Mesh, or Web might have been better. Common graph structures in nature are spider webs, highway systems, electrical grids, and the internet itself. A cave system might be mapped out like a graph if it is a complex one. A graph is basically like a tree except that branches can connect to other branches. So in traversing a graph structure, you could end up going in circles. With graph structures, you get into Paths and path finding, for example finding shortest paths. I can see great uses for this in AI and game coding.
Searching and Sorting
- linear searching
- sequential search
- binary searching
- interpolation search
- tree search
Java has some built-in implementations for searching, Look at the Arrays class. Its easy enough to code a linear search with for loop or an iterator.
- Bubble Sort
- Selection Sort
- Insertion Sort
- Shell Sort
- Merge Sort
- Tree Selection Sort (Tournament Sort)
- Heap Sort
- Quick Sort
- NonRecursive QuickSort
- Bucket Sort
- Radix Sort
Why so many sort algorithms? Some are more efficient than others in given situations. Each one has a different best case, worst case and average case execution time. Java has some built-in implementations of the quick sort. Look at the Arrays class. Bubble sort is an easy one to code and is quick if the list is already in a mostly sorted state. You must iterate the list as many times as there are elements and simply swap higher with lower ones as you go. Lower values rise and high values sink. Or vice versa depending on how you set up the comparison. A very efficient sorting algorithm would analyze the data somehow and then pick the best one of the above to use for the given situation
- indirect recursion
Recursion is where a function calls itself. You would think this would end up in an infinite loop, but if set up properly it will not. Recursion is used heavily in sorting and traversing and working with trees. Though you don’t have to use recursion, you can code just about anything with or without recursion using loops and conditions instead. Indirect recursion is where function A calls function B and function B calls function A.