Depending on what type of bussiness you are active in, you either often start new projects from scratch, or not at all. In the first case you are more involved with proto-typing and experimental work whereas in the latter you are working on a product that needs to be supported, upgraded and maintained.
Starting code from scratch has a lot of benifits mostly caused by the freedom that comes with it. You are not bound to any choices made in the past, you are free to choose your own way of working that you are used to and feel comfortable with. There is also no legacy code that you need to understand and may be badly documented.
But what I want to address in this blog post is all the side-setup that is required when setting up a new project, and how important it is. Should it all be done from the start? Or can you delay it until it really becomes important?
As CPUs have more and more cores available, big performance gains can be achieved by having your application take advantage of running tasks in parallel. An increased level of parallelism however introduces more complexity. An intinsic problem when trying to run things in parallel is deciding what to run in parallel. The amount of tasks than can run completely in parallel are seldom and some type of synchronisation will be required most of the time, already decreasing the possible gain you can make.
Another problem you face is to determine how much you want to run in parallel. Do you want to start a thread for every single small task you can think of? Or do you want to have a couple of bigger threads? Since you know that creating threads comes with a cost, but can you avoid this cost by using a thread pool instead? There are so many questions to be answered when starting to work with parallelism, but today I will be focussed on determining how much threads you should have running given a certain capacity of the CPU.
Except from some small (personal) projects or tools, is software development a team activity. The bigger the application the more people are involed, and the longer the process is. Starting with a functional analysis all the way to operations, you have many different type of people all with their own function. However, the more the functions are split up, the harder it is to keep ownership and take responsibility and more often the blame game is played. This is simply caused by the fact that it also becomes easier to make mistakes because information is passed on multiple times, and different people will interpret things differently. But even just in development, depending on how you organise the process you can easily cause people to live on their own island.
When writing software you will end up using frameworks and libraries to make your life easier. Even if you don’t use any frameworks or libraries you still have the language itself that can evolve over time. A nice example of this is the recent release of Java 9 which did introduce some changes that might break your software.
All of these dependencies could become a hell if you are too tight coupled to them, as they restrict you from upgrading or changing when required. It is essential that you take this into account when writing software but as always it is easier said than done.
We all know that with a hashmap, the performance depends on the fill ratio of the actual map. A hashmap that is too full would result in bad performance due to hash clashes and the whole lookup would result in a lineair lookup.
However, it is important to note that there are different ways to implement a hashmap. One consists of having a single map and if the location where to store the element (depending on the hash value) is already occupied, you select the next available one. Another keeps a list of all elements that are present on a single location. This means that even if elements have the same hash value, there is no expensive procedure of finding the next free location and the same holds for the lookup.
But how exactly does the load factor influence the performance of the hashmap with the Java implementation? In the previous blog post I already did examine how the load capacity comes into play, this time we will extend it to also use the load capacity.
Related to the previous blog post, where I examined the effect of specifying the initial capacity of an ArrayList, I will now do the same for a HashMap. Compared to an ArrayList a HashMaps works different and it takes an additional parameter, the load factor. This parameter is required as a HashMaps performance will depend on how full it is. Because of the difference in adding elements to a HashMap and retrieving them again I decided to split this up into two separate blog posts. In this part I will examine adding elements to a HashMap.
When creating an ArrayList you have the option to specify the initial capacity. Most of the time however you don’t really know how big the list should be. If you would know the exact size of the list then it is often better to go for a regular array as this is much more memory efficient. For this reason I often just don’t specify an initial capacity, even though I know that this will lead to costly copying of the array when the list becomes too small. But what do you have to do when you don’t know the size? I guess you can always estimate it, and make some kind of guess. To get a better understanding of how the initial capacity affects throughput, I did some tests with it.