When creating an ArrayList you have the option to specify an initial capacity. Most of the time, however, you don't really know how big the list will be. If you did know the exact size, it would often be better to use a regular array, as that is much more memory efficient. For this reason I often just don't specify an initial capacity, even though I know this will lead to costly copying of the backing array whenever it becomes too small. But what should you do when you don't know the size? You can always estimate it and make an educated guess. To get a better understanding of how the initial capacity affects throughput, I ran some tests.
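To make the copying cost concrete, here is a minimal (and deliberately naive) Java sketch comparing a default-capacity list with a pre-sized one. The class and method names are my own illustration, and this is not the benchmark behind the tests discussed here:

```java
import java.util.ArrayList;

public class CapacityDemo {

    // Fill a list with n elements; initialCapacity <= 0 means "use the default".
    static long fill(int initialCapacity, int n) {
        long start = System.nanoTime();
        ArrayList<Integer> list = initialCapacity > 0
                ? new ArrayList<>(initialCapacity)
                : new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // Default capacity: the backing array is copied each time it fills up.
        System.out.println("default:  " + fill(0, n) + " ns");
        // Exact capacity: the backing array never has to grow.
        System.out.println("presized: " + fill(n, n) + " ns");
    }
}
```

A single run like this proves very little on a JIT-compiled runtime, which is exactly why proper repeated measurements are needed.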
Git is a versatile revioning tool, allowing many different kinds of manipulations. This versatility, however, also means that you can achieve the same goal in different ways. An example of this is my previous blog post: Branch/Merge Strategies. In this blog post I will discuss something similar, yet different… the usage of the master branch.
Coordinate systems are maybe not something you are faced with every day, and they might make you think mostly of the gaming industry, but you will run into them sooner than you think. The pixels of your screen, for instance, already define a coordinate system. This system has its origin in the top-left corner, which mirrors how we read in languages such as English: we start at the top and work our way down, and on each line we read from left to right.
The coordinate system of your screen is a bit special, as it does not allow for negative values and works only with natural numbers. Because of this implicit coordinate system, even something as simple as drawing on the screen already requires a conversion or mapping. In this blog post I will go into more detail about what problems can arise and what you should take into account.
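As an illustration of such a mapping, here is a small Java sketch that converts a point from a y-up world coordinate system to y-down screen pixels. The class name, world size, and screen size are hypothetical choices of mine, just to show the idea:

```java
public class ScreenMapping {

    // Map a point from a world coordinate system (origin bottom-left, y up)
    // to screen pixels (origin top-left, y down, natural numbers only).
    static int[] worldToScreen(double wx, double wy,
                               double worldWidth, double worldHeight,
                               int screenWidth, int screenHeight) {
        int sx = (int) Math.round(wx / worldWidth * (screenWidth - 1));
        // Flip the y axis: world y grows upward, screen y grows downward.
        int sy = (int) Math.round((1.0 - wy / worldHeight) * (screenHeight - 1));
        return new int[] { sx, sy };
    }

    public static void main(String[] args) {
        // The world origin (0,0) lands at the bottom-left pixel of an 800x600 screen.
        int[] p = worldToScreen(0, 0, 100, 100, 800, 600);
        System.out.println(p[0] + "," + p[1]); // prints 0,599
    }
}
```

Note the rounding: a world coordinate rarely maps exactly onto a pixel, which is one of the subtleties such a conversion has to deal with.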
Recently we had the problem that an application running at the customer's site was not able to talk to multiple instances of our application. There were good reasons for wanting separate instances of our application, even though we could have combined them into one; the main reason was the actual physical differences between the things they had to manage. But changing the customer's software would be too costly, so instead we investigated whether it would be possible to introduce a kind of router to handle this.
As more and more things are being automated, our daily life becomes easier. In many cases, however, we have a mix of automation and manual actions, or we want to be able to intervene and correct when automation goes wrong. Whatever the reason, a fully automated system is often still not feasible, and so manual interaction needs to be allowed and taken into account during design and implementation. But what exactly are the implications of this?
Dealing with a physical system is pretty challenging, as you have to take the hardware limitations into account. There is a big misconception that software is limited only by your imagination, and that whatever you can think of you can simply write. That holds up reasonably well for software that deals purely with digital data and has no presence in the physical world; it gives you a lot more freedom than software that has to take the physical world into account.
Besides the limitations you also have to take into account noise, inaccurate values and even hardware failures. You will often need to write your software such that it can deal with all of these problems, even though they are not caused by the software itself, so we already see a certain level of 'fixing' hardware problems. If the hardware were accurate and immune to failure, your software could simply rely on the values it gets. But since this level of correctness is impossible in the real world, you have to fix it in the 'virtual' world.
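One common way to deal with noisy readings purely in software is a simple moving average. The following Java sketch (the class name, window size and sample values are illustrative assumptions of mine) damps a spike in a stream of sensor values:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A minimal sketch of smoothing noisy sensor readings with a moving average.
public class SensorFilter {

    private final Deque<Double> window = new ArrayDeque<>();
    private final int size;
    private double sum;

    SensorFilter(int size) {
        this.size = size;
    }

    // Returns the average over the last `size` readings seen so far.
    double smooth(double reading) {
        window.addLast(reading);
        sum += reading;
        if (window.size() > size) {
            sum -= window.removeFirst();
        }
        return sum / window.size();
    }

    public static void main(String[] args) {
        SensorFilter filter = new SensorFilter(3);
        // A spike of 100 among readings around 10 is damped by the averaging.
        for (double r : new double[] { 10, 11, 100, 9, 10 }) {
            System.out.println(filter.smooth(r));
        }
    }
}
```

The trade-off is latency: the filter reacts more slowly to genuine changes, which is exactly the kind of design decision the physical world forces on you.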
With the HDFS cluster set up, we still need a way to actually use the data on it. Although just using the cluster for duplication and backup of data is a viable option, that is not really what it is meant for. So we should start setting up everything that is required to run MapReduce jobs on the cluster.
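Part of that setup is telling Hadoop which framework should run the MapReduce jobs. Assuming a standard Hadoop 2.x installation running on YARN, a minimal `mapred-site.xml` fragment would look like this (concrete values can differ per cluster):

```xml
<configuration>
  <!-- Run MapReduce jobs on YARN instead of the old local/classic runner. -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```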