Software Fixes For Hardware Problems

Dealing with a physical system is challenging because you have to take hardware limitations into account. There is a big misconception that software is limited only by your imagination: whatever you can think of, you can just write. Software that deals purely with digital data and has no presence in the physical world does give you a lot more freedom, but that freedom shrinks as soon as the physical world enters the picture.

Besides these limitations you also have to account for noise, inaccurate values and even outright hardware failures. You will often need to write your software so that it can deal with all of these problems, even though none of them are caused by the software itself, so there is already a certain level of ‘fixing’ hardware problems. If the hardware were accurate and immune to failure, your software could simply rely on the values it receives. But since that level of correctness is impossible in the real world, you have to fix it in the ‘virtual’ world.


This, however, reinforces the view many people have that you can just deal with everything in software and that, as mentioned earlier, it is limited only by your imagination. That view can have a perverse effect: instead of investing money in better hardware, your software is simply expected to handle it. It might even lead to deteriorating hardware that the software constantly has to compensate for.

But software can under no circumstances fully account for hardware mistakes. There is always an overhead, and it can be very costly. For example, to remove noise from a bad scanner you need to average the values over a certain time period, which means it takes longer to get a usable value. An example I faced recently was a slipping AGV (Automated Guided Vehicle). This AGV has only a single drive wheel, and when that wheel slips we no longer know the speed of the AGV, as speed is measured by counting the rotations of the drive wheel. As a result the AGV loses track of where it is, because the positioning system uses a combination of reflectors and drive feedback. We were then asked to fix this lost-position situation by letting the positioning system find the correct location on its own.
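To make the overhead concrete, here is a minimal sketch of such an averaging filter in Java (the class and method names are mine, and the window size is arbitrary). Note how every smoothed value is built from earlier samples: that is exactly the extra latency described above.

```java
// Minimal sketch of smoothing noisy sensor readings with a moving
// average; names and window size are illustrative, not from any
// particular scanner API.
public class MovingAverage {
    // Average each sample with up to 'window - 1' preceding samples.
    public static double[] smooth(double[] samples, int window) {
        double[] out = new double[samples.length];
        for (int i = 0; i < samples.length; i++) {
            int start = Math.max(0, i - window + 1);
            double sum = 0;
            for (int j = start; j <= i; j++) sum += samples[j];
            out[i] = sum / (i - start + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        // The smoothed signal lags behind the raw input, which is the
        // latency cost of fixing a hardware problem in software.
        double[] smoothed = smooth(new double[]{2, 4, 6, 8}, 2);
        System.out.println(java.util.Arrays.toString(smoothed));
    }
}
```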

While this is possible, it is dangerous: the positioning system can easily settle on the wrong location when the reflector layout is highly repetitive (as it often is in a long hallway). The real problem is clearly that the drive wheel slips, but instead of focussing on how to fix that, the hardware issue is bypassed completely and a software solution is demanded.

The main reason for this kind of thinking is that hardware changes are considered hard and expensive, while software changes can supposedly be made quickly and at almost no cost. This is how people with no understanding of software think; any professional software engineer will tell you that it does not have to be the case. A small change made quickly can have huge long-term implications because of the refactoring it eventually requires. The real solution here would be a speed sensor on a wheel other than the drive wheel.


Building A Hadoop Cluster With Docker – Part 4

With the HDFS cluster set up, we still need a way to actually use the data. Although using the cluster purely for duplication and backup of data is a viable option, that is not really what it is meant for. So we should start setting up everything that is required to run MapReduce jobs on the cluster.



The first thing we need to set up is the Resource Manager. The Resource Manager is similar to the NameNode in the sense that it performs a critical function and should run in a separate container.

The setup of the Resource Manager is relatively easy, but unlike the HDFS nodes we cannot start from an empty config, as there are too many essential options in ‘capacity-scheduler.xml’. If this configuration file is missing the Resource Manager will not start at all; it does not even fall back to default configuration values. The changes to the configuration files are limited to overwriting the default user.

By exposing the Web UI port (8088 by default) you can already have a look at the Web UI, which looks completely different from that of the DataNode or NameNode.

The next step is to set up a NodeManager, and because of the way Hadoop works, it should run in the same container as the DataNode. This poses a small problem, as a Docker container can only have a single start command. There are ways around this, like starting both applications from the same script, but it would also be nice if the container would quit as soon as either the DataNode or the NodeManager exits.

Hadoop does have mechanisms to start each process as a daemon, but this cannot be used for both processes, as the container would then consider its start command finished and simply stop. Running one as a daemon and the other directly would give up the possibility of quitting the container when the daemon process exits.
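One way around this, sketched below with sleep commands standing in for the real ‘hdfs datanode’ and ‘yarn nodemanager’ processes (the script itself is an assumption, not something from the Hadoop distribution), is a bash entrypoint that starts both in the background and relies on ‘wait -n’ (bash 4.3+), which returns as soon as the first background job exits:

```shell
#!/bin/bash
# Stand-ins for the real services; in the actual container these
# would be 'hdfs datanode' and 'yarn nodemanager'.
service_a() { sleep 1; }  # pretend this service dies first
service_b() { sleep 5; }  # the longer-running service

service_a &
service_b &

# 'wait -n' returns when the FIRST background job exits, so the
# container stops as soon as either process goes down.
wait -n
result="one service exited"
echo "$result"
```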

As a starting point, let’s just create a NodeManager in a separate container and let it connect to the ResourceManager. This should allow us to fix problems such as hidden ports first. And as expected, the NodeManager cannot connect to the Resource Manager until the following ports are exposed: 8030, 8031, 8032 and 8033.
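For reference, a sketch of how those ports might be exposed in a compose file (the service and image names here are placeholders from my own setup, not anything official; the ports are YARN’s defaults):

```yaml
# Sketch of the relevant docker-compose fragment.
services:
  resourcemanager:
    image: my-hadoop-resourcemanager
    ports:
      - "8088:8088"   # Web UI
      - "8030:8030"   # scheduler
      - "8031:8031"   # resource tracker (NodeManager heartbeats)
      - "8032:8032"   # job submission (clients)
      - "8033:8033"   # admin interface
```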

One of the things you can see from the Web UI is that there are no log files on the Resource Manager. This is familiar territory, as we had a similar problem with the NameNode, but in contrast to the NameNode, the problem here is simply that the folder does not exist and no files are written to it. This of course makes you wonder why YARN does not automatically log to file. Is it because some loggers have the wrong settings?



With the Resource Manager set up and a NodeManager connected to it, we can start launching jobs. But how do you do this? From a quick search it looks like there are a couple of options:

  • Command line: by using the ‘mapred job’ command.
  • The REST API.
  • From code.

To launch a job from the command line, you need some kind of job file. I could not find out what this format needs to be; just passing a jar file fails with a SAX parsing exception, and the format is not documented anywhere. The REST API can only be used to query the status of jobs, not to create a new one, or at least so it seems. So the only option left for me to test is launching from code.


I once launched a local job with local data, so I had to change it a bit. Adding some configuration allowed me to connect the job to the HDFS cluster, but running the job still failed because the user ‘chris’ was not allowed to write to the output folder. There are two possible solutions: change the user to ‘root’, or change the ownership of the folders, since every user writes to his own directory anyway.
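For the second option, assuming the output lives under a per-user directory such as /user/chris (the path is just an example, adjust it to your own layout), the ownership change is a one-liner run as the HDFS superuser:

```shell
# Hypothetical path; recursively hand the per-user tree to 'chris'.
hdfs dfs -chown -R chris /user/chris
```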

With this done, I was able to run a local job using data from the HDFS cluster. The configuration I had to add in my code is as follows:

conf.set("fs.defaultFS", "hdfs://localhost:9000");
conf.set("yarn.resourcemanager.hostname", "localhost");
conf.setBoolean("dfs.client.use.datanode.hostname", true);

But how do you run your job on the cluster itself? From a quick look it appears you have to create a YARN client application and do a lot more setup than just launching a quick job. There is not much documentation about this, which makes it much harder than it should be.


Due to the amount of work and the complexity of setting it all up, I have decided to leave my Hadoop cluster like this for now. I was able to do a very basic setup, and doing more would require considerably more effort and investigation.

The open points at the moment are:

  • Running the NodeManager and DataNode in the same container.
  • Launching Jobs on the cluster.
  • How does a NodeManager know which data is stored locally?

I do not intend to investigate these topics anytime soon, as I will focus my attention on other topics first. So this concludes my first experience with Hadoop.


Building A Hadoop Cluster With Docker – Part 3

This part continues where we left off in part 2; we will examine the remaining ways to upload files to a Hadoop cluster:
  1. Using the command line.
  2. From Java using the DFSClient.
  3. From Java using the FileSystem.

If you did not read the previous part, I highly recommend doing so before continuing. Before diving into the new work, let’s summarise what we learned there:

  • We had to fix the hostname of the container.
  • The mapping of ports caused a problem.
  • The filesystem permissions are very restrictive to non-root users.

The first two points mean that the cluster cannot be used as it was designed in part 1 and is restricted to a single DataNode.


Building A Hadoop Cluster With Docker – Part 2

After setting up an HDFS cluster in part 1, it is now time to put some data into it. There are multiple ways to upload files to a Hadoop cluster:

  1. Using the Web UI of the NameNode.
  2. Using the REST API.
  3. Using the command line.
  4. From Java using the DFSClient.
  5. From Java using the FileSystem.

Treating them all in a single post would make it too long (in my opinion). Therefore I have chosen to split it into two separate posts: in this one I will only discuss the Web UI and the REST API (which should be nearly identical), and the other approaches will be handled in the next.

Even though you may not be interested in using the Web UI or the REST API to upload files to your Hadoop cluster, I recommend reading through this post, as the problems encountered here will come back in the next one. Even more important are the fixes and changes applied to get things working, as they will have an impact on the other ways of uploading files as well.


Building A Hadoop Cluster With Docker – Part 1

One of the things I started in 2017, and wanted to learn more about in 2018, was Hadoop. While it is possible and very easy to run Hadoop on a single machine, that is not what it is designed for; using it that way can even make your program run slower because of the overhead involved. Since Hadoop is designed for big data and distributes that data over a cluster of machines, it makes sense to run it that way too. While it is possible to run many nodes on the same machine, you would need quite a lot of configuration to prevent port collisions and the like. On the other hand, I don’t have a cluster available to experiment with. I know it is possible to get some free-tier machines at cloud providers such as Amazon, but that is not something I want to do for a minor experiment like this. Instead I have chosen to create a virtual cluster using Docker.


Not only does Docker allow me to create a virtual cluster of nodes with identical configuration, it also creates a more realistic setup. Each node has its own file system, completely separate from the rest, and you can selectively inspect or bring down a node to see how the cluster reacts. By choosing Docker I also hit two birds with one stone, as it allows me to:

  • Freshen up my docker knowledge
  • Get a taste of Hadoop

I am in many aspects a minimalist. I always wonder how heavy my application is, both in memory and CPU usage, and I treat my Docker containers the same way: I always want to keep the size of the container as small as possible. With Java this is not very easy.

So I started my journey with Alpine, on which I installed bash and Java, the latter being a hard requirement. Java, however, could not run due to some weird issues. I guess (and have read) that they are caused by Alpine shipping a different C library than most distributions, but even using the version that was suggested did not fix the problem.

It didn’t take long before I gave up on this minimalist endeavour, as it is not the core of this experiment. I may look into it again at a later point, once I have my Docker cluster running, but for now I decided to use the official Java 8 container.

My first experiment consisted of installing Hadoop in a container and letting it run a job. As a first test I used the included example (WordCount), which did not run immediately because the class was not found. Contrary to what the example says, I had to specify the fully qualified name of the class. This is a quick fix but an important one, and something to keep in the back of our minds.

Before doing the actual cluster setup I wanted to work out a small use case of my own. I decided on a very simple one: go through a log file and count how often we received a message from each client. In general it was easy to get this working, but I did encounter my first problem when my output directory remained empty. It was not easy to figure out, because the log file didn’t immediately show the cause, but I was able to discover it was a NullPointerException triggered by a line that did not match the expected format. In the end it was pretty easy to get my use case working.
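Stripped of the MapReduce plumbing, the counting logic boils down to something like the sketch below (the ‘client;message’ log format and all names are made up for illustration). A guard of this kind for malformed lines is essentially what the fix amounted to:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: count messages per client, skipping any line
// that does not match the expected 'client;message' format.
public class ClientCount {
    public static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(";");
            // Without a guard like this, a single malformed line can
            // blow up the whole job (as it did in my case).
            if (parts.length < 2 || parts[0].isEmpty()) continue;
            counts.merge(parts[0], 1, Integer::sum);
        }
        return counts;
    }
}
```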

For the actual cluster I based myself on the work of others, which made it very easy. I had to start using Docker Compose, as I now start a bunch of linked containers instead of single ones. I also glanced at Docker Swarm, but that would only be required if I really had multiple devices. I created separate Docker images for the DataNode and the NameNode to allow for different configuration; they are of course based on the same general Hadoop image. It was pretty easy to get my NameNode up and running and have a bunch of DataNodes connect to it.

But as I was going through the web interfaces I noticed that the log files were not available. Going back to the Hadoop setup page, I saw that it uses a different script to start the node as a daemon. This is not usable with Docker, as the script spawns a child process while the original process terminates, meaning the Docker container exits. Instead I had to keep using the direct ‘hdfs namenode’ approach. But I did notice that the daemon script produced log files where my approach did not. After examining what this script does and how the configuration is used, I discovered that all I had to do was set an environment variable to write the logs to file instead of the console.
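If I recall correctly, the variable in question was Hadoop’s root logger setting (treat the exact value as an assumption; ‘RFA’ refers to the rolling file appender configured in log4j.properties):

```shell
# Assumption: route logging to the file appender instead of the
# console, then start the NameNode in the foreground as before.
export HADOOP_ROOT_LOGGER=INFO,RFA
hdfs namenode
```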

So now I have a Hadoop cluster running, consisting of a NameNode and some DataNodes. I still haven’t done anything with them; that will be the topic of my next blog post. If you are interested in my Docker cluster, you can find all of the files on my GitHub account.

Float vs Double

Recently I took a CUDA course, and one of the things they mentioned to keep an eye out for was the use of double precision: double-precision operations are slower, and the added precision isn’t worth it, or so they say. This made me wonder whether the same holds for regular programming in languages such as C++ and Java, especially because both use double precision as the default floating-point type. So how bad is it to use doubles instead of regular floats, and what about that precision? I have investigated this in Java.
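As a small taste of the precision side of the question (this toy example is mine, not the benchmark from the investigation), repeatedly summing 0.1 shows how much faster a float accumulator drifts than a double one:

```java
// Toy illustration: sum 0.1 ten million times in float and in double
// and compare how far each drifts from the exact answer of 1,000,000.
public class FloatVsDouble {
    public static float sumFloat(int n) {
        float s = 0f;
        for (int i = 0; i < n; i++) s += 0.1f;
        return s;
    }

    public static double sumDouble(int n) {
        double s = 0;
        for (int i = 0; i < n; i++) s += 0.1;
        return s;
    }

    public static void main(String[] args) {
        // The float result is visibly wrong; the double result is
        // close to 1,000,000.
        System.out.println(sumFloat(10_000_000));
        System.out.println(sumDouble(10_000_000));
    }
}
```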


Wrapping Up 2017

As has become a habit, the first post of the new year is one where I look back at what 2017 has meant for me and this blog. I also want to look forward to what I expect and hope to do in 2018.


Personally not much has changed: I am still working at the same company, where I have contributed to a full rewrite of some software. Most of the software was taken from an existing system that we had little control over, which means our contribution was pretty limited and most of it was done by a single person. As a result, the new software is in my opinion poorly written, and while I have expressed my concerns about this many times, nobody has taken notice and no action has been taken. Most of the existing software is not any better, and a rewrite, or at least very invasive refactoring, is required, but due to a lack of resources, and because management neglects its importance, no time is reserved for it.

The company has tried to evolve, but failed in doing so and still struggles to get everything running smoothly. The attempt to improve actually created more confusion and uncertainty. Employees have been shifted around the organisation, much to the dissatisfaction of those who all of a sudden have to do something completely different. Some have already left, others are about to, and more are thinking about it. It is clear that the company does not think about its employees, but even worse, it does not think about its own long-term future.

In the past year I have been learning a couple of new technologies, such as CUDA and Angular. I did start experimenting with Hadoop, but this is still in an early phase and I have not yet learned much beyond the basic concepts. I do look forward to getting more hands-on with it.

My blog has been doing okay, even though the visitor count is a bit below last year’s. The year started off weak, but the end has been going very strong. I would like to say this was because of my increased posting frequency at the end, or because of my posts about Angular, but that isn’t really true, as my Angular posts didn’t attract any more viewers than my other posts.

In the next year I will of course keep the blog going. The reason for it has not changed, and I still enjoy writing posts: they push me to investigate and try new things, and to reflect on topics I would normally not have explored. I do hope to show a bit more activity, and I would like to see my minimum monthly visitors rise to 10.

Something I also want in the next year is a new challenge. It is still unclear what exactly that should be, but I want to get rid of the boredom I currently feel at work. I definitely want to keep learning new technologies; high on my list are Hadoop, data mining, artificial intelligence and blockchain. While I won’t be able to do something useful with all of them, just gaining some starting experience would be great.