Performance optimization
Performance optimization is no sorcery. It can be applied to programs, operating systems, computers and their components, and networks. In all these cases a systematic, deductive approach is used; in other words, detective work. One goal of a performance optimization is to find the bottleneck that limits system performance. The more challenging task, however, is to fix that bottleneck.
Performance analysis is comprehensive. First, the overall system has to be understood. Then each part or component of the system is analyzed.
The components of software are
- Program parts
- Algorithms
- Data structures
For computer systems, these are the system components, and for networks they are
- Network components (switches, routers)
- Servers and their components: network cards, RAM
- Software components such as the OS, firewalls, application servers, databases, remote file systems, etc.
The term analysis here means understanding the architecture, the configuration, the operating modes and the function of each component. Often a simple analysis of log files gives valuable hints about bottlenecks. If a configuration error is found, or a component fails to meet its requirements, the performance problem can be fixed quickly and easily.
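As a minimal sketch of such a log file analysis, the following Python snippet counts warnings and errors per component. The log line format (`date time LEVEL component: message`), the sample lines and the component names are assumptions made for illustration; real logs need their own pattern.

```python
import re
from collections import Counter

# Assumed line format: "<date> <time> <LEVEL> <component>: <message>"
LINE_RE = re.compile(
    r"^\S+ \S+ (?P<level>[A-Z]+) (?P<component>[\w.-]+): (?P<msg>.*)$"
)

def count_problems(lines, levels=("WARN", "ERROR")):
    """Count log entries per (component, level) for the given severity levels."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match["level"] in levels:
            counts[(match["component"], match["level"])] += 1
    return counts

# Hypothetical sample data, not taken from a real system
SAMPLE_LOG = [
    "2024-01-01 10:00:00 INFO web: request served",
    "2024-01-01 10:00:01 WARN db: slow query (2.3 s)",
    "2024-01-01 10:00:02 ERROR db: connection pool exhausted",
    "2024-01-01 10:00:03 WARN db: slow query (3.1 s)",
    "malformed line",
]

if __name__ == "__main__":
    # Components with the most warnings and errors are listed first
    for (component, level), n in count_problems(SAMPLE_LOG).most_common():
        print(f"{component:10s} {level:5s} {n}")
```

A component that dominates such a count is not necessarily the bottleneck, but it is a reasonable place to start the focused measurements.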
Performance measurement is individual. Following the analysis, tightly focused measurements are performed on the components which, in our experience, are most likely to cause the performance problem. For some components (especially software) there are no off-the-shelf measurement tools, and for the many possible combinations of interacting components there cannot be such a tool. One of the main tasks in performance measurement is therefore to create appropriate measurement tools that allow a quantitative analysis of the performance. Most performance problems depend strongly on the nature and quantity of the input data, so another core task is the preparation of well-defined input data. Measuring the performance of a system under controlled, increasing utilization yields a utilization-performance response diagram, which may point to the component that causes the performance problem.
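The idea of measuring under controlled, increasing load with well-defined input data can be sketched as follows. This is only an illustration: `system_under_test` is a hypothetical stand-in for the real component, and the input sizes and the fixed random seed are assumptions.

```python
import random
import time

def make_input(n, seed=42):
    """Create reproducible input data of size n (fixed seed = well-defined input)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

def system_under_test(data):
    """Hypothetical stand-in for the component being measured."""
    return sorted(data)

def measure_response(func, sizes, repeats=3):
    """Return (size, best wall-clock seconds) pairs for increasing load levels."""
    results = []
    for n in sizes:
        data = make_input(n)
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            func(data)
            timings.append(time.perf_counter() - start)
        # The minimum over several runs reduces noise from other processes
        results.append((n, min(timings)))
    return results

if __name__ == "__main__":
    for n, seconds in measure_response(system_under_test, [1_000, 10_000, 100_000]):
        print(f"{n:>8d} items  {seconds * 1e3:8.3f} ms")
```

Plotting the resulting pairs gives a simple response diagram; a sharp bend in the curve hints at the load level where a component starts to saturate.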
Performance measurements of software. Runtime or memory profilers can often be used to measure software performance. Applied to the complete program, however, these valuable tools usually return ambiguous results, or none at all because the program crashes during profiling. To support the profilers, it is crucial to partition the code into segments, which are then profiled independently with defined input data for each segment. If the code is already written in a modular way, with well-defined interfaces between the modules, this is a relatively easy task. Monolithic code, however, first has to be split into modules before a performance measurement can reasonably be applied. Profilers are often blind to the bottleneck. If, for example, an interpreter uses lots of memory through an external C library, this memory consumption is outside the scope of the interpreter and therefore invisible to the profiler. To deal with that, it is in most cases necessary to add measurement code to the program that acts like a probe and logs the memory usage at certain measurement points. Depending on code quality, this probing can be easy or nearly impossible. Even if the profilers or the probes identify a program module as the bottleneck, it is often not obvious what causes it. Is it the algorithm, the data structure, a programming error or perhaps the hardware?
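A memory probe of this kind might look like the sketch below. It logs two values at each measurement point: the memory traced inside the Python interpreter (`tracemalloc`) and the peak resident set size of the whole process (`resource.getrusage`), which also includes memory allocated by external C libraries. The probe name `memory_probe` and the workload are hypothetical; the `resource` module is available on Unix only, and `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS.

```python
import logging
import resource  # Unix only; not available on Windows
import tracemalloc

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("memprobe")

def memory_probe(label):
    """Log and return interpreter-level and process-level memory at a measurement point."""
    # Memory currently allocated by Python code (0 if tracemalloc is not started)
    traced_bytes, _peak = tracemalloc.get_traced_memory()
    # Peak resident set size of the process: kilobytes on Linux, bytes on macOS
    peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    log.info("%s: traced=%d bytes, peak_rss=%d (platform units)",
             label, traced_bytes, peak_rss)
    return traced_bytes, peak_rss

if __name__ == "__main__":
    tracemalloc.start()
    memory_probe("start")
    payload = [bytes(1024) for _ in range(10_000)]  # hypothetical workload, about 10 MB
    memory_probe("after allocation")
    del payload
    memory_probe("after release")
```

If the process-level value grows much faster than the traced value between two probes, the memory is probably being consumed outside the interpreter.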
Performance measurements of hardware. Most operating systems come with many probes that can be used to measure the performance of the underlying hardware. Often, however, it is necessary to write software to read out these probes or to control the stress induced on the hardware precisely. The same holds for network measurements. Network components can export their probes via SNMP to produce performance logs. When measuring network performance, it is crucial to feed in a controlled amount of network traffic in order to test the components in defined situations. This is impossible while the network is in production. Also, the time synchronization of the logs of different network components cannot be done with off-the-shelf software, so self-made programs are the only solution. Identifying a badly performing server or network component does not reveal the cause of the weak performance. In rare cases the component is broken or does not meet the requirements, and the problem can be solved by replacing it. More often the interaction of several components is the problem, and often one of those components is software or the OS. It can then become difficult to decide whether the problem is hardware or software related.
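A self-made program for synchronizing the logs of several components could start from a sketch like this one. It assumes that the clock offset of each component is already known (for example, measured against a reference clock) and that each log is sorted by time; the component names and offsets are hypothetical.

```python
import heapq
from datetime import datetime, timedelta

def normalize(entries, offset_seconds):
    """Shift (timestamp, message) entries by a known clock offset in seconds."""
    shift = timedelta(seconds=offset_seconds)
    return [(ts + shift, msg) for ts, msg in entries]

def merge_logs(logs, offsets):
    """Merge per-component logs into one timeline ordered by corrected time.

    logs:    {component: [(datetime, message), ...]}, each list sorted by time
    offsets: {component: seconds to add to that component's clock}
    """
    streams = [
        [(ts, name, msg) for ts, msg in normalize(entries, offsets.get(name, 0.0))]
        for name, entries in logs.items()
    ]
    # heapq.merge combines already sorted streams without re-sorting everything
    return list(heapq.merge(*streams))

if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, 0)
    logs = {
        "switch-a": [(t0, "link up"), (t0 + timedelta(seconds=4), "queue full")],
        "server-b": [(t0 + timedelta(seconds=1), "request in")],
    }
    offsets = {"server-b": 2.5}  # assumed: the clock of server-b runs 2.5 s behind
    for ts, name, msg in merge_logs(logs, offsets):
        print(ts.isoformat(), name, msg)
```

Only with such a common timeline can events on different components be related to each other, for example a full queue on a switch and a slow response on a server.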
Performance optimization has an impact. Performance problems are usually only the tip of the iceberg. If the hardware fails, the cause is often not the hardware itself but software that simply overloads it. Such overload is often caused by bad or careless programming that uses too many resources and/or forgets to free resources that are no longer used. Few programmers think about dynamic memory management and I/O pooling when they write the first version of their code. After a while the performance degrades and finally becomes unbearable. These cross-cutting features then have to be retrofitted into the production code. Depending on code quality, this can be easy, or so expensive that it is easier to rewrite the whole code. The same holds for hardware systems. If the hardware problems cannot be cured by better software, you have to buy more, and more expensive, hardware. Some problems can be parallelized and distributed over a cluster of computers. But every scaling of performance by means of a cluster has a price on top of the cost of the computers: one cost is the increased complexity, the other is the management of an additional, very fast network for data exchange between the cluster nodes.
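To illustrate what pooling and reliable freeing of resources can mean in code, here is a minimal sketch of a fixed-size resource pool. The classes `Pool` and `FakeConnection` are hypothetical names introduced for this example; a real pool would also need error handling for broken resources.

```python
import queue
from contextlib import contextmanager

class FakeConnection:
    """Hypothetical stand-in for an expensive I/O resource (socket, DB connection)."""
    created = 0

    def __init__(self):
        FakeConnection.created += 1
        self.id = FakeConnection.created

class Pool:
    """Fixed-size pool: resources are created once and then reused."""

    def __init__(self, factory, size):
        self._items = queue.Queue(maxsize=size)
        for _ in range(size):
            self._items.put(factory())

    @contextmanager
    def acquire(self, timeout=None):
        item = self._items.get(timeout=timeout)  # blocks while the pool is exhausted
        try:
            yield item
        finally:
            self._items.put(item)  # always hand the resource back, even on errors

if __name__ == "__main__":
    pool = Pool(FakeConnection, size=2)
    for _ in range(10):
        with pool.acquire() as conn:
            pass  # use conn here
    print("connections created:", FakeConnection.created)
```

The `with` block guarantees that a resource is returned to the pool, which is exactly the step that careless code tends to forget.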
Performance optimization works. Starting with realistic expectations and the will to solve the underlying problems and not only the symptoms, a performance optimization will succeed, usually with a dramatic performance gain, mostly by a whole factor. We have had cases where the entire software had to be rewritten. Afterward, however, the customer was very happy with the new solution and had learned that clinging to the former code would have doomed the whole project.