CPU usage
Q1. What are the best practices for writing code that does not hog the CPU but still performs well? The question is very general. What I'm looking for is a list of the different methods used in different environments, plus debugging tips beyond the process manager / task manager.
EDIT: I'm not talking about IO-bound processes. I'm talking about CPU-bound processes. But I don't want my process to keep hogging the CPU. If I have a 4-core machine and I run four simple loops in a process, CPU consumption goes up to 400% for as long as the application / process is running.
I'm looking for experience with a problem that everyone will have run into at some point. For example, I once debugged an application that was hogging the CPU on Windows because it was continuously looping to search for a non-existent file.
How can I write my program so that two different CPU-bound applications both run smoothly (stay responsive)?
UPDATE: Suggestions:
- Write nice clean code, then profile the application and optimize it. (Thanks for the hint)
- It is easier to rewrite / reverse engineer / refactor the code than to profile and fix.
- Use a profiler to debug your application
- Don't use spin locks for threads with long waits
- Algorithm selection
These suggestions will go a long way toward helping beginners understand the concepts.
Write nice clean code first. Do things the simplest way. Then repeat the following until you are satisfied with the speed of the program:
- Profile its execution.
- Find the parts where it spends the most time.
- Speed up those parts.
Do not fall into the trap of contorting your code up front in the name of optimization.
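As a rough illustration of the profile-then-fix loop, here is a minimal sketch using Python's standard `cProfile` and `pstats` modules. The functions `slow_sum`, `fast_part` and `program` are made-up stand-ins for a real application, not anything from the question.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately wasteful: recomputes a small sum on every iteration.
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total


def fast_part(n):
    # Cheap work that is not worth optimizing.
    return n * 2


def program():
    # Hypothetical "application": one cheap call and one expensive call.
    fast_part(10)
    return slow_sum(20_000)


def profile_report(func, top=5):
    """Run func under the profiler and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()
```

Calling `print(profile_report(program))` lists the functions with the largest cumulative time first, which is where the optimization effort should go.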
Remember Amdahl's law. You won't get noticeable improvements by speeding up something that already consumes only 1% of your program's time. You get the most bang for your optimization buck by speeding up the part where your program spends most of its time.
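To put numbers on that, here is a small sketch of the Amdahl's law formula. The function name and the example fractions are chosen purely for illustration.

```python
def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when `fraction` of the run time gets `local_speedup` times faster.

    fraction is between 0 and 1; local_speedup is greater than 0.
    """
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


if __name__ == "__main__":
    # Making a 1% hotspot 10x faster barely registers.
    print(f"1% of run time, 10x faster:  {amdahl_speedup(0.01, 10):.3f}x overall")
    # Making an 80% hotspot only 2x faster is far more worthwhile.
    print(f"80% of run time, 2x faster:  {amdahl_speedup(0.80, 2):.3f}x overall")
```

With these inputs, the first case gives roughly a 1.01x overall speedup and the second roughly 1.67x.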
- Use the profiler religiously. Do not rely on common sense to find bottlenecks.
- Learn Big-O notation, and remember the Big-O of common algorithms.
- Avoid busy-wait loops at all costs.
- For embedded work, learn how to make code cache-friendly, which can sometimes give up to a 10x speedup in tight loops.
- When doing high-level multi-tier development, learn how to cache data efficiently (for example, to minimize the number of database statements).
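The last point can be sketched with Python's `functools.lru_cache`. Here `query_database` is a hypothetical stand-in for a real (expensive) database statement; a real application would also need a cache invalidation strategy, which this sketch leaves out.

```python
from functools import lru_cache


def query_database(user_id):
    """Hypothetical stand-in for an expensive database round trip."""
    return {"id": user_id, "name": f"user-{user_id}"}


@lru_cache(maxsize=1024)
def get_user_name(user_id):
    # Only the first call for a given user_id reaches query_database;
    # repeated calls are answered from the in-memory cache.
    return query_database(user_id)["name"]
```

`get_user_name.cache_info()` reports hits and misses, which is a quick way to check that the cache is actually saving statements.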
Do as little work as possible.
Since you edited the original question, I'll add a few more thoughts to address the specific situation you described.
Assuming you don't know where your process is spinning (since you were asking for debugging tips), you can start by pausing it in the debugger. This stops the application in the middle of whatever it is doing, and from there you can examine the current location of every thread and see whether any of them are in a tight loop.
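Where no interactive debugger is at hand, a similar "where is every thread right now" snapshot can be taken from inside the process. The sketch below assumes a Python program and uses `sys._current_frames()`, a CPython helper meant for debugging; `spin_until` is a made-up runaway loop standing in for the real bug.

```python
import sys
import threading
import time
import traceback


def snapshot_thread_stacks():
    """Return {thread name: formatted stack} for every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    stacks = {}
    for ident, frame in sys._current_frames().items():
        name = names.get(ident, f"thread-{ident}")
        stacks[name] = "".join(traceback.format_stack(frame))
    return stacks


def spin_until(stop):
    # Hypothetical tight loop that burns CPU until told to stop.
    while not stop.is_set():
        pass


def demo():
    """Start a spinning thread, snapshot all stacks, then shut it down."""
    stop = threading.Event()
    worker = threading.Thread(target=spin_until, args=(stop,), name="spinner")
    worker.start()
    time.sleep(0.05)  # give the worker time to enter its loop
    stacks = snapshot_thread_stacks()
    stop.set()
    worker.join()
    return stacks
```

Printing the entries of `demo()` shows one stack per thread; a thread stuck in a tight loop shows up inside the same function on every snapshot.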
Second, any decent profiler will easily help you catch situations like this. Attach a profiler and run the app to the point where it hangs, then watch for calls that take up a significantly larger percentage of your total execution time. From there, you can work back to find the blocking loop.
Once you have located the problem, ideally rethink the algorithm to avoid the situation completely. If that is not possible, insert sleep or yield calls into the thread. This lets other threads get onto the CPU and improves the responsiveness of the application and the OS as a whole, at the cost of a longer execution time. The trick with multi-core programming is ensuring that all your threads strike a compromise between performance and consideration for other pending tasks.
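A minimal sketch of that trade-off, assuming the work can be split into chunks: the loop gives up the CPU briefly after each chunk. The function name, chunk size and pause length are illustrative values, not recommendations.

```python
import time


def sum_of_squares(n, chunk=100_000, pause=0.001):
    """Sum i*i for i in range(n), pausing briefly after every chunk."""
    total = 0
    for start in range(0, n, chunk):
        end = min(start + chunk, n)
        total += sum(i * i for i in range(start, end))
        # Hand the CPU back to the scheduler so other threads and
        # processes can run; this lengthens the total run time.
        time.sleep(pause)
    return total
```

A larger `pause` or a smaller `chunk` makes the process friendlier to its neighbours and slower overall; `pause=0` still yields the thread's time slice on most platforms without adding a fixed delay.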
Without knowing the specific language or operating system you are targeting, I can't recommend a particular debugger / profiler combination, but I would assume there are good solutions for most mature languages.
If it's pure CPU usage you're after, then you need Big-O notation. You need to work out how to get your algorithms to run with the least possible computation.
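As a small example of what algorithm choice means in practice, here are two ways to check a list for duplicates. Both function names are invented for this sketch; the first does O(n^2) comparisons, the second does O(n) expected work by remembering what it has seen (it assumes the items are hashable).

```python
def has_duplicates_quadratic(items):
    # Compares every pair: about n*(n-1)/2 comparisons in the worst case.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    # One pass with a set: each membership test is O(1) on average.
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

Both return the same answers, but on large inputs without duplicates the quadratic version does vastly more work for the same result.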
However, when it comes to overall performance, I find that CPU usage is usually one of the smaller bottlenecks.
Some more important things to look at for performance:
Data binding: do you get all the data up front, or just fetch it as needed? Choosing between these approaches can be key to the performance of your application.
Can you shrink the data you are working on? If you can get everything to fit comfortably into memory, you can gain performance here. On the other hand, if you load too much into memory, it can have the opposite effect.
To sum up, I think there is no general performance solution. Write your code (with some intelligence) and then see where it struggles.
I'm talking about CPU-bound processes. But I don't want my process to keep hogging the CPU. If I have a 4-core machine and run four simple loops inside a process, CPU consumption goes up to 400% for as long as the application / process is running.
You probably want to look into throttling mechanisms to reduce the idle load on your CPU:
Under normal circumstances, your code will consume CPU cycles even when it doesn't need to do anything (i.e., "busy waiting").
For example, an empty infinite loop will simply run as fast as possible even though it has nothing to do.
However, in some cases you do not want to busy-wait, and on some platforms you can avoid it altogether.
One established way to do this is to use sleep calls while idle so that the system scheduler can reschedule all running threads. Likewise, you can use timers to set the actual update rate of your function, and simply not call your code unless it needs to run (this is a mechanism sometimes used by games or simulations).
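The timer idea might look roughly like this in Python: a loop that runs a step function at a fixed rate and sleeps for whatever is left of each period. `run_at_fixed_rate` and its parameters are names made up for this sketch.

```python
import time


def run_at_fixed_rate(step, rate_hz, iterations):
    """Call step() `iterations` times, at most `rate_hz` times per second."""
    period = 1.0 / rate_hz
    next_tick = time.monotonic()
    for _ in range(iterations):
        step()
        next_tick += period
        delay = next_tick - time.monotonic()
        if delay > 0:
            # Sleep instead of spinning; the scheduler can run other work.
            time.sleep(delay)
```

For example, `run_at_fixed_rate(update_world, 60, 600)` would call a (hypothetical) `update_world` function about 60 times per second for roughly ten seconds, leaving the CPU idle between updates whenever the update itself is cheap.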
In general, you will want to avoid polling and instead use smart data structures (such as a job queue) that let your code automatically adjust its behavior at runtime, without having to constantly check the data structure.
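A job queue of this kind can be sketched with Python's standard `queue.Queue`, whose `get()` blocks without burning CPU until an item arrives. The squaring "job" and the `None` shutdown sentinel are illustrative choices.

```python
import queue
import threading


def worker(jobs, results):
    while True:
        job = jobs.get()   # blocks (no busy waiting) until a job arrives
        if job is None:    # sentinel value: no more work, shut down
            break
        results.append(job * job)


def run_jobs(values):
    """Feed values to a single worker thread and collect the results."""
    jobs = queue.Queue()
    results = []
    thread = threading.Thread(target=worker, args=(jobs, results))
    thread.start()
    for value in values:
        jobs.put(value)
    jobs.put(None)
    thread.join()
    return results
```

While the queue is empty the worker thread is parked by the OS, so an idle consumer costs essentially no CPU, unlike a loop that keeps checking whether work has arrived.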
It is not entirely clear to me whether you are looking for ways to make the most efficient use of the CPU, or ways to avoid tying up the machine when you have a lot of CPU-intensive work to do.
These goals are incompatible.
For the former, you ideally want an OS that simply lets you take over the processor(s) fully or partially for as long as you like, so that you don't have to waste CPU cycles on the OS itself, let alone any other processes that might be running.
For the latter, well, I've been writing some code lately that uses poorly designed CPU-bound algorithms, and the new Intel i7 processor has been my savior. Given four cores, each of which can run two threads, I just try to limit OS thread usage to five or six per application, and I still have CPU available to switch to another window and run the kill command. At least until I drive the system into swap with space leaks.
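One way to express "use most of the hardware threads but not all of them" is to size a worker pool below the core count. This is only a sketch: `worker_budget`, `cpu_heavy`, `run_all` and the default reserve of two are assumptions made for illustration.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def worker_budget(reserve=2):
    """Number of workers to use, leaving `reserve` hardware threads free."""
    cores = os.cpu_count() or 1
    return max(1, cores - reserve)


def cpu_heavy(n):
    # Stand-in for a real CPU-bound task.
    return sum(i * i for i in range(n))


def run_all(inputs, reserve=2):
    # The pool never starts more workers than the budget allows, so a few
    # hardware threads stay free for the rest of the system.
    with ProcessPoolExecutor(max_workers=worker_budget(reserve)) as pool:
        return list(pool.map(cpu_heavy, inputs))
```

On platforms that start worker processes by spawning a fresh interpreter, `run_all` should be called from under an `if __name__ == "__main__":` guard in a script.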
Good suggestions here. The simpler you write the code, the more time you will save.
I see performance tuning as an extension of debugging. People say measure, measure, measure, but I don't. I just let the program tell me what the problem is by pausing it at random, several times if necessary. The answer is usually a surprise, and it is never wrong.
The usual story, depending on how big the program is, is that I find and fix a series of performance problems, each giving a speedup of between 10% and 50% (more if there is a really bad problem). This gives an overall speedup of perhaps 10 times.
Eventually the samples tell me exactly what the program is doing, but I can't think of how to fix it without a basic redesign, and I realize that if I had done the design differently in the first place, it would have been a lot faster to begin with.
Suppose I can do the redesign; then I can do a few more rounds of finding and fixing performance problems. After that, when I hit the point of diminishing returns, I know it's about as fast as physically possible, and I can single-step it at the assembly level and watch every instruction "pulling its weight" toward getting the answer.
There is real satisfaction in getting to that point.
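The sampling idea in this answer is normally done by hand in a debugger. As a rough automated approximation, and not the author's own tooling, the sketch below takes a handful of stack samples from a running thread and counts which function it was in each time. It relies on CPython's `sys._current_frames()`; `hot_loop` is a made-up workload.

```python
import collections
import sys
import threading
import time


def sample_thread(thread, samples=20, interval=0.005):
    """Count which function `thread` is executing at each sample."""
    counts = collections.Counter()
    for _ in range(samples):
        frame = sys._current_frames().get(thread.ident)
        if frame is None:  # the thread has already finished
            break
        counts[frame.f_code.co_name] += 1
        time.sleep(interval)
    return counts


def hot_loop(stop):
    # Hypothetical CPU-bound work that runs until asked to stop.
    total = 0
    while not stop.is_set():
        for i in range(10_000):
            total += i * i
    return total
```

A function that appears in most of the samples is where the time is going, even with only a few samples taken.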