zorr - 2nd October 2018, 23:59 | #52

Optimizer arguments

It's time to take a closer look at how to adjust the optimization process. Let's run the optimizer using the same script and settings used in the last tutorial:

Code:
optimizer <path_to_your_script> -iters 100
The program displays:

Code:
Arguments
  iters = 100

Running optimization for script d:/optimizer/test/flower/denoise.avs
Using these settings:
ARGUMENT      DESCRIPTION             VALUE
-runs         runs                    5
-alg          algorithm               spea2
-pop          population              8
-iters        iterations              100
-mutamount    mutation amount         0.3 0.01
-mutcount     mutation count          60% 1
-crossprob    crossover probability   0.1
-crossdist    crossover distribution  20
-sensitivity  sensitivity estimation  true
-dynphases    dynamic phases          N/A
-dyniters     iterations per phase    N/A
You can stop the optimization after this text is displayed.

The "Arguments" section lists the arguments and their values as they were understood by the optimizer.

The next section is a handy cheat sheet of the available arguments and their current values. The first column, ARGUMENT, gives the name you use to specify the setting. The DESCRIPTION column contains a short description of what the argument does. And finally, VALUE is the current value of the argument. Most of these are defaults; we only specified the -iters argument. If you run the optimizer in another mode (like "evaluate"), the listed arguments are specific to that mode.

I spent quite a while figuring out good default values, so they should work reasonably well, but I have only tested them on a few different optimization tasks, so they might not be good for every case. Testing these settings takes a lot of effort: to determine whether one value is better than another, one should run the optimization task many times with each value in order to gain enough statistical significance. I mostly used 20 runs per parameter value.

Let's take a look at the arguments one by one.

-runs specifies the number of optimization runs. A "run" is one complete optimization cycle, whose length is itself specified with the -iters argument. I talked about the need for multiple runs earlier but I will repeat the points here: since the optimization process depends on random numbers, the outcome is not always the same and there can be large differences in the final result. If you only run the optimization once you cannot really be sure whether the results are good or bad. Another useful aspect of multiple runs is that the variance of the best result can tell us how easy or hard the optimization task is: large variance means a difficult task. And if the task is difficult we can try increasing the iterations. I don't have a good answer for how many runs are enough. If you can only run N iterations in total, should you do, for example, three runs with N/3 iterations or eight runs with N/8 iterations? More iterations is better, but more runs is also better, up to a point.
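As a rough illustration of why the spread across runs matters (this is not the optimizer's own code, just a sketch with made-up scores):

```python
import statistics

def summarize_runs(best_scores):
    """Summarize the best result of each optimization run.

    A large standard deviation across runs suggests a difficult task
    (or too few iterations per run)."""
    return {
        "best": max(best_scores),
        "mean": statistics.mean(best_scores),
        "stdev": statistics.stdev(best_scores),
    }

# Hypothetical best scores from 5 runs of the same task:
print(summarize_runs([4.91, 4.88, 4.72, 4.90, 4.31]))
```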

-alg specifies the metaheuristic algorithm used in the optimization. Currently there are four options: "nsga-ii", "spea2", "mutation" and "exhaustive". NSGA-II and SPEA2 are very good, well-known algorithms. I got slightly better results with SPEA2, so it's the default. If you're interested in how these algorithms work, you should check out the free ebook Essentials of Metaheuristics. The third option, "mutation", is a very simple algorithm I wrote which only uses mutation. It can find a reasonably good result faster than the other algorithms, but it will lose with large iteration counts. Finally we have the "exhaustive" option, which simply tries all the possible (and valid) parameter combinations. It can be useful if you only have a few parameters and can limit the number of values per parameter so that the number of combinations doesn't get too high. I have tried some other metaheuristic algorithms like CMA-ES, BFGS (Broyden–Fletcher–Goldfarb–Shanno) and SMPSO (a particle swarm algorithm) but I didn't get as good results with them. SMPSO is still waiting for a more thorough examination; it is promising. I should also note that the algorithms I'm using are not the basic variations: I have changed the way the mutations work and got better results that way.
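A mutation-only strategy in the spirit of the "mutation" option could look roughly like this sketch (the objective function and parameter range here are made up for illustration; the real implementation differs):

```python
import random

def mutate(params, ranges, amount=0.3):
    """Randomly perturb one parameter within its allowed range."""
    name = random.choice(list(params))
    lo, hi = ranges[name]
    span = (hi - lo + 1) * amount
    new = round(params[name] + random.uniform(-span, span))
    return {**params, name: min(hi, max(lo, new))}

def mutation_search(evaluate, start, ranges, iters=100):
    """Keep a single best individual; accept only improving mutations."""
    best, best_score = dict(start), evaluate(start)
    for _ in range(iters):
        cand = mutate(best, ranges)
        score = evaluate(cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

# Toy objective: the score is best when sigma is near 400.
best, score = mutation_search(lambda p: -abs(p["sigma"] - 400),
                              {"sigma": 100}, {"sigma": (1, 800)}, iters=500)
```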

-pop specifies the population size, a term often used with genetic algorithms. It's basically how many individual results are kept in memory during the optimization. Genetic algorithms (like NSGA-II and SPEA2) work by doing crossovers between two individuals and then mutating (randomizing) the results slightly. The crossover operation takes some values from one individual and some from the other. The new individuals are rated, and finally the best ones are selected as the new "generation". The default population size of 8 seems very small, and maybe you're wondering why it should be small at all; after all, it's not a problem to keep thousands or even millions of results in memory. Yes, in theory you should get better results with a larger population size, but it has a drawback: it makes the progress slower. If the population size is much larger than the size of the Pareto front, many less-than-optimal results are kept around and used in the crossovers. Combining two bad results might create a very good individual, but that is more likely to happen when combining two good results. But if you are going to run with a large iteration count, then perhaps increasing the population size will also help. A larger population may also be needed for a difficult optimization task. If you want a reasonably good result fast, use the "mutation" algorithm with a population size of 1.
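The crossover idea can be sketched like this (a uniform crossover for illustration only; as noted below, the genetic algorithms actually use simulated binary crossover):

```python
import random

def uniform_crossover(parent_a, parent_b):
    """Build a child by taking each parameter value from one of the parents."""
    return {name: random.choice((parent_a[name], parent_b[name]))
            for name in parent_a}

# Two hypothetical individuals from a denoising optimization:
a = {"sigma": 349, "blockSize": 50, "overlap": 15}
b = {"sigma": 477, "blockSize": 64, "overlap": 6}
child = uniform_crossover(a, b)
# Every value in the child comes from either a or b; a mutation step
# would then perturb the child slightly before it is rated.
```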

-iters specifies the number of iterations. One iteration means one execution of the script we're trying to optimize. You can give the iteration count as a number (for example 1000), but there are also indirect ways. You can give a time limit in days, hours and minutes: for example, 5h30m would run for 5 hours and 30 minutes, and 1d12h for one day and 12 hours. You can use spaces if you put quotes around the value, for example "2h 45m". A time limit can be useful if you have a specific deadline for the results, or if you want to see what the optimizer can find during the night while you sleep. Just remember that the time limit applies to a single run, so if you start an optimization with 3 runs and 1h iterations it will take a total of 3 hours. During the optimization the maximum iteration count is still displayed on each result line, but it is only an estimate.

Code:
  20 / 345 : 4.772059 20ms sigma=349 blockTemporal=1 blockSize=50 overlap=15
  21 / 347 : 4.815294 20ms sigma=477 blockTemporal=2 blockSize=64 overlap=15
  22 / 349 : 4.875304 20ms sigma=591 blockTemporal=2 blockSize=30 overlap=6
  23 / 350 : 4.880693 110ms sigma=800 blockTemporal=5 blockSize=61 overlap=7
  24 / 343 : 4.909643 150ms sigma=800 blockTemporal=5 blockSize=61 overlap=21
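The time-limit syntax could be parsed roughly like this (a hypothetical helper for illustration, not the optimizer's actual code):

```python
import re

def parse_time_limit(spec):
    """Convert a limit such as '1d12h', '5h30m' or '2h 45m' to seconds."""
    units = {"d": 86400, "h": 3600, "m": 60}
    total = 0
    for value, unit in re.findall(r"(\d+)\s*([dhm])", spec):
        total += int(value) * units[unit]
    return total

print(parse_time_limit("5h30m"))   # 19800 seconds
print(parse_time_limit("1d12h"))   # 129600 seconds
```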
There are also two keywords that trigger a special dynamic iteration mode: "dyn" and "dynbk". "dyn" stands for "dynamic" and means the iteration count depends on how the optimization is progressing. Two additional arguments define how the dynamic iteration behaves: -dynphases and -dyniters. -dynphases defines how many distinct "phases" the algorithm uses (the default is 10, but it's displayed as "N/A" in the example since it does not apply to the chosen iteration method). The phase goes from 0.0 to 1.0 during the iteration and affects how large and common mutations are. In the beginning (phase 0.0) the mutations are large and applied to many parameters; in the end (phase 1.0) they are small and applied to few parameters (in general; you can also change that). If -dynphases has a value of 10, the phase range is divided into 10 steps and the phases used are thus 0.0, 0.1, 0.2, ..., 0.9 and 1.0. The dynamic iteration stays at the current phase step as long as it's still making progress. "Not making progress" is triggered when there have been no new Pareto front results in the last -dyniters iterations. In that case the algorithm moves to the next phase step and resets the counter. The default value of -dyniters is also 10. "dynbk" is much like "dyn" but it can also move backwards (hence the name, "dynamic backtracking") to the previous phase step. That happens whenever a new Pareto front result is found. I have not done a comprehensive study on which of these modes gives better results. The dynamic iteration counts are useful when you are not limited by a specific deadline and just want to find the best result. When you're running a dynamic iteration, the phase step changes are displayed in the console, for example:

Code:
6 iterations remaining in this generation
No improvement in 10 iterations - moving to phase 1/10
Also, instead of the maximum iteration number, the current phase is displayed (there's no reliable way to estimate the maximum iteration count):

Code:
  43 / 0,10 : 4.470519 20ms sigma=474 blockTemporal=-1 blockSize=22 overlap=0
  44 / 0,10 : 4.90395 60ms sigma=665 blockTemporal=3 blockSize=41 overlap=10
  45 / 0,10 : 4.757739 10ms sigma=490 blockTemporal=3 blockSize=32 overlap=0
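My reading of the phase-stepping rules above can be sketched as follows (an illustrative model, not the optimizer's actual code):

```python
def next_phase(phase_step, stagnant_iters, dyniters=10, dynphases=10,
               improved=False, backtracking=False):
    """Advance (or, with "dynbk", retreat) through phase steps 0..dynphases.

    Returns the new (phase_step, stagnant_iters) pair after one iteration.
    """
    if improved:
        # A new Pareto front result: "dynbk" steps back one phase,
        # plain "dyn" just resets the stagnation counter.
        if backtracking and phase_step > 0:
            phase_step -= 1
        return phase_step, 0
    stagnant_iters += 1
    if stagnant_iters >= dyniters and phase_step < dynphases:
        # No progress in -dyniters iterations: move to the next phase step.
        return phase_step + 1, 0
    return phase_step, stagnant_iters
```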
-mutamount specifies the mutation amount. The amount is proportional to the allowed value range given in the script for each parameter you're optimizing. For example, a parameter with a value range of 0..100 has 101 valid values. This number is multiplied by the mutation amount to get the largest possible change the mutation is allowed to make, so with a mutation amount of 0.2 the mutations would vary from -20.2 to 20.2. You can give two values for the mutation amount: the first is used in the beginning (phase 0.0), the last in the end (phase 1.0), and the amount is linearly interpolated in between. If you only give one value, it is used in all phases.
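The worked example above (101 values, amount 0.2, maximum change 20.2) and the phase interpolation can be written out like this sketch:

```python
def mutation_amount(phase, start=0.3, end=0.01):
    """Linearly interpolate the mutation amount between phase 0.0 and 1.0.

    The defaults match the documented -mutamount default of "0.3 0.01"."""
    return start + (end - start) * phase

def max_mutation(value_count, phase, start=0.3, end=0.01):
    """Largest change a mutation may make, proportional to the value range."""
    return value_count * mutation_amount(phase, start, end)

# A 0..100 parameter has 101 values; amount 0.2 allows changes up to 20.2:
print(max_mutation(101, 0.0, start=0.2, end=0.2))
```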

-mutcount specifies the mutation count. Whenever mutation is applied, the first step is deciding how many parameters will be mutated, and this argument defines just that. As with -mutamount, you can give different values for the beginning and end phases. What's more, the count can be given as a percentage of the number of optimized parameters in the script. So if you have 20 parameters to optimize and specify -mutcount 50%, the algorithm will mutate 10 parameters. You can mix both notations; for example, the default -mutcount is "60% 1", which means mutating 60% of the parameters in the beginning and one in the end.
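Resolving one -mutcount token to a concrete parameter count might look like this (a hypothetical helper; the rounding behavior is my assumption):

```python
def resolve_mutation_count(spec, num_params):
    """Resolve a -mutcount token like '60%' or '1' to a parameter count."""
    if spec.endswith("%"):
        # Percentage of the optimized parameters, at least one.
        return max(1, round(num_params * int(spec[:-1]) / 100))
    return int(spec)

# The default "60% 1" with 20 parameters: 12 at phase 0.0, 1 at phase 1.0.
print(resolve_mutation_count("60%", 20))
print(resolve_mutation_count("1", 20))
```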

-crossprob specifies the probability of the crossover operation. If the probability is 1.0, the operation is applied to every new individual; if it's 0.0, it is never applied. In my tests I have found that this crossover argument is not that critical for a successful optimization.

-crossdist specifies the "distribution index" of the "simulated binary crossover", which is the crossover method used in NSGA-II and SPEA2. To be honest, I don't fully understand what it does, and I haven't investigated what value would be optimal for this argument.
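For reference, the standard simulated binary crossover for a single real-valued parameter works roughly like this (textbook SBX, not this optimizer's exact implementation): a higher distribution index keeps the children closer to their parents, a lower one spreads them out.

```python
import random

def sbx(x1, x2, eta=20.0):
    """Simulated binary crossover for one real-valued parameter.

    eta is the distribution index (-crossdist). The two children are
    always symmetric around the parents' midpoint."""
    u = random.random()
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
    c1 = 0.5 * ((1.0 + beta) * x1 + (1.0 - beta) * x2)
    c2 = 0.5 * ((1.0 - beta) * x1 + (1.0 + beta) * x2)
    return c1, c2
```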

-sensitivity specifies whether the sensitivity estimation algorithm is used. This algorithm tries to determine how "sensitive" each parameter is, that is, how much changing the parameter's value affects the result. The sensitivity is then used to scale the applied mutation amounts. The results are usually better when sensitivity estimation is on. If you want to switch it off, set the value to "false".
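One common way to estimate sensitivity is to perturb each parameter separately and measure how much the score changes; this sketch shows that general idea only, and is my guess rather than the optimizer's actual method:

```python
def estimate_sensitivity(evaluate, params, ranges, step_fraction=0.1):
    """Estimate per-parameter sensitivity by probing each one in turn."""
    base = evaluate(params)
    sensitivity = {}
    for name, (lo, hi) in ranges.items():
        step = max(1, round((hi - lo) * step_fraction))
        probe = {**params, name: min(hi, params[name] + step)}
        # Score change per unit of parameter change.
        sensitivity[name] = abs(evaluate(probe) - base) / step
    return sensitivity
```

A parameter with near-zero estimated sensitivity could then receive larger mutations (its exact value hardly matters), while a highly sensitive one gets smaller, more careful mutations.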

Now you know how to change the optimization process. The default values are good most of the time but feel free to try different things. The most important arguments are probably -iters (and -dynphases and -dyniters if dynamic iteration is used), -runs, and -pop, followed by -mutamount and -mutcount. If you find good settings for a specific script please let me know.

In the next episode we will focus on the visualization of the results.

Last edited by zorr; 23rd November 2018 at 00:50. Reason: Added description of "exhaustive" algorithm.