TECHNISCHE UNIVERSITÄT ILMENAU – Integrated Hard and Software Systems – http://www-ihs.theoinf.tu-ilmenau.de

Basic Optimization Methods

Overview: Problem Statement – Heuristic Search Framework – Hill Climbing – Random Search – Simulated Annealing – Genetic Algorithms – Tabu Search – Single Pass Approaches (General Framework, List Scheduling, Clustering) – Branch-and-Bound
Problem Statement

Design and implementation issues:
- mapping of modules, functions, operations, etc. onto hardware entities – HW/SW partitioning
- scheduling of the execution of operations

Example: HW/SW partitioning (figure: task graph partitioned between SW and HW)
Enumeration and Branch-and-Bound

Complete enumeration checks every possible solution for its quality. This is a brute-force approach!

Branch-and-Bound:
- stepwise construction of solutions
- put partial solutions on hold (bound branches) that do not seem interesting at the moment
- these partial solutions may be revisited (expanded) later on

Branch-and-Bound with underestimates: use best-case estimates (underestimates) to bound (and exclude) solutions during the search → see later for details
A Simple Example ... that shows the exponential nature of the problem

A simple sequencing problem: find the best order to traverse a given set of n cities according to some path optimization criterion (or find the best order to execute a set of tasks on a computer). For n cities there are n! possible sequences, so the solution space grows factorially with n.

Most problems we deal with are NP-hard! In practice, this means that the time to compute the best solution increases exponentially with the size of the problem.
The Complexity Problem

(table: running time as a function of the problem size n for different time complexity functions – the polynomial functions stay within fractions of a second to minutes even for large n, while the exponential functions quickly reach days, years, and centuries)

Complete enumeration of all alternative solutions is out of the question even with the most modern computers. Heuristics are needed that come up with good solutions that are not necessarily optimal.
Heuristic Search

Most heuristics are based on an iterative search comprising the following elements (a sketch follows below):
- selection of an initial (intermediate) solution (e.g. a sequence)
- evaluation of the quality of the intermediate solution
- check of termination criteria

Flowchart: select initial solution → select next solution based on the previous one (search strategy) → evaluate quality → if the acceptance criteria are satisfied, accept the solution as the best solution so far → repeat until the termination criteria are satisfied
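A minimal Python sketch of this generic loop, assuming all five strategy functions (initial, neighbors, quality, accept, done) are supplied by the caller; the names are illustrative, not from the original slides:

```python
def heuristic_search(initial, neighbors, quality, accept, done):
    """Generic iterative search skeleton following the framework above."""
    current = initial()                      # initial (intermediate) solution
    best, best_q = current, quality(current)
    iteration = 0
    while not done(iteration, best_q):       # termination criterion
        candidate = neighbors(current)       # search strategy: next solution
        cand_q = quality(candidate)          # evaluate quality
        if accept(cand_q, best_q):           # acceptance criterion
            current = candidate
            if cand_q > best_q:              # best solution so far
                best, best_q = candidate, cand_q
        iteration += 1
    return best
```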
Hill Climbing

Idea: search the neighborhood for improvements; select the best neighbor and continue (see the sketch below).

Flowchart: select initial solution → select all neighbors (based on the previous solution) → evaluate the quality of the neighbors → if a neighbor with better quality exists, accept the best neighbor as the intermediate solution and repeat; otherwise stop
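A minimal steepest-ascent sketch in Python, assuming caller-supplied neighborhood and quality functions (hypothetical names):

```python
def hill_climb(initial, neighborhood, quality):
    """Move to the best neighbor as long as it improves the current solution."""
    current = initial
    current_q = quality(current)
    while True:
        candidates = neighborhood(current)      # all neighbors of the solution
        if not candidates:
            return current
        best = max(candidates, key=quality)     # select the best neighbor
        if quality(best) <= current_q:          # no improvement: local optimum
            return current
        current, current_q = best, quality(best)
```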
Hill Climbing – Application to HW/SW Partitioning

(figure: three steps – 1. initial solution, 2. candidates for the climb, 3. select the best improvement – each showing a task graph split between SW and HW)

Legend: a node indicates some module, function, operation, etc.; an arc indicates some kind of communication relation between tasks; HW indicates programmable hardware units (FPGA, etc.); SW indicates a programmable processor running software.

Neighborhood definition: consider for a move between SW and HW all tasks that have neighbors implemented in the other technology (i.e. that are connected by an arc).
Hill Climbing – Discussion

- simple
- local optimization only: the algorithm is not able to pass a valley to finally reach a higher peak
- the idea is only applicable to small parts of optimization algorithms and needs to be complemented with other strategies to overcome local optima
Random Search (also called Monte Carlo algorithm)

Idea: random selection of the candidates for a change of intermediate solutions, or random selection of the solutions themselves (no use of a neighborhood); a sketch follows below.

Discussion:
- simple (no neighborhood relation is needed)
- not time efficient, especially where the time to evaluate solutions is high
- sometimes used as a reference algorithm to evaluate and compare the quality of heuristic optimization algorithms
- the idea of randomization is applied in other techniques, e.g. genetic algorithms and simulated annealing
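A minimal sketch, assuming a caller-supplied generator of random solutions and a quality function (illustrative names):

```python
def random_search(random_solution, quality, iterations=1000):
    """Monte Carlo search: sample independent random solutions and
    keep the best one seen; no neighborhood relation is needed."""
    best = random_solution()
    best_q = quality(best)
    for _ in range(iterations):
        s = random_solution()        # random selection, no neighborhood
        q = quality(s)
        if q > best_q:               # remember the best solution so far
            best, best_q = s, q
    return best
```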
Simulated Annealing

Idea: simulate the annealing process of material: the slow cooling of the material leads to a state with minimal energy, i.e. the global optimum.

Classification:
- search strategy: random local search
- acceptance criterion: unconditional acceptance of the selected solution if it represents an improvement over previous solutions; otherwise probabilistic acceptance
- termination criterion: static bound on the number of iterations (cooling process)
Simulated Annealing – Algorithm (see the sketch below)

Flowchart: select initial solution → select a random solution from the neighborhood → evaluate quality → if the new solution is better, accept it; otherwise make a probabilistic choice based on the quality of the solution → decrease the acceptance probability for poor solutions over time → repeat until the maximum number of iterations is exceeded
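A minimal sketch using the common Metropolis acceptance rule exp(-delta/T) and geometric cooling; the parameters t_start and alpha are illustrative assumptions, not values from the slides:

```python
import math
import random

def simulated_annealing(initial, random_neighbor, cost,
                        t_start=100.0, alpha=0.95, iterations=10000):
    """Minimize cost(). A worse neighbor is accepted with probability
    exp(-delta / T); T shrinks each step, so poor solutions become
    less and less likely to be accepted (the 'cooling')."""
    current = initial
    current_c = cost(current)
    best, best_c = current, current_c
    t = t_start
    for _ in range(iterations):
        cand = random_neighbor(current)          # random local search
        delta = cost(cand) - current_c
        if delta < 0 or random.random() < math.exp(-delta / t):
            current, current_c = cand, current_c + delta
            if current_c < best_c:
                best, best_c = current, current_c
        t *= alpha                               # decrease acceptance probability
    return best
```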
Simulated Annealing – Discussion and Variants

Discussion:
- the parameter setting for the cooling process is essential (but complicated): a slow decrease results in long run times, a fast decrease in poor solutions
- it is debated whether the temperature decrease should be linear or logarithmic
- straightforward to implement

Variants:
- deterministic acceptance
- non-linear cooling (slow cooling in the middle of the process)
- adaptive cooling based on the solutions accepted at a given temperature
- reheating
Genetic Algorithms

Idea: application of evolution theory (survival of the fittest): individuals well adapted to the environment will have more descendants and better adapted descendants; two basic operations – crossover and mutation – are applied to derive new solutions.

Classification:
- search strategy: probabilistic selection of solutions from the population; higher quality solutions are selected with higher probability
- acceptance criterion: new solutions replace older ones
- termination criterion: static bound on the number of iterations, or dynamic, e.g. based on improvements of the quality of solutions
Genetic Algorithms – Basic Operations

(figure: illustration of the two basic operations, crossover and mutation)
Genetic Algorithms – Basic Algorithm

Cycle: selection → crossover → mutation → replacement into the population (a sketch follows below). Replacement and selection rely on some cost function defining the quality of each solution; the selection of the crossover point is typically random.

General parameters:
- size of the population
- mutation probability
- candidate selection strategy (mapping quality onto probability)
- replacement strategy (replace own parents, replace the weakest, influence of probability)

Application-specific parameters:
- mapping of the problem onto an appropriate coding
- handling of invalid solutions in codings
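A minimal sketch of the cycle on bit-string genomes, assuming one-point crossover, bit-flip mutation, fitness-proportional selection, and full-population replacement; the fitness function is caller-supplied and assumed to return positive values:

```python
import random

def crossover(parent_a, parent_b):
    """One-point crossover of two equal-length bit lists."""
    point = random.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(genome, p_mut=0.01):
    """Flip each bit independently with probability p_mut."""
    return [bit ^ (random.random() < p_mut) for bit in genome]

def evolve(population, fitness, generations=100):
    """selection -> crossover -> mutation -> replacement."""
    for _ in range(generations):
        weights = [fitness(g) for g in population]   # quality -> probability
        offspring = []
        for _ in range(len(population)):
            a, b = random.choices(population, weights=weights, k=2)
            offspring.append(mutate(crossover(a, b)))
        population = offspring                       # replace old population
    return max(population, key=fitness)
```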
Genetic Algorithms – Application to HW/SW Partitioning

Problem statement:
- target system: one HW unit, one programmable (SW) unit
- 8 tasks to be assigned to HW or SW; no constraints on the task assignment (precedence constraints, etc.)
- cost function: cost table with different (normalized) costs for SW and HW implementation
- goal: find the HW/SW partition that minimizes Σ (cost + time) over all tasks

Coding: a bit vector of length 8, where 1 represents assignment to HW and 0 represents assignment to SW; altogether 2^8 = 256 possible solutions.

(figure: cost/time table for the 8 tasks)

Algorithm details:
- 8 solutions in the population
- random selection of the crossover point
- fixed mutation probability
Genetic Algorithms – Minimum Spanning Tree Example

- a small population results in inbreeding
- a larger population works well with a small mutation rate
- trade-off between the size of the population and the number of iterations
Genetic Algorithms – Discussion

- finding an appropriate coding of the specific application as binary vectors is not intuitive; typical problems are redundant codings and codings that do not represent a valid solution, e.g. codings for a sequencing problem
- tuning of genetic algorithms may be time consuming; parameter settings highly depend on the specifics of the problem
- well suited for parallelization
Tabu Search

Idea: extension of hill climbing to avoid being trapped in local optima:
- allow intermediate solutions with lower quality
- maintain a history to avoid running in cycles

Classification:
- search strategy: deterministic local search
- acceptance criterion: acceptance of the best solution in the neighborhood which is not tabu
- termination criterion: static bound on the number of iterations, or dynamic, e.g. based on quality improvements of the solutions
Tabu Search – Algorithm (see the sketch below)

Flowchart: select initial solution → select a neighborhood set (based on the current solution) → remove tabu solutions from the set → if the set is empty, increase the neighborhood; otherwise evaluate the quality and select the best solution from the set → update the tabu list → repeat until the termination criteria are satisfied.

The brain of the algorithm is the tabu list, which stores and maintains information about the history of the search. In the simplest case a number of previous solutions are stored in the tabu list. More advanced techniques maintain attributes of the solutions rather than the solutions themselves.
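A minimal sketch of the simplest case (the tabu list stores recent solutions, not attributes); neighborhood and quality are caller-supplied, and solutions are assumed to be comparable for equality:

```python
from collections import deque

def tabu_search(initial, neighborhood, quality,
                tabu_size=10, max_iterations=1000):
    """The best non-tabu neighbor is always accepted, even if it is
    worse than the current solution; a bounded FIFO list of recent
    solutions prevents cycling."""
    current = initial
    best, best_q = current, quality(current)
    tabu = deque([current], maxlen=tabu_size)   # history of the search
    for _ in range(max_iterations):
        candidates = [n for n in neighborhood(current) if n not in tabu]
        if not candidates:
            break                               # (or: enlarge the neighborhood)
        current = max(candidates, key=quality)  # best non-tabu neighbor
        tabu.append(current)                    # update the tabu list
        if quality(current) > best_q:
            best, best_q = current, quality(current)
    return best
```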
Tabu Search – Organisation of the History

The history is maintained by the tabu list. Attributes of solutions are a very flexible means to control the search.

Examples of attributes for a HW/SW partitioning problem with 8 tasks assigned to one of several HW entities:
(A1) change of the value of a task assignment variable
(A2) move to HW
(A3) move to SW
(A4) combined change of some attributes
(A5) improvement of the quality of two subsequent solutions above or below a threshold value

Aspiration criteria: under certain conditions tabus may be ignored, e.g. if
- a tabu solution is the best solution found so far
- all solutions in a neighborhood are tabu
- a tabu solution is better than the solution that triggered the respective tabu condition

Intensification checks whether good solutions share some common properties; diversification searches for solutions that do not share common properties. The update of the history information may be recency-based or frequency-based (i.e. depending on the frequency with which the attribute has been activated).
Tabu Search – Discussion

- easy to implement (at least the neighborhood search as such)
- non-trivial tuning of parameters; tuning is crucial to avoid cyclic search
- advantage: usage of knowledge, i.e. feedback from the search to control the search (e.g. for the controlled removal of bottlenecks)
Heuristic Search Methods – Classification

Search strategy:
- search area: global search (potentially all solutions considered) or local search (direct neighbors only – stepwise optimization)
- selection strategy: deterministic selection (according to some deterministic rules), random selection from the set of possible solutions, or probabilistic selection (based on some probabilistic function)
- history dependence, i.e. the degree to which the selection of the new candidate solution depends on the history of the search: no dependence, one-step dependence, or multi-step dependence

Acceptance criteria: deterministic acceptance (based on some deterministic function) or probabilistic acceptance (influenced by some random factor)

Termination criteria: static (independent of the actual solutions visited during the search) or dynamic (dependent on the search history)
Heuristic Search Methods – Classification

(table: hill climbing, tabu search, simulated annealing, genetic algorithms, and random search classified along the dimensions of the previous slide – search area (local/global), selection strategy (deterministic/probabilistic/random), history dependence (none/one-step/multi-step), acceptance criterion (deterministic/probabilistic), and termination criterion (static/dynamic))
Single Pass Approaches

The techniques covered so far search through a large number of solutions. The idea underlying single pass approaches:
- intelligent construction of a single solution (instead of updating and modifying a number of solutions)
- the solution is constructed by successively solving a number of subproblems

Discussion:
- single-pass algorithms are very fast
- the quality of the solutions is often low
- not applicable where many constraints are present (which require some kind of backtracking)

Important applications of the idea:
- list scheduling: successive selection of a task to be scheduled until the complete schedule has been computed
- clustering: successive merging of nodes/modules until a small number of clusters remains, such that each cluster can be assigned to a single HW unit
Single Pass Approaches – Framework

Flowchart: derive guidelines for the solution construction → select a subproblem → decide the subproblem based on the guidelines → possibly recompute or adapt the guidelines → repeat until the final solution is constructed.

The guidelines are crucial and represent the intelligence of the algorithm.
List Scheduling

List scheduling: successive selection of a task to be scheduled on some processor (or HW entity); the operation is similar to a dynamic task scheduler of an operating system (see the sketch below).

Flowchart: assign priorities to the tasks according to some strategy (prioritization strategy) → select the executable task with the highest priority → assign the task to a processor according to some strategy (assignment strategy) → repeat until the schedule is complete.
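A minimal sketch with first-fit assignment; it is a simplification that ignores communication delays between processors, and all data structures (preds, duration, priority) are illustrative assumptions:

```python
def list_schedule(tasks, preds, duration, priority, n_procs=2):
    """tasks: task ids; preds[t]: set of predecessor ids; duration[t]:
    estimated execution time; priority[t]: e.g. the HLFET level.
    First fit: an executable task goes to the earliest-free processor."""
    free_at = [0.0] * n_procs           # time each processor becomes idle
    finish = {}                         # finish times of scheduled tasks
    remaining = set(tasks)
    schedule = []
    while remaining:
        # executable = all predecessors already scheduled
        ready = [t for t in remaining if preds[t] <= set(finish)]
        task = max(ready, key=lambda t: priority[t])       # priority strategy
        proc = min(range(n_procs), key=free_at.__getitem__)  # first fit
        start = max([free_at[proc]] + [finish[p] for p in preds[task]])
        finish[task] = start + duration[task]
        free_at[proc] = finish[task]
        schedule.append((task, proc, start))
        remaining.remove(task)
    return schedule
```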
List Scheduling – Example (1)

Problem: schedule 6 tasks with precedence constraints on 2 processors; find the schedule with minimal execution time.

HLFET (highest level first with estimated times): the priority of a node is its level, i.e. the length of the longest (critical) path to the sink node (node 6). Assignment strategy: first fit.

(figure: task graph annotated with estimated times (green) and levels/priorities (red), and the resulting schedule on processors P1 and P2)
List Scheduling – Example (2)

Problem (unchanged): schedule the 6 tasks with precedence constraints on 2 processors; find the schedule with minimal execution time.

SCFET (smallest co-level first with estimated times): the priority of a node is its co-level, i.e. the length of the longest (critical) path to the source node. Assignment strategy: first fit.

(figure: task graph annotated with estimated times (green) and co-levels/priorities (blue), and the resulting schedule on processors P1 and P2)
Clustering

Probabilistic vs. deterministic:
- probabilistic: each node belongs with certain probabilities to different clusters
- deterministic: a node either belongs to exactly one cluster or not

Hierarchical vs. partitioning:
- hierarchical: starts with a distance matrix over all pairs of nodes; exact method (always the same result); terminates when all nodes belong to one cluster
- partitioning: starts with a given number of K clusters; results depend on the chosen initial set of clusters; terminates after a given number of iterations
Clustering – partitioning of a set of nodes into a given number of subsets

Flowchart: assign each node to a separate cluster → compute the distance between every pair of clusters → select the pair of clusters with the highest affinity → merge the clusters → repeat until the termination criterion holds.

Applications:
- processor assignment (load balancing – minimize interprocess communication)
- scheduling (minimize the critical path)
- HW/SW partitioning

Clustering may be employed as part of the optimization process, i.e. combined with other techniques.
Hierarchical Clustering

Algorithm (see the sketch below): determine the distance between each pair of nodes → select the smallest distance → replace the selected pair in the distance matrix by a cluster representative → recompute the distance matrix → repeat until all nodes are in one cluster. The merge history can be visualized as a dendrogram.
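A minimal agglomerative sketch using the single-linkage distance (smallest distance between members, as defined later); points and the distance function are caller-supplied assumptions:

```python
def single_linkage(points, distance, target_clusters=1):
    """Repeatedly merge the two clusters whose closest members
    have the smallest distance, until target_clusters remain."""
    clusters = [[p] for p in points]          # start: one cluster per node
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: minimum over all cross-cluster pairs
                d = min(distance(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])       # merge the closest pair
        del clusters[j]
    return clusters
```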
Partitioning Clustering (k-means)

Algorithm (see the sketch below): choose positions of k initial cluster representatives → assign each node to the nearest cluster representative → recompute the positions of the cluster representatives based on the positions of the nodes in each cluster → repeat until the given number of iterations is reached.
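A minimal sketch of the plain Lloyd iteration on 2-D points; random initial representatives and a fixed iteration count are simplifying assumptions:

```python
import random

def k_means(points, k, iterations=20):
    """points: list of (x, y) tuples. Alternate between assigning each
    node to its nearest representative and moving each representative
    to the mean of its cluster."""
    centers = random.sample(points, k)        # initial representatives
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                      # assignment step
            nearest = min(range(k), key=lambda i:
                          (p[0] - centers[i][0])**2 + (p[1] - centers[i][1])**2)
            clusters[nearest].append(p)
        for i, c in enumerate(clusters):      # update step
            if c:                             # keep old center if cluster empty
                centers[i] = (sum(p[0] for p in c) / len(c),
                              sum(p[1] for p in c) / len(c))
    return centers, clusters
```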
Clustering – Application to Load Balancing

Optimization goal: minimize interprocess (intercluster) communication while limiting the maximum load per processor (cluster).

Flowchart: assign each node to a separate cluster → compute the sum of the communication cost between every pair of clusters → select the pair of clusters with the highest communication cost that does not violate the capacity constraints → merge the clusters → repeat as long as a reduction of the communication cost without violating the constraints is possible.
Clustering – Application to Load Balancing

(figure: step-by-step merging of an example task graph under the capacity constraints)
Clustering Variants

Clustering methods:
- partitioning methods: k-means, Fuzzy c-means, SOM, Clique, One Pass, Gustafson-Kessel algorithm
- hierarchical methods:
  - agglomerative (bottom up): single linkage, complete linkage, group average, centroid, MST, ROCK, Ward's
  - divisive (top down): Tree Structural Vector Quantification, Macnaughton-Smith algorithm

Distance metrics: Euclidean, Manhattan, Minkowski, Mahalanobis, Jaccard, Canberra, Chebyshev, Correlation, Chi-square, Kendall's Rank Correlation
Clustering – Hierarchical Algorithms

(figure: comparison of single linkage, complete linkage, and centroid-based merging)
Clustering – Single Linkage

The distance between two groups is estimated as the smallest distance between entities, one from each group:

d(Ci, Cj) = min { d(x, y) : x in Ci, y in Cj }

(figure: scatter plot of points P1…P7 with the corresponding pairwise distance matrix)
Clustering – Single Linkage (continued)

(figure: successive merging steps, with the distance matrix recomputed after each merge)
Clustering – Group Average

The distance between two groups is defined as the average distance between all pairs of entities:

d(Ci, Cj) = ( Σ over x in Ci, y in Cj of d(x, y) ) / (|Ci| · |Cj|)

(figure: scatter plot of points P1…P7 with the corresponding pairwise distance matrix)
Clustering – Group Average (continued)

(figure: successive merging steps, with the average-distance matrix recomputed after each merge)
Clustering – Centroid-based

Determine the distances between the centroids of the clusters and merge the two clusters whose centroids have the least distance, where (Cx, Cy) denotes the centroid of a cluster:

d(k, l) = sqrt( (Cx_k − Cx_l)² + (Cy_k − Cy_l)² )

(figure: scatter plot of points P1…P7 with the corresponding pairwise distance matrix)
Clustering – Centroid-based (continued)

(figure: merging steps, with distances recomputed between the cluster centroids after each merge)
Differences between Clustering Algorithms

(figure: the same data set clustered with single linkage, complete linkage, centroid linkage, k-means, and Ward – the algorithms produce visibly different partitions)
Clustering – Discussion

Results:
- exact results (single linkage)
- non-exact results; often several iterations are necessary (k-means)

Metrics:
- strong impact on the clustering results
- not every metric is suitable for every clustering algorithm
- decision for single- or multi-criteria metrics (separate or joint clustering)

Selection of the algorithm:
- depends strongly on the structure of the data set and the expected results
- some algorithms tend to separate outliers into their own clusters – some large clusters and a lot of very small clusters (complete linkage)
- only few algorithms are able to also detect branched, curved, or cyclic clusters (single linkage)
- some algorithms tend to return clusters of nearly equal size (k-means, Ward)

Quality of clustering results:
- the mean variance of the elements in each cluster (affinity parameter) is often used
- in general, the homogeneity within clusters and the heterogeneity between clusters can be measured
- however, the quality prediction can only be as good as the quality of the metric used!
Branch-and-Bound with Underestimates

Application of the A* algorithm to the scheduling problem: each node x of the search tree is rated by f(x) = g(x) + h(x), where g(x) is the exact value of the partial schedule and h(x) is an underestimate for the remainder.

Example: scheduling on a two-processor system (processors A and B). The search tree branches on the assignment of the next task to processor A or B, and each node is labeled with its estimate f(x). The search is terminated when min {f(x)} is a terminal node in the search tree (see the sketch below).

(figure: process graph with processing times (green) and communication times (blue), and the corresponding search tree)
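A minimal best-first sketch of this search; expand, g, h, and is_terminal are caller-supplied functions on partial schedules (illustrative names). With an underestimate h, the first terminal node popped from the frontier is optimal:

```python
import heapq

def a_star(start, expand, g, h, is_terminal):
    """Always expand the frontier node with the smallest
    f(x) = g(x) + h(x); stop when that node is terminal."""
    frontier = [(g(start) + h(start), 0, start)]
    counter = 1                               # tie-breaker for the heap
    while frontier:
        f, _, node = heapq.heappop(frontier)
        if is_terminal(node):
            return node                       # min f(x) is a terminal node
        for child in expand(node):            # branch on the next decision
            heapq.heappush(frontier,
                           (g(child) + h(child), counter, child))
            counter += 1
    return None
```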
Branch-and-Bound with Underestimates – Example: computation of f(x)

(figure: two cases for the partial schedule of the first tasks on processors A and B; in each case g(x) is the exact cost of the already scheduled part, h(x) an underestimate for the remaining work along the respective path, and f(x) their sum)
References

A. Mitschele-Thiel: Systems Engineering with SDL – Developing Performance-Critical Communication Systems. Wiley, 2001.
C.R. Reeves (ed.): Modern Heuristic Techniques for Combinatorial Problems. Blackwell Scientific Publications, 1993.
H.U. Heiss: Prozessorzuteilung in Parallelrechnern. BI-Wissenschaftsverlag, Reihe Informatik, Band 98, 1994.
M. Garey, D. Johnson: Computers and Intractability. W.H. Freeman, New York, 1979.