diff --git a/.gitignore b/.gitignore new file mode 100644 index 00000000..abbb3811 --- /dev/null +++ b/.gitignore @@ -0,0 +1,4 @@ +**/.ipynb_checkpoints/ +**/Untitled*.* +**/*.pkl +**/*.pyc diff --git a/Genetic Algorithm Implementation in Python.ipynb b/Genetic Algorithm Implementation in Python.ipynb new file mode 100644 index 00000000..68230966 --- /dev/null +++ b/Genetic Algorithm Implementation in Python.ipynb @@ -0,0 +1,999 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Genetic Algorithm Implementation in Python\n", + "Genetic Algorithm Implementation in Python — By Ahmed F. Gad
\n", + "LinkedIn: https://www.linkedin.com/pulse/genetic-algorithm-implementation-python-ahmed-gad/
\n", + "AI 研習社:https://ai.yanxishe.com/page/TextTranslation/1207\n", + "\n", + "Source code : https://github.com/ahmedfgad/GeneticAlgorithmPython
\n", + "My clone: https://github.com/hcchengithub/GeneticAlgorithmPython\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "reDef unknown\n", + "reDef -->\n" + ] + } + ], + "source": [ + "import peforth # v1.24" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Tutorial Example\n", + "The tutorial starts by presenting the equation that we are going to implement. The equation is shown below:\n", + "\n", + "Y = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6\n", + "\n", + "The equation has 6 inputs (x1 to x6) and 6 weights (w1 to w6). The input values used in this notebook are (x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7). We want to find the parameters (weights) that maximize this equation. The idea seems simple: each positive input should be multiplied by the largest possible positive weight, and each negative input by the most negative possible weight. The point of the exercise, though, is to let GA discover on its own that positive weights belong with positive inputs and negative weights with negative inputs. 
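
The sign-matching idea can be checked directly. The following is a minimal sketch, assuming the input values from the notebook's code cell and a hypothetical bound of 4 on each weight (the range used for the initial population; the GA itself does not enforce this bound). The names evaluate, bound, matched and mismatched are illustrative only.

```python
import numpy

# Input values taken from the notebook's code cell.
equation_inputs = numpy.array([4, -2, 3.5, 5, -11, -4.7])

def evaluate(weights):
    # Y = w1*x1 + ... + w6*x6, the sum of products of weights and inputs.
    return float(numpy.dot(weights, equation_inputs))

# Hypothetical bound on each weight (an assumption made for this sketch).
bound = 4.0

# Sign-matched weights: +bound for positive inputs, -bound for negative inputs.
matched = bound * numpy.sign(equation_inputs)
# Sign-mismatched weights give the opposite extreme.
mismatched = -matched

print(evaluate(matched))     # about 120.8
print(evaluate(mismatched))  # about -120.8
```

Under that assumed bound, no weight vector can score higher than the sign-matched one, which gives a reference point for judging what the GA finds.
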
Let us start implementing GA.\n", + "\n", + "At first, let us create a list of the 6 inputs and a variable to hold the number of weights as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "equation_inputs = [4,-2,3.5,5,-11,-4.7]\n", + "num_weights = len(equation_inputs)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "equation_inputs --> [4, -2, 3.5, 5, -11, -4.7] ()\n", + "num_weights --> 6 ()\n" + ] + } + ], + "source": [ + "%f equation_inputs -->\n", + "%f num_weights -->" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next step is to define the initial population. Based on the number of weights, each chromosome (solution or individual) in the population will definitely have 6 genes, one gene for each weight. But the question is how many solutions per the population? There is no fixed value for that and we can select the value that fits well with our problem. But we could leave it generic so that it can be changed in the code. 
Next, we create a variable that holds the number of solutions per population, another to hold the size of the population, and finally, a variable that holds the actual initial population:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy\n", + "\n", + "sol_per_pop = 8 # solutions per population \n", + "\n", + "# Defining the population size.\n", + "pop_size = (sol_per_pop,num_weights) # The population will have sol_per_pop chromosome where each chromosome has num_weights genes.\n", + "\n", + "#Creating the initial population.\n", + "new_population = numpy.random.uniform(low=-4.0, high=4.0, size=pop_size)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After importing the numpy library, we are able to create the initial population randomly using the numpy.random.uniform function. According to the selected parameters, it will be of shape (8, 6). That is 8 chromosomes and each one has 6 genes, one for each weight. 
After running this code, the population is as follows:\n", + "\n", + "#### Experiment\n", + "Changing sol_per_pop to 80 (the population size, that is, the number of chromosomes competing within one generation) raised the result from about 40 to about 130." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[-0.34610757 -0.48601818 -0.21074154 -0.51293995 -3.15336626 -0.9724159 ]\n", + " [-1.60648454 2.98731 -0.17179483 2.13952118 0.83779334 0.7196666 ]\n", + " [-1.74953969 -3.56136451 -0.22888881 -0.28895321 1.16381269 -1.10860643]\n", + " [ 1.77718335 -3.93334683 -0.45756823 2.93656102 2.50230994 3.74874976]\n", + " [ 0.66986951 -3.03126141 -3.63986569 -0.15803427 3.93955901 2.03904557]\n", + " [ 0.45886834 -2.44420888 2.94516232 -1.74789717 -2.94093216 3.55181148]\n", + " [ 2.01301423 3.23521924 2.723441 -3.98261914 3.44337194 1.26286695]\n", + " [ 0.10465165 2.72794357 -2.11283096 0.06572773 -2.74471276 -1.37170897]]\n" + ] + } + ], + "source": [ + "%f new_population . cr" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that the population is generated randomly, so it will change every time the cell is run.\n", + "\n", + "After preparing the population, the next step is to follow the flowchart in figure 1. Based on the fitness function, we select the best individuals within the current population as parents for mating. Next, we apply the GA variants (crossover and mutation) to produce the offspring of the next generation, create the new population by appending both parents and offspring, and repeat these steps for a number of iterations/generations. 
The following code applies these steps:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# import GA \n", + "# Kept in a separate cell so the functions can be redefined with breakpoints for study.\n", + "from GA import *" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "num_generations = 5\n", + "num_parents_mating = 4\n", + "for generation in range(num_generations):\n", + "    # Measuring the fitness of each chromosome in the population.\n", + "    fitness = cal_pop_fitness(equation_inputs, new_population)\n", + "\n", + "    # Selecting the best parents in the population for mating.\n", + "    parents = select_mating_pool(new_population, fitness, num_parents_mating)\n", + "\n", + "    # Generating next generation using crossover.\n", + "    offspring_crossover = crossover(parents, offspring_size=(pop_size[0]-parents.shape[0], num_weights))\n", + "    # Number of offspring. Why compute it this way instead of hard-coding it?\n", + "\n", + "    # Adding some variations to the offspring using mutation. _debug_\n", + "    offspring_mutation = mutation(offspring_crossover)\n", + "\n", + "    # Creating the new population based on the parents and offspring.\n", + "    new_population[0:parents.shape[0], :] = parents\n", + "    new_population[parents.shape[0]:, :] = offspring_mutation" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "pop_size --> (8, 6) ()\n", + "parents :> shape --> (4, 6) ()\n", + "( pop_size[0]-parents.shape[0] ) pop_size :> [0] parents :> shape[0] - --> 4 ()\n", + "Number of offspring; why is it computed this way?\n", + "\n" + ] + } + ], + "source": [ + "%f pop_size -->\n", + "%f parents :> shape -->\n", + "%f ( pop_size[0]-parents.shape[0] ) pop_size :> [0] parents :> shape[0] - --> # Number of offspring; why is it computed this way?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[-1.74953969 -3.56136451 -0.22888881 -0.51293995 -3.15336626 -2.50548442]\n", + " [-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -2.82607565]\n", + " [-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -2.64899546]\n", + " [-0.34610757 -0.48601818 -0.21074154 -0.51293995 -3.15336626 -2.02495388]\n", + " [-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -2.36188952]\n", + " [-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -3.03634722]\n", + " [-1.74953969 -3.56136451 -0.22888881 -0.51293995 -3.15336626 -1.19258244]\n", + " [-0.34610757 -0.48601818 -0.21074154 -0.51293995 -3.15336626 -1.55320021]]\n" + ] + } + ], + "source": [ + "%f new_population . cr" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "fitness ==>\n", + " [-1.00000000e+11 3.95704757e+01 3.93110735e+01 3.91365103e+01\n", + " -1.00000000e+11 3.98550832e+01 -1.00000000e+11 -1.00000000e+11] ()\n", + "給 population 評分\n", + "\n", + "parents ==>\n", + " [[-1.74953969 -3.56136451 -0.22888881 -0.51293995 -3.15336626 -2.50548442]\n", + " [-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -2.82607565]\n", + " [-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -2.64899546]\n", + " [-0.34610757 -0.48601818 -0.21074154 -0.51293995 -3.15336626 -2.02495388]] ()\n", + "從 population 當中挑出最好的種子\n", + "\n", + "offspring_crossover ==>\n", + " [[-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -2.36188952]\n", + " [-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -3.03634722]\n", + " [-1.74953969 -3.56136451 -0.22888881 -0.51293995 -3.15336626 -1.19258244]\n", + " [-0.34610757 -0.48601818 -0.21074154 -0.51293995 -3.15336626 -1.55320021]] ()\n", + "種子基因洗牌\n", 
+ "\n", + "offspring_mutation ==>\n", + " [[-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -2.36188952]\n", + " [-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -3.03634722]\n", + " [-1.74953969 -3.56136451 -0.22888881 -0.51293995 -3.15336626 -1.19258244]\n", + " [-0.34610757 -0.48601818 -0.21074154 -0.51293995 -3.15336626 -1.55320021]] ()\n", + "種子基因突變\n", + "\n" + ] + } + ], + "source": [ + "%f fitness ==> # 給 population 評分\n", + "%f parents ==> # 從 population 當中挑出最好的種子\n", + "%f offspring_crossover ==> # 種子基因洗牌\n", + "%f offspring_mutation ==> # 種子基因突變" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 這樣就已經完成了,靠!出乎意外的簡單。\n", + "上面這 8 對 chromosome 所組成的 population 是進化到目前的最終結果。以下從這群 population 當中列出結果 " + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Best solution : [[[-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276\n", + " -3.03634722]]]\n", + "Best solution fitness : [44.11477039]\n" + ] + } + ], + "source": [ + "# Getting the best solution after iterating finishing all generations.\n", + "# At first, the fitness is calculated for each solution in the final generation.\n", + "fitness = cal_pop_fitness(equation_inputs, new_population)\n", + "\n", + "# Then return the index of that solution corresponding to the best fitness.\n", + "best_match_idx = numpy.where(fitness == numpy.max(fitness))\n", + "\n", + "print(\"Best solution : \", new_population[best_match_idx, :])\n", + "print(\"Best solution fitness : \", fitness[best_match_idx])\n" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "fitness ==>\n", + " [43.22156528 43.12649401 42.29421709 40.48962302 40.94481918 44.11477039\n", + " 37.05092593 38.27238075] ()\n", + "給 population 評分\n", + "\n", + "best_match_idx --> (array([5], 
dtype=int64),) ()\n", + "挑出最好的\n", + "\n" + ] + } + ], + "source": [ + "%f fitness ==> # 給 population 評分\n", + "%f best_match_idx --> # 挑出最好的\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 以下一一深入說明" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The current number of generations is 5. It is selected to be small for presenting results of all generations within the tutorial. There is a module named GA that holds the implementation of the algorithm.\n", + "\n", + "The first step is to find the fitness value of each solution within the population using the GA.cal_pop_fitness function. The implementation of such function inside the GA module is as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "def cal_pop_fitness(equation_inputs, pop):\n", + " # Calculating the fitness value of each solution in the current population.\n", + " # The fitness function calculates the sum of products between each input and its corresponding weight.\n", + " fitness = numpy.sum(pop*equation_inputs, axis=1)\n", + " return fitness" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "new_population ==>\n", + " [[-1.74953969 -3.56136451 -0.22888881 -0.51293995 -3.15336626 -2.50548442]\n", + " [-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -2.82607565]\n", + " [-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -2.64899546]\n", + " [-0.34610757 -0.48601818 -0.21074154 -0.51293995 -3.15336626 -2.02495388]\n", + " [-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -2.36188952]\n", + " [-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -3.03634722]\n", + " [-1.74953969 -3.56136451 -0.22888881 -0.51293995 -3.15336626 -1.19258244]\n", + " [-0.34610757 -0.48601818 -0.21074154 -0.51293995 -3.15336626 -1.55320021]] ()\n", + 
"equation_inputs ==>\n", + " [4, -2, 3.5, 5, -11, -4.7] ()\n", + "new_population equation_inputs * ==>\n", + " [[-6.99815875 7.12272902 -0.80111085 -2.56469977 34.68702884 11.77577679]\n", + " [-6.99815875 7.12272902 -0.80111085 0.32863867 30.19184036 13.28255557]\n", + " [-6.99815875 7.12272902 -0.80111085 0.32863867 30.19184036 12.45027866]\n", + " [-1.38443029 0.97203636 -0.73759538 -2.56469977 34.68702884 9.51728325]\n", + " [-6.99815875 7.12272902 -0.80111085 0.32863867 30.19184036 11.10088074]\n", + " [-6.99815875 7.12272902 -0.80111085 0.32863867 30.19184036 14.27083195]\n", + " [-6.99815875 7.12272902 -0.80111085 -2.56469977 34.68702884 5.60513745]\n", + " [-1.38443029 0.97203636 -0.73759538 -2.56469977 34.68702884 7.30004098]] ()\n", + "對 population 裡的每一組 Chromosome 裡的 gene 抓對乘上 equation inputs\n", + "\n", + "-1.74953969 4 * --> -6.99815876 ()\n", + "-1.74953969 4 * --> -3.56136451 -2 * --> 7.12272902 ()\n" + ] + } + ], + "source": [ + "%f new_population ==> \n", + "%f equation_inputs ==>\n", + "%f new_population equation_inputs * ==> # 對 population 裡的每一組 Chromosome 裡的 gene 抓對乘上 equation inputs\n", + "%f -1.74953969 4 * --> -3.56136451 -2 * -->" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "上面這個 cal_pop_fitness() 很簡單,照定義把 population 裡面每個 chromosome 的評分算出來。但是隨應用要修改的就是它。其中 equation_inputs 將來可能就是個 simulator 了; 而 pop (population) 一開始是亂猜的,隨後將「進化」出好的結果。" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The fitness function accepts both the equation inputs values (x1 to x6) in addition to the population. The fitness value is calculated as the sum of product (SOP 就是文章一開始那方程式的 Y 值) between each input and its corresponding gene (weight) according to our function. According to the number of solutions per population, there will be a number of SOPs. 
As we previously set the number of solutions to 8 in the variable named sol_per_pop, there will be 8 SOPs as shown below:" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[43.22156528 43.12649401 42.29421709 40.48962302 40.94481918 44.11477039\n", + " 37.05092593 38.27238075]\n" + ] + } + ], + "source": [ + "SOPs = cal_pop_fitness(equation_inputs,new_population)\n", + "%f SOPs . cr" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that the higher the fitness value the better the solution.\n", + "\n", + "After calculating the fitness values for all solutions, next is to select the best of them as parents in the mating pool according to the next function GA.select_mating_pool. Such function accepts the population, the fitness values, and the number of parents needed. It returns the parents selected. Its implementation inside the GA module is as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def select_mating_pool(pop, fitness, num_parents):\n", + " # Selecting the best individuals in the current generation as parents for producing the offspring of the next generation.\n", + " parents = numpy.empty((num_parents, pop.shape[1])) # 產生 4X6 的 matrix 裡面都是 float 亂數,我覺得這行有機會省略 <== 錯\n", + " for parent_num in range(num_parents): # 轉 n 圈找出最大的 n 個 \n", + " max_fitness_idx = numpy.where(fitness == numpy.max(fitness)) # np.where() 的結果是個 tuple\n", + " max_fitness_idx = max_fitness_idx[0][0] # tuple 裡面是個 array 再裡面才是 index 值,依它就對了。\n", + " # parents[parent_num, :] = pop[max_fitness_idx, :] # 我覺得用 parents.append(pop[max_fitness_idx]) 即可 <== 錯\n", + " parents[parent_num] = pop[max_fitness_idx] # 可以這樣簡化,原式的 \", : \" 冗長確實沒有必要。\n", + " fitness[max_fitness_idx] = -99999999999 # 每轉一圈找出最大值,槓掉已經找到的\n", + " return parents" + ] + }, + { + "cell_type": "raw", + "metadata": {}, + 
"source": [ + "看懂這行\n", + " max_fitness_idx = numpy.where(fitness == numpy.max(fitness))\n", + " \n", + "首先複習 numpy 的超能力 \n", + " a = [1,2,3,5,9,0] == 3 \n", + " a --> False ()\n", + " \\ 同樣是個 == 運算 python 的只是比較 == 兩邊\n", + "\n", + " b = [1,2,3,5,9,0] == np.int64(3) \n", + " b --> [False False True False False False] ()\n", + " \\ 同樣是個 == 運算 numpy 的是進去一個一個比較\n", + "\n", + " a = [1,2,3,5,9,0] == np.max([1,2,3,5,9,0]) \n", + " a --> [False False False False True False] ()\n", + " \\ 所以這就好懂了\n", + "\n", + "numpy.where() 的傳回值比較奇怪,是個 tuple 而 tuple 裡面又是個 array . . . 反正依它就對了。\n", + " b = np.where([1,2,3,5,9,0] == np.max([1,2,3,5,9,0]))\n", + " b count --> 1 ()\n", + " b count --> --> (array([4], dtype=int64),) ()\n", + " b :> [0] --> [4] ()\n", + " b :> [0][0] --> 4 ()\n", + "\n", + "numpy.array 沒有 .append,雖然有 np.append(array, sth) 但是這個 sth 如果是個 series 就一定會被拆散之後 append 進去!所以也不合用。正確的用法是用 np.empty((0,6)) 做出一個空的 parents, 但雖然 empty 仍有 shape <=== 這就是 numpy 要的。\n", + "\n", + " parents = numpy.empty((0, 6)) # 不能省略,但是可以簡化到像這樣空無一物\n", + " parents --> [] ()\n", + " parents :> shape --> (0, 6) ()\n", + "\n", + "這地方如果用 sorting 的程式當然會很簡短,但是應用在實際上很可能太浪費時間。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 這段改寫只是好玩,結果沒有原來的好。\n", + "\n", + "debug = False\n", + "def select_mating_pool(pop, fitness, num_parents):\n", + " # Selecting the best individuals in the current generation as parents for producing the offspring of the next generation.\n", + " # parents = numpy.empty((num_parents, pop.shape[1])) # 產生 4X6 的 matrix 裡面都是 float 亂數,我覺得這行有機會省略\n", + " # 我以為這行可省略,錯! 
還是要有個初始 shape 才行!\n", + " parents = numpy.empty((0, pop.shape[1])) # 不能省略,但是可以簡化到像這樣空無一物\n", + " if debug : peforth.push(locals()).ok(\"bp11>\",cmd='to _locals_') \n", + " for parent_num in range(num_parents): # 轉 n 圈找出最大的 n 個 \n", + " max_fitness_idx = numpy.where(fitness == numpy.max(fitness)) # np.where() 的結果是個 tuple\n", + " max_fitness_idx = max_fitness_idx[0][0] # tuple 裡面是個 array 再裡面才是 index 值,依它就對了。\n", + " # parents[parent_num, :] = pop[max_fitness_idx, :] # 我覺得用 parents.append(pop[max_fitness_idx]) 即可 <==錯!無此物且 np.append 也不合用\n", + " parents = numpy.vstack((parents, pop[max_fitness_idx])) # 這樣改寫,沒有比原來的好 :-( \n", + " fitness[max_fitness_idx] = -99999999999 # 每轉一圈找出最大值,已經找到過的就槓掉\n", + " if debug : peforth.push(locals()).ok(\"bp22>\",cmd='to _locals_') \n", + " return parents" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Based on the number of parents required as defined in the variable num_parents_mating, the function creates an empty array to hold them as in this line:\n", + "\n", + " parents = numpy.empty((num_parents, pop.shape[1]))\n", + "\n", + "Looping through the current population, the function gets the index of the highest fitness value because it is the best solution to be selected according to this line:\n", + "\n", + " max_fitness_idx = numpy.where(fitness == numpy.max(fitness))\n", + " \n", + "This index is used to retrieve the solution that corresponds to such fitness value using this line:\n", + "\n", + " parents[parent_num, :] = pop[max_fitness_idx, :]\n", + " parents[parent_num] = pop[max_fitness_idx] # 這樣即可,上式過度冗贅。\n", + " \n", + "To avoid selecting such solution again, its fitness value is set to a very small value that is likely to not be selected again which is -99999999999. 
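
One side effect is worth noting: the function writes -99999999999 into the fitness array it receives, so the caller's array is changed as well, which appears to be why the fitness printout earlier in this notebook contains -1.00000000e+11 entries. The following is a minimal sketch of a variant that leaves the array untouched by sorting indices with numpy.argsort. The name select_mating_pool_sorted and the toy data are hypothetical, and ties may be ordered differently than in the loop-based version.

```python
import numpy

def select_mating_pool_sorted(pop, fitness, num_parents):
    # Hypothetical alternative: indices of the num_parents largest fitness
    # values, best first, without modifying the fitness array.
    best_idx = numpy.argsort(fitness)[::-1][:num_parents]
    return pop[best_idx].copy()

# Toy population: 4 chromosomes with 3 genes each.
pop = numpy.arange(12, dtype=float).reshape(4, 3)
fitness = numpy.array([1.0, 9.0, 3.0, 7.0])

parents = select_mating_pool_sorted(pop, fitness, 2)
print(parents)   # rows 1 and 3 of pop, in that order
print(fitness)   # still [1. 9. 3. 7.]
```

Sorting the whole array costs more than a couple of max scans when only a few parents are needed, so this is a trade of speed for a function without side effects.
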
The parents array is finally returned; in our example it is as follows: " + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -3.03634722]\n", + " [-1.74953969 -3.56136451 -0.22888881 -0.51293995 -3.15336626 -2.50548442]\n", + " [-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -2.82607565]\n", + " [-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -2.64899546]]\n" + ] + } + ], + "source": [ + "parents_array = select_mating_pool(new_population, SOPs, num_parents_mating)\n", + "%f parents_array . cr" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that these four parents are the best individuals within the current population based on their fitness values, which in the original tutorial's run are 18.24112489, 17.0688537, 15.99527402, and 14.40299221, respectively. (These values come from random numbers, so they differ on every run.)\n", + "\n", + "The next step is to use the selected parents for mating in order to generate the offspring. Mating starts with the crossover operation, performed by the GA.crossover function. This function accepts the parents and the offspring size, and uses the offspring size to know how many offspring to produce from those parents. The function is implemented as follows inside the GA module:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "debug = False \n", + "def crossover(parents, offspring_size): # offspring_size --> (4, 6) ()\n", + "    offspring = numpy.empty(offspring_size)\n", + "    # The point at which crossover takes place between two parents. Usually, it is at the center.\n", + "    crossover_point = numpy.uint8(offspring_size[1]/2) # Middle of the chromosome (3 here); why recompute it on every call?\n", + "\n", + "    for k in range(offspring_size[0]):\n", + "        # Index of the first parent to mate. 
--> 0 1 2 3\n", + "        parent1_idx = k%parents.shape[0] # parents :> shape[0] --> 4 \n", + "        # Index of the second parent to mate.\n", + "        parent2_idx = (k+1)%parents.shape[0] # --> 1 2 3 0\n", + "        # The new offspring will have its first half of its genes taken from the first parent.\n", + "        offspring[k, 0:crossover_point] = parents[parent1_idx, 0:crossover_point] # first half of the chromosome comes from the mother\n", + "        # The new offspring will have its second half of its genes taken from the second parent.\n", + "        offspring[k, crossover_point:] = parents[parent2_idx, crossover_point:] # second half of the chromosome comes from the father\n", + "    if debug : peforth.push(locals()).ok(\"bp33>\",cmd='to _locals_') \n", + "    return offspring # half of each offspring's chromosome comes from the mother, the other half from the father" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The function starts by creating an empty array based on the offspring size as in this line:\n", + "\n", + "    offspring = numpy.empty(offspring_size) # offspring_size --> (4, 6) ()\n", + "\n", + "Because we are using single point crossover, we need to specify the point at which crossover takes place. The point is selected to divide the solution into two equal halves according to this line:\n", + "\n", + "    crossover_point = numpy.uint8(offspring_size[1]/2)\n", + "\n", + "Then we need to select the two parents to cross over. The indices of these parents are selected according to these two lines:\n", + "\n", + "    parent1_idx = k%parents.shape[0]\n", + "    parent2_idx = (k+1)%parents.shape[0]\n", + "\n", + "The parents are selected in a way similar to a ring. The parents with indices 0 and 1 are paired first to produce one offspring. If there are still offspring to produce, parent 1 is paired with parent 2 to produce the next one, and after that parent 2 with parent 3. By index 3, we have reached the last parent. 
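
The ring pairing can be made visible with toy data. The following is a minimal sketch, assuming parents whose genes all equal their own index so that each offspring row shows where its halves came from. It uses integer division for the crossover point instead of numpy.uint8, which is a substitution made for this sketch; the toy parents and the name pairs are hypothetical.

```python
import numpy

def crossover(parents, offspring_size):
    # Single-point crossover at the middle of the chromosome, following the
    # same pairing rule as the notebook: parent k % n with parent (k+1) % n.
    offspring = numpy.empty(offspring_size)
    crossover_point = offspring_size[1] // 2
    for k in range(offspring_size[0]):
        parent1_idx = k % parents.shape[0]
        parent2_idx = (k + 1) % parents.shape[0]
        offspring[k, :crossover_point] = parents[parent1_idx, :crossover_point]
        offspring[k, crossover_point:] = parents[parent2_idx, crossover_point:]
    return offspring

# Toy parents: every gene of parent i equals i.
parents = numpy.repeat(numpy.arange(4.0).reshape(4, 1), 6, axis=1)

# One (parent1, parent2) pair per offspring; the pairing wraps around after parent 3.
pairs = [(k % 4, (k + 1) % 4) for k in range(6)]
print(pairs)

children = crossover(parents, (6, 6))
print(children)
```

Each value of k yields one offspring row, and asking for more offspring than parents simply continues around the ring.
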
If we need to produce more offspring, then we select parent with index 3 and go back to the parent with index 0, and so on.\n", + "\n", + "The solutions after applying the crossover operation to the parents are stored into the offspring variable and they are as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "offsprings ==>\n", + " [[-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -2.82607565]\n", + " [-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -2.64899546]\n", + " [-1.74953969 -3.56136451 -0.22888881 -0.51293995 -3.15336626 -2.02495388]\n", + " [-0.34610757 -0.48601818 -0.21074154 -0.51293995 -3.15336626 -2.50548442]] ()\n" + ] + } + ], + "source": [ + "offsprings = crossover(parents, (4,6))\n", + "%f offsprings ==>" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "我覺得完整的排列組合應該有六組才對,如下:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from itertools import combinations # combinations 數學「排列組合」中的「組合」\n", + "c = [c for c in combinations([1, 2, 3, 4], 2)]\n", + "%f c --> # 大鍋炒 ;-D " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next is to apply the second GA variant, mutation, to the results of the crossover stored in the offspring variable using the mutation function inside the GA module. Such function accepts the crossover offspring and returns them after applying uniform mutation. 
That function is implemented as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "debug = False # _debug_\n", + "def mutation(offspring_crossover): # 前面洗過牌的子代 shape (4,6)\n", + " # Mutation changes a single gene in each offspring randomly.\n", + " for idx in range(offspring_crossover.shape[0]): # 4 個子代一一加以突變\n", + " # The random value to be added to the gene.\n", + " random_value = numpy.random.uniform(-1.0, 1.0, 1) # 一個 random 數\n", + " offspring_crossover[idx, 4] = offspring_crossover[idx, 4] + random_value # 驚!突變只發生在某個基因 (#4)上。\n", + " if debug : peforth.push(locals()).ok(\"bp44>\",cmd='to _locals_') \n", + " return offspring_crossover" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It loops through each offspring and adds a uniformly generated random number in the range from -1 to 1 according to this line:\n", + "\n", + " random_value = numpy.random.uniform(-1.0, 1.0, 1)\n", + "\n", + "Such random number is then added to the gene with index 4 of the offspring according to this line:\n", + "\n", + " offspring_crossover[idx, 4] = offspring_crossover[idx, 4] + random_value\n", + " \n", + "Note that the index could be changed to any other index. 
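
A variant that picks the gene at random for each offspring could look like the following. This is a minimal sketch, not the GA module's code: the function name mutation_random_gene, the explicit rng argument and the seed are assumptions made for illustration, and it returns a copy instead of changing its input.

```python
import numpy

def mutation_random_gene(offspring, rng):
    # Hypothetical variant: perturb one randomly chosen gene per offspring,
    # instead of always the same column.
    mutated = offspring.copy()
    for idx in range(mutated.shape[0]):
        gene = rng.integers(0, mutated.shape[1])      # random gene index for this offspring
        mutated[idx, gene] += rng.uniform(-1.0, 1.0)  # same perturbation range as the tutorial
    return mutated

rng = numpy.random.default_rng(0)   # seeded only to make the sketch repeatable
offspring = numpy.zeros((4, 6))
mutated = mutation_random_gene(offspring, rng)
print(mutated)
```

The figures that follow in the tutorial come from the fixed-index version, not from this sketch.
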
The offspring after applying mutation are as follows:\n", + "\n", + " [[-0.63698911 -2.8638447 2.93392615 -0.72163167 1.66083721 0.00677938]\n", + " [ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.94513681 2.29682254]\n", + " [ 1.96561297 0.51030292 0.52852716 3.78571392 0.45337472 3.5170347 ]\n", + " [ 2.12480298 2.97122243 3.60375452 -1.40103767 -1.5781162 0.30567304]]" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -2.89300931]\n", + " [-1.74953969 -3.56136451 -0.22888881 0.06572773 -2.74471276 -4.01220049]\n", + " [-1.74953969 -3.56136451 -0.22888881 -0.51293995 -3.15336626 -1.44153594]\n", + " [-0.34610757 -0.48601818 -0.21074154 -0.51293995 -3.15336626 -1.70830617]]\n" + ] + } + ], + "source": [ + "offspring = mutation(offspring_crossover)\n", + "%f offspring . cr" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Such results are added to the variable offspring_crossover and got returned by the function.\n", + "\n", + "At this point, we successfully produced 4 offspring from the 4 selected parents and we are ready to create the new population of the next generation.\n", + "\n", + "Note that GA is a random-based optimization technique. It tries to enhance the current solutions by applying some random changes to them. Because such changes are random, we are not sure that they will produce better solutions. For such reason, it is preferred to keep the previous best solutions (parents) in the new population. In the worst case when all the new offspring are worse than such parents, we will continue using such parents. As a result, we guarantee that the new generation will at least preserve the previous good results and will not go worse. The new population will have its first 4 solutions from the previous parents. 
The last 4 solutions come from the offspring created after applying crossover and mutation:\n", + "\n", + " new_population[0:parents.shape[0], :] = parents\n", + " new_population[parents.shape[0]:, :] = offspring_mutation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "By calculating the fitness of all solutions (parents and offspring) of the first generation, their fitness is as follows:\n", + "\n", + " [ 18.24112489 17.0688537 15.99527402 14.40299221 -8.46075629 31.73289712 6.10307563 24.08733441]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The highest fitness previously was **18.24112489** but now it is **31.7328971158**. That means that the random changes moved towards a better solution. This is GREAT. But such results could be enhanced by going through more generations. Below are the results of each step for another 4 generations:\n", + "\n", + " Generation : 1\n", + "\n", + " Fitness values:\n", + "\n", + " [ 18.24112489 17.0688537 15.99527402 14.40299221 -8.46075629 31.73289712 6.10307563 24.08733441]\n", + "\n", + " Selected parents:\n", + "\n", + " [[ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.94513681 2.29682254]\n", + " [ 2.12480298 2.97122243 3.60375452 -1.40103767 -1.5781162 0.30567304]\n", + " [-0.63698911 -2.8638447 2.93392615 -1.40103767 -1.20313655 0.30567304]\n", + " [ 3.00912373 -2.745417 3.27131287 -0.72163167 0.7516408 0.00677938]]\n", + "\n", + " Crossover result:\n", + "\n", + " [[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.5781162 0.30567304]\n", + " [ 2.12480298 2.97122243 3.60375452 -1.40103767 -1.20313655 0.30567304]\n", + " [-0.63698911 -2.8638447 2.93392615 -0.72163167 0.7516408 0.00677938]\n", + " [ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.94513681 2.29682254]]\n", + "\n", + " Mutation result:\n", + "\n", + " [[ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.2392086 0.30567304]\n", + " [ 2.12480298 2.97122243 3.60375452 -1.40103767 -0.38610586 0.30567304]\n", + " 
[-0.63698911 -2.8638447 2.93392615 -0.72163167 1.33639943 0.00677938]\n", + " [ 3.00912373 -2.745417 3.27131287 -1.56909315 -1.13941727 2.29682254]]\n", + "\n", + " Best result after generation 1 : 34.1663669207\n", + ". . . snip . . .\n", + " Generation : 4\n", + "\n", + " Fitness values\n", + "\n", + " [ 34.59304326 34.16636692 33.7449326 31.73289712 44.81692352\n", + "\n", + " 33.35989464 36.46723397 37.19003273]\n", + "\n", + " Selected parents:\n", + "\n", + " [[ 3.00912373 -2.745417 3.27131287 -1.40103767 -2.20744102 0.30567304]\n", + " [ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.44124005 2.29682254]\n", + " [ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.37553107 2.29682254]\n", + " [ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.20515009 2.29682254]]\n", + "\n", + " Crossover result:\n", + "\n", + " [[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.37553107 2.29682254]\n", + " [ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.20515009 2.29682254]\n", + " [ 3.00912373 -2.745417 3.27131287 -1.40103767 -2.20744102 0.30567304]]\n", + "\n", + " Mutation result:\n", + "\n", + " [[ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.13382082 2.29682254]\n", + " [ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.98105233 2.29682254]\n", + " [ 3.00912373 -2.745417 3.27131287 -1.56909315 -2.27638584 2.29682254]\n", + " [ 3.00912373 -2.745417 3.27131287 -1.40103767 -1.70558545 0.30567304]]\n", + "\n", + " Best result after generation 4: 44.8169235189\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After the above 5 generations, the best result now has a fitness value equal to 44.8169235189 compared to the best result after the first generation which is 18.24112489.\n", + "\n", + "The best solution has the following weights:\n", + "\n", + " [3.00912373 -2.745417 3.27131287 -1.40103767 -2.20744102 0.30567304]\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Experimental 
giving different gene index, the result is similar \n", + "\n", + "def mutation(offspring_crossover, index): # The offspring produced by crossover, shape (4,6)\n", + " # Mutation changes a single gene in each offspring randomly.\n", + " for idx in range(offspring_crossover.shape[0]): # Mutate each of the 4 offspring in turn\n", + " # The random value to be added to the gene.\n", + " random_value = numpy.random.uniform(-1.0, 1.0, 1) # A single random number\n", + " offspring_crossover[idx, index] = offspring_crossover[idx, index] + random_value # Note: the mutation is applied to only one gene, the one at position index.\n", + " return offspring_crossover" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "new_population ==>\n", + " [[ 1.96827509 2.04341469 2.18141111 1.21290069 2.66431037 2.65924713]\n", + " [ 0.28903627 -1.00720767 -2.18243685 -1.11505618 -3.58946822 -0.13887248]\n", + " [-0.85422217 2.26897987 2.89642152 2.19151511 -2.13300249 1.02256388]\n", + " [ 0.1840384 -1.05159694 -1.24792706 1.30606635 0.73341364 1.00730912]\n", + " [-1.46312966 -2.64284936 2.58174363 -0.44863616 3.86220586 -3.14776126]\n", + " [ 0.56826096 -0.77048885 -2.6066013 3.60011285 0.58856147 -0.78272734]\n", + " [ 1.00016844 3.68890169 -1.55910163 -3.30176257 0.94530628 -1.61012287]\n", + " [-3.74550804 -2.91499567 -0.90587322 2.10201257 3.04464826 2.79962532]] ()\n", + "original\n", + "\n", + "Generation : 0\n", + "Best result : 39.76344053059758\n", + "Generation : 1\n", + "Best result : 39.76344053059758\n", + "Generation : 2\n", + "Best result : 46.32812817024072\n", + "Generation : 3\n", + "Best result : 46.32812817024072\n", + "Generation : 4\n", + "Best result : 47.9862437364507\n", + "\n", + "offspring_mutation ==>\n", + " [[-0.85422217 2.26897987 2.89642152 -1.11505618 -3.58946822 -2.53079731]\n", + " [-0.85422217 2.26897987 2.89642152 -1.11505618 -3.58946822 -1.24700052]\n", + " [-0.85422217 2.26897987 2.89642152 -1.11505618 -3.58946822 -0.47570561]\n", + " [-0.85422217 
2.26897987 2.89642152 -1.11505618 -3.58946822 -2.26286722]] ()\n", + "after all\n", + "\n", + "Best solution : [[[-0.85422217 2.26897987 2.89642152 -1.11505618 -3.58946822\n", + " -2.53079731]]]\n", + "Best solution fitness : [47.98624374]\n" + ] + } + ], + "source": [ + "import numpy\n", + "import GA\n", + "\n", + "#\n", + "# The target is to maximize this equation as quickly as possible:\n", + "# y = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6\n", + "# where (x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7)\n", + "# What are the best values for the 6 weights w1 to w6?\n", + "# We are going to use the genetic algorithm to find the best possible values after a number of generations.\n", + "#\n", + "\n", + "# Inputs of the equation.\n", + "equation_inputs = [4,-2,3.5,5,-11,-4.7]\n", + "\n", + "# Number of the weights we are looking to optimize.\n", + "num_weights = len(equation_inputs)\n", + "\n", + "#\n", + "# Genetic algorithm parameters:\n", + "# Mating pool size\n", + "# Population size\n", + "#\n", + "\n", + "sol_per_pop = 8\n", + "num_parents_mating = 4\n", + "\n", + "# Defining the population size.\n", + "pop_size = (sol_per_pop,num_weights) # The population will have sol_per_pop chromosomes, where each chromosome has num_weights genes.\n", + "\n", + "# Creating the initial population.\n", + "new_population = numpy.random.uniform(low=-4.0, high=4.0, size=pop_size)\n", + "\n", + "%f new_population ==> # original \n", + "\n", + "num_generations = 5\n", + "\n", + "for generation in range(num_generations):\n", + " print(\"Generation : \", generation)\n", + "\n", + " # Measuring the fitness of each chromosome in the population.\n", + " fitness = GA.cal_pop_fitness(equation_inputs, new_population)\n", + "\n", + " # Selecting the best parents in the population for mating.\n", + " parents = GA.select_mating_pool(new_population, fitness, num_parents_mating)\n", + "\n", + " # Generating next generation using crossover.\n", + " offspring_crossover = GA.crossover(parents, 
offspring_size=(pop_size[0]-parents.shape[0], num_weights))\n", + "\n", + " # Adding some variations to the offspring using mutation.\n", + " # offspring_mutation = GA.mutation(offspring_crossover)\n", + " offspring_mutation = mutation(offspring_crossover, 1) # Use the experimental version defined above instead\n", + "\n", + " # Creating the new population based on the parents and offspring.\n", + " new_population[0:parents.shape[0], :] = parents\n", + " new_population[parents.shape[0]:, :] = offspring_mutation\n", + "\n", + " # The best result in the current iteration.\n", + " print(\"Best result : \", numpy.max(numpy.sum(new_population*equation_inputs, axis=1)))\n", + "\n", + "%f cr \n", + "%f offspring_mutation ==> # after all\n", + "\n", + "# Getting the best solution after finishing all generations.\n", + "# At first, the fitness is calculated for each solution in the final generation.\n", + "fitness = GA.cal_pop_fitness(equation_inputs, new_population)\n", + "\n", + "# Then return the index of the solution corresponding to the best fitness.\n", + "best_match_idx = numpy.where(fitness == numpy.max(fitness))\n", + "\n", + "print(\"Best solution : \", new_population[best_match_idx, :])\n", + "print(\"Best solution fitness : \", fitness[best_match_idx])\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# --- The End ---" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.0" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}