Wednesday, November 2, 2011

Genetic Algorithm and Java

I have written a Java program below to solve a Matrix-type problem using Genetic Algorithm. But let us first review some basic concepts.

There are many tutorials for Genetic Algorithm (GA) on the web. One of the comprehensive tutorials is
"Introduction to Genetic Algorithms, URL http://www.obitko.com/tutorials/genetic-algorithms/".

In many programming languages, "Hello World" programs is used to demonstrate how to code the first and the simplest program. There is a "Hello World" program for Genetic Algorithm too. You can find many "Hello World" examples of GA at  https://github.com/jsvazic/GAHelloWorld , written in various languages, including  the Java language.

However, after seeing Hello World program, interested users may want to see how GA can help to solve more challenging problems (for example, travelling salesman problem).

The will be a big gap between "Hello World" and the actual problems. How are we going to encode (that is, to represent) the actual problem in the chromosome strings? How to design the fitness function?  How does a chromosome string represent 2D problem? and tree structure? These are the questions we have in mind.

Two major challenging areas in writing GA program are: (A) How to encode the problem you have in hand, and (B) How to define a proper Fitness function.

(A) Encoding: There are four major types of encoding.

  • Binary Encoding : For example, "10100111" represents 10 and 7 decimal values.
  • Value Encoding : For example, "00011011" represents "left, right,back, forward", each with 2 bits codes. Or, a sequence of ASCII values can be used to represents the string "Hello, World!". (48hex for "H", 65hex for "e", and so on. The sequence would be "48656C6C6F..." in hexadecimal.)
  • Permutation Encoding : For example, "16248357" represent moving from position 1 to position 6 to position 2 and so on.
  • Tree Encoding : For example, "(+x(/5y))" represents a formula (x+(5/y)) in tree structure. Tree Encoding is more complicated and usually is done in LISP.

(B) Fitness Function: In general, design your fitness function so that value 0 represents the perfect score or best score. We need the program to halt when the Fitness function is equal to zero. Some examples:
  • If the target is to have 8 and 9, with current sequence "10100111" represents 10 and 7, one possible  Fitness function = abs(8-10)+abs(9-7) = +3
  • If the target is to have "Hello, World!", with current sequence "Hellq, Wovld!", one possible Fitness function = 0 + 0 + 0 + 0 + 2 + 0 + 0 + 0 + 4 + 0 + 0 +0 = +6
  • If the target is to have minimum distance when moving from position to positions, with a look-up table somewhere in the program, we can calculate the Fitness function as the sum of distances between 1 and 6, 6 and 2, 2 and 4,... for sequence "16248357".
  • If the target is to have a value of 10, with given x=3, y=1, "(+x(/5y))" represents a formula (x+(5/y)) in tree structure gives us value of 8, and the Fitness functions can be abs(10-8) = +2 .


For learning purpose, let's begin with simplest example and gradually go into the difficult ones. If we know how to encode the problem and how to define the Fitness function, we can easily code the program.

(1) The "Hello World" example:
  • Problem Description: This is the simplest one. With a target string "Hello, World!", GA program will give you ONE answer.
  • Encoding : This is Value Encoding with ASCII code. For target "Hello, World!", use a string (chromosome) of 8*13 bits. Limit the valid characters to the range of 32dec (the space) to 122dec (the 'z').
  • Fitness Function : One possible Fitness Function = Count how many characters mismatch the target string. For example "Hellq, Wovld!" would give you +2 because 2 characters mismatched. This functions can only give you 8 levels to differentiate the fitness all your chromosome. Another possible Fitness Function = Sum up the difference of each mismatched characters. "Hellq, Wovld!" would give you +6. The latter results in more than 8 levels of value and thus a better Fitness function.
  • Java example is available at https://github.com/jsvazic/GAHelloWorld .

(2) The "Mathematical Expression" example:
  • Problem Description: There is a good tutorial here: http://www.ai-junkie.com/ga/intro/gat1.html . The tutorial explains GA in layman term, and the example is: (quoted)
    "Given the digits 0 through 9 and the operators +, -, * and ', find a sequence that will represent a given target number. The operators will be applied sequentially from left to right as you read. So, given the target number 23, the sequence 6+5*4/2+1 would be one possible solution." 
  • Encoding : This is Value Encoding with ASCII code. For target "23", use a string (chromosome) of 72 bits (8*9, or 8*11, 8*13... bits). Limit the valid characters to "0123456789+-*/" and ignore all other characters.
  • Fitness Function : One possible Fitness Function = Evaluate the valid expression and ignore the invalid portion, then calculate the absolute different between the target (say) 23 and the calculated result.
  • With a target "23", this GA program will give you MANY answers. This example starts to show one powerful aspect of GA: You know what you are looking for (the target, 23), but you do not know the actual answer (the sequence). There can be many possible answers and each time you run the GA program, it will give you one possible answer.
  • Java example is available at http://www.ai-junkie.com/ga/intro/gat1.html .



(3) The "Matrix type of problem":
  • Problem Description: given a 3x3 matrix, and digits 1 to 9, find the arrangement that produces the sum of 15 on each row, each column and each diagonal. One of the answers we know is
    +---+---+---+
    | 2 | 9 | 4 |
    +---+---+---+
    | 7 | 5 | 3 |
    +---+---+---+
    | 6 | 1 | 8 |
    +---+---+---+
    

    We need to specify that each of the 1 to 9 has to appear only once. Otherwise, GA program will given undesirable answer like this:
    +---+---+---+
    | 3 | 9 | 3 |
    +---+---+---+
    | 5 | 5 | 5 |
    +---+---+---+
    | 7 | 1 | 7 |
    +---+---+---+
    

  • Encoding : This is Permutation Encoding because we want each character to appear only once. Limit the valid characters to "123456789" and this require some changes in selection, generateChromosome and Fitness methods. See below.
  • Fitness Function : One possible Fitness Function = There are 8 cases to calculate (3 rows, 3 columns, and 2 diagonal).  (i) If each character appears only once is true, we check further. For each cases, if its sum is not equal to 15, we assign 1, otherwise we assign 0. Thus the worse situation is fitness function has the value of 8. (ii) If each character appears only once is not true, assign fitness=9, because we want to indicate that it is worse than any of the cases in (i). 
  • Fitness Function= 0 + 1 + 1 + 1 + 1 + 0 + 1 + 1 = +6
  • +---+---+---+
    | 1 | 9 | 5 |
    +---+---+---+
    | 3 | 4 | 2 |
    +---+---+---+
    | 7 | 6 | 8 |
    +---+---+---+ 
  • Fitness Function= +9, because both 1 and 6 appear twice.
  • +---+---+---+
    | 1 | 1 | 4 |
    +---+---+---+
    | 3 | 9 | 6 |
    +---+---+---+
    | 7 | 2 | 6 |
    +---+---+---+
    
  • Java example:  I have a Java code here...






(4) The "2D problem" example:

(5) The "Tic-Tac-Toe" example:

(6) The "Tree Structure" example:

(7) The "Equation" example:

(8) The Travelling Salesman Problem: This is an "Order-Encoding" problem.

(9) The Real-Life Application:

2 comments:

  1. Thanks for the example. I am interested in evolutionary computation implemented in Java. I posted an entry in http://lunaglacialis.blogspot.com where I describe the development of a genetic algorithm implemented in Java.
    Greetings!

    ReplyDelete