An approach to the inference of finite state machines based on a gravitationally-inspired search algorithm

As the inference of a finite state machine from samples of its behaviour is NP-hard, heuristic search algorithms need to be applied. In this article we propose a methodology for the inference of Moore machines based on a new gravitationally-inspired heuristic search algorithm. A binary representation of a Moore machine, an evaluation function, and the required parameters of the algorithm are presented. The experimental results show that the proposed method is promising.


INTRODUCTION
Identification is an inference process that deduces an internal representation of a system (the internal model) from samples of its functioning (the external model) [1]. The inference of finite state machines (FSMs) is widely applied in different fields, such as logical design, verification, and software systems.
The goal of identification is to find the 'best' FSM that respects the dynamics of the external model. In practice, the 'best' FSM is the one that best describes the model behaviour given by input-output sequences. We are interested in finding a minimum-size deterministic FSM consistent with the set of given samples. This is an NP-hard problem [2]. Heuristic algorithms are an alternative that can reduce the complexity of identification methods.
The paper is organized as follows. Section 2 provides an overview of the problem of FSM inference. Section 3 describes gravitationally-inspired search algorithms. Section 4 introduces our approach, and Section 5 shows experimental results.

Problem statement
We give a brief overview of our approach to FSM identification. There are several types of FSMs, but in this article we discuss only one well-known representation, namely the Moore machine.
A Moore machine is a six-tuple Mo = (Q, Σ, Δ, δ, λ, q_0), where
• Q is a finite set of states, with q_0 ∈ Q the initial state,
• Σ is the input alphabet,
• Δ is the output alphabet,
• δ : Q × Σ → Q is the transition function,
• λ : Q → Δ is the output function, represented by the output table that shows which character from Δ is printed by each state that is entered [3].
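The definition above can be illustrated in code. The following is a minimal sketch, not the authors' implementation: the class and method names are ours, and the two-state parity checker used in the demo (which anticipates an experiment later in the paper) is an assumed instance.

```java
public class MooreMachine {
    private final int[][] delta;   // delta[state][symbol] -> next state
    private final char[] lambda;   // lambda[state] -> output character
    private final int q0;          // initial state

    public MooreMachine(int[][] delta, char[] lambda, int q0) {
        this.delta = delta; this.lambda = lambda; this.q0 = q0;
    }

    // A Moore machine emits one output character per state entered,
    // starting with the initial state, so |output| = |input| + 1.
    public String run(int[] input) {
        StringBuilder out = new StringBuilder();
        int q = q0;
        out.append(lambda[q]);
        for (int symbol : input) {
            q = delta[q][symbol];
            out.append(lambda[q]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two-state parity checker: outputs '1' iff the number of 1s read so far is even.
        int[][] delta = {{0, 1}, {1, 0}};
        char[] lambda = {'1', '0'};
        MooreMachine parity = new MooreMachine(delta, lambda, 0);
        System.out.println(parity.run(new int[]{1, 1, 0, 1})); // prints 10110
    }
}
```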
The general structure of our approach to the inference of FSMs is presented in Fig. 1. The outline of our approach is the following:
1. The system to be inferred is tested and samples of its functioning are generated. Some of the samples are chosen as training data and some as testing data.
2. The number of states in the FSM is received as an input.
3. The search algorithm is applied and an FSM M is output.
4. M is evaluated using the given training data and/or testing data. If M describes the given input-output data sufficiently well, it is considered as the result. Otherwise the search process is repeated with other parameters or training data.
5. If required, post-processing (e.g., minimization, removal of unreachable states) is applied.
It is possible to specify several criteria for the required result. The first criterion is consistency of the FSM. Using this criterion, we can define two different types of solutions: the generalized solution (i.e., a solution that performs correctly for all positive input-output sequences) and the consistent solution (i.e., a solution that performs correctly for the input-output sequences used in the training set). Another criterion is the FSM size: we can search for the minimal FSM or for an FSM with k or fewer states.
We formulate our goal as the inference of a deterministic FSM with k or fewer states that is consistent with the input-output sequences at hand.

Background
Heuristic techniques are widely applied to the inference of different types of FSMs. The most popular are the various types of Evolutionary Algorithms. In the early 1960s Fogel et al. [4] introduced Evolutionary Programming (EP), in which simulated evolution was performed by modifying a population of FSMs. Other authors have also used EP for solving the problem of FSM identification. Chellapilla and Czarnecki [5] proposed a variation of EP to solve the problem of modular FSM synthesis. Benson [6] presented a model comprising an FSM with embedded genetic programs that co-evolve to perform the task of Automatic Target Detection.
Another approach to the problem of FSM identification is based on the Genetic Algorithm (GA). This method has been researched by several authors. Ngom et al. [7] used genetic simulation for Moore machine identification, and Tongchim and Chongstitvatana [8] investigated a parallel implementation of the GA for FSM synthesis. Lucas [9] paid more attention to finite state transducers, and he and Reynolds [10] compared this method to 'Heuristic State Merging'. Niparnan and Chongstitvatana [11] improved the GA by evolving only the state transition function.
Chongstitvatana and Aporntewan [12] presented a method of FSM synthesis from multiple partial input/output sequences. Horihan and Lu [13] focused on improving FSM evolution by using progressive fitness functions. Generalized Simulated Annealing has also been used for the inference of FSMs [14].
We apply a gravitationally-inspired search algorithm. The next section describes the general ideas of this new class of algorithms.

Gravity as inspiration for heuristic search algorithms
Four main forces act in our universe: gravitational, electromagnetic, weak nuclear, and strong nuclear. These forces define the way our universe behaves and appears. The weakest is the gravitational force; it defines how objects move depending on their mass. In physics three kinds of masses can be distinguished (active mass M_a, passive mass M_p, and inertial mass M_i), which have been shown experimentally to be equivalent (see [15]).
The gravitational force between two objects i and j is directly proportional to the product of their masses and inversely proportional to the square of the distance between them:

F_ij = G · (M_i · M_j) / R_ij^2.    (1)

Knowing the force F_i acting on a body, we can compute its acceleration as

a_i = F_i / M_i.    (2)

Our universe is expanding, which yields an effect of decreasing gravity, so the gravitational 'constant' can be described as a decreasing function of time:

G(t) = G(t_0) · (t_0 / t)^β,  β < 1.    (3)

We can formulate the following basic ideas inspired by gravity:
• Each object in the universe has mass and position.
• There are interactions between objects, which can be described using the law of gravity.
• Bigger objects create larger gravitational fields and attract smaller ones.
During the last decade some researchers have tried to adapt the idea of gravity to obtain optimization algorithms. Such algorithms have some general ideas in common:
• The system is modelled by objects with mass.
• The positions of the objects describe candidate solutions, and the mass of an object depends on the objective function.
• The objects interact with one another through gravitational force.
• The objects with greater mass mark the points in the search space with better solutions.
Using these characteristics, it is possible to define a family of optimization algorithms based on gravitational force. For example, Central Force Optimization (CFO) is a deterministic gravity-based search algorithm proposed and developed by Formato [16]. It simulates a group of probes that fly into the search space and explore it. Another algorithm, Space Gravitational Optimization (SGO), was developed by Hsiao et al. [17] in 2005. It simulates asteroids flying through curved search space. A gravitationally-inspired variation of local search, the Gravitational Emulation Local Search Algorithm (GELS), was proposed by Webster and Bernhard [18] and further elaborated by Webster [19]. The newest one, the Gravitational Search Algorithm (GSA), was described by Rashedi et al. [20] as a stochastic variation of CFO.
The next subsection will give a more detailed overview of the GSA, which is used as a basis of our approach.

Gravitational search algorithms
The GSA was described by Rashedi et al. [20] as a stochastic variation of the CFO and has been used in different applications. It was successfully applied to various optimization problems, such as filter modelling [21], the set covering problem [22], allocation of a static VAR compensator [15], and synthesis of a thinned scanned concentric ring array antenna [23].
The algorithm operates on a system of N objects, each described by a real-valued position vector that encodes a candidate solution:

X_i = (x_i^1, ..., x_i^d, ..., x_i^n),  i = 1, ..., N,    (4)

where x_i^d represents the position of the ith object in dimension d.
Masses of objects are computed from the quality measure as follows:

m_i(t) = (fit_i(t) − worst(t)) / (best(t) − worst(t)),  M_i(t) = m_i(t) / Σ_{j=1..N} m_j(t),    (5)

where fit_i is the value of the objective function of object i, and worst(t) and best(t) are defined for a maximization problem as

best(t) = max_{j∈{1,...,N}} fit_j(t),  worst(t) = min_{j∈{1,...,N}} fit_j(t).    (6)
In other words, a heavier mass means a better-quality object: it has greater attraction and greater inertia (i.e., it moves more slowly towards other objects).
At a specific time t we can compute the force applied to object i with passive mass M_pi by object j with active mass M_aj in dimension d:

F_ij^d(t) = G(t) · (M_pi(t) · M_aj(t)) / (R_ij(t) + ε) · (x_j^d(t) − x_i^d(t)),    (7)

where ε is a free parameter required to avoid division by zero, and R_ij is the Euclidean distance between the position vectors:

R_ij(t) = ||X_i(t), X_j(t)||_2.    (8)

According to Rashedi et al. [15], R_ij gives better experimental results than R_ij^2. The gravitational constant G (Eq. (3)) is computed as

G(t) = G_0 · e^(−αt/T),    (9)

where G_0 is the initial value, α is a constant, and T is the total number of iterations. In physics, the total force acting on an object is computed as the vector sum of all acting forces. In the GSA a stochastic characteristic is added to the algorithm, so the total force is computed as

F_i^d(t) = Σ_{j=1, j≠i..N} rand_j · F_ij^d(t),    (10)

where rand_j is a uniformly distributed random number in [0, 1]. The acceleration of object i can be computed from its inertial mass M_ii and force F_i^d(t) as

a_i^d(t) = F_i^d(t) / M_ii(t).    (11)

Knowing the current acceleration, we can recompute the velocity and the position as follows:

v_i^d(t + 1) = rand_i · v_i^d(t) + a_i^d(t),    (12)
x_i^d(t + 1) = x_i^d(t) + v_i^d(t + 1).    (13)

The general procedure of the GSA is described in Algorithm 1. Firstly, the initial set of objects is generated randomly. Secondly, each object is evaluated; based on the evaluation results, the required parameters (G(t), worst(t), best(t)) are updated, and the forces and accelerations are computed. Thirdly, the agents' positions are changed according to the acting forces and the updated positions are evaluated. The process continues until the best solution is found or the number of iterations is exhausted.

In the GSA the position vector is real-valued. However, for some applications discrete or binary vectors are required. A discrete modification of the algorithm was proposed by Zibanezhad et al. [24] in the context of Web-service composition. The binary GSA (BGSA) was introduced by Rashedi et al. [25] in 2010. In the next section we will focus on the BGSA.
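The GSA update cycle described above can be sketched as follows. This is a toy one-dimensional maximization example, not the authors' code: the test function, the parameter values (G_0 = 100, α = 20), and all names are our assumptions for illustration only.

```java
import java.util.Random;

public class ToyGSA {
    // Toy objective to maximize: fit(x) = -(x - 3)^2, optimum at x = 3.
    static double fit(double x) { return -(x - 3) * (x - 3); }

    // One-dimensional GSA following the mass, force, and motion updates;
    // returns the best position found over the whole run.
    public static double search(int agents, int iters, long seed) {
        Random rnd = new Random(seed);
        double[] x = new double[agents], v = new double[agents];
        for (int i = 0; i < agents; i++) x[i] = -10 + 20 * rnd.nextDouble();
        double G0 = 100, alpha = 20, eps = 1e-9;
        double bestEver = Double.NEGATIVE_INFINITY, bestX = x[0];
        for (int t = 0; t < iters; t++) {
            double G = G0 * Math.exp(-alpha * (double) t / iters);   // decaying G(t)
            double best = Double.NEGATIVE_INFINITY, worst = Double.POSITIVE_INFINITY;
            double[] f = new double[agents];
            for (int i = 0; i < agents; i++) {
                f[i] = fit(x[i]);
                best = Math.max(best, f[i]);
                worst = Math.min(worst, f[i]);
                if (f[i] > bestEver) { bestEver = f[i]; bestX = x[i]; }
            }
            double[] m = new double[agents];
            double msum = 0;
            for (int i = 0; i < agents; i++) {                        // normalized masses
                m[i] = (best == worst) ? 1 : (f[i] - worst) / (best - worst);
                msum += m[i];
            }
            for (int i = 0; i < agents; i++) {
                double Mi = m[i] / msum, force = 0;
                for (int j = 0; j < agents; j++) {
                    if (j == i) continue;
                    double R = Math.abs(x[i] - x[j]);                 // 1-D distance
                    force += rnd.nextDouble() * G * Mi * (m[j] / msum)
                           / (R + eps) * (x[j] - x[i]);               // randomized force sum
                }
                double a = (Mi > 0) ? force / Mi : 0;                 // acceleration
                v[i] = rnd.nextDouble() * v[i] + a;                   // velocity update
                x[i] += v[i];                                         // position update
            }
        }
        return bestX;
    }

    public static void main(String[] args) {
        System.out.println(search(30, 200, 42)); // best found position, near 3.0
    }
}
```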

Binary gravitational search algorithm
The key difference between the GSA and the BGSA is the binary search space: each dimension has only two possible values, '0' or '1'. The main laws of the BGSA may be defined as in the real-valued case (see Eqs (7), (11), and (12)). But the position-updating law (see Eq. (13)) must be modified so that each dimension switches between the two values according to the velocity: a higher velocity gives a greater probability of changing the value.
To modify Eq. (13), in the BGSA a special probability function S(v_i^d) was introduced, which maps the value of v_i^d to [0, 1]:

S(v_i^d(t)) = |tanh(v_i^d(t))|.    (14)

The law for updating the position can then be defined as follows:

x_i^d(t + 1) = F(x_i^d(t)) if rand < S(v_i^d(t + 1)),  x_i^d(t + 1) = x_i^d(t) otherwise,    (15)

where F(x_i^d(t)) = complement(x_i^d(t)). Some other modifications were made:
• The gravitational constant G is considered as a linear decreasing function

G(t) = G_0 · (1 − t/T).    (16)
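The probabilistic flip rule of the BGSA can be sketched as below. The transfer function S(v) = |tanh(v)| is the one given for the BGSA by Rashedi et al. [25]; the class and method names are ours.

```java
import java.util.Random;

public class BgsaStep {
    // Transfer function: maps a velocity to a flip probability in [0, 1].
    static double S(double v) { return Math.abs(Math.tanh(v)); }

    // Position update: flip bit d with probability S(v[d]); otherwise keep it.
    static int[] updatePosition(int[] x, double[] v, Random rnd) {
        int[] next = x.clone();
        for (int d = 0; d < x.length; d++)
            if (rnd.nextDouble() < S(v[d]))
                next[d] = 1 - next[d];   // complement(x_d)
        return next;
    }

    public static void main(String[] args) {
        int[] x = {0, 1, 0, 1};
        double[] v = {0.0, 0.0, 5.0, 5.0};  // zero velocity -> never flips; high -> likely flips
        System.out.println(java.util.Arrays.toString(
                updatePosition(x, v, new Random(1))));
    }
}
```

Note that a bit with zero velocity is guaranteed to stay unchanged, since S(0) = 0.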

GRAVITATIONALLY-INSPIRED SEARCH ALGORITHM FOR THE INFERENCE OF FSMs
To apply the BGSA to the inference of FSMs we need to define an objective function and a procedure for encoding FSMs into a binary position vector. Some modifications of the original BGSA also have to be made.
To store the information about state q_j, we need to store the output value o_j of the state and the corresponding transitions from q_j to target states q_{i_k}, each activated by reading symbol i_k. Each section of the position vector represents one state (Fig. 2): the first part is the output value of the state and the rest stores the corresponding transitions from that state. Initially, the information is presented in a decimal way (decimal representation). To get the binary representation we transform each integer number into the corresponding binary number.
The number of bits required for storing the whole binary position vector can be computed as follows:

L = n · (⌈log_2 |Δ|⌉ + ⌈log_2 n⌉ · |Σ|),    (17)

where n is the number of states. Each Mo has a unique binary representation, but not each binary string has a corresponding Mo.
Let us take a look at a Moore machine with the transition diagram presented in Fig. 3.
Thus we need 20 bits to store this FSM (Eq. (17)): 4 · (⌈log_2 2⌉ + ⌈log_2 4⌉ · 2) = 20 bits. The general structure of the position vector required to encode this FSM is presented in Fig. 4.

Fig. 2. A section of the binary position vector for storing the Moore machine with a fixed number of states.
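The encoding scheme can be sketched in code. This is our own illustrative rendering (class and method names are assumptions), laying out one fixed-width section per state, output value first and transitions after, as the text describes.

```java
public class MooreEncoder {
    // Bits needed for an n-state Moore machine with input alphabet size
    // sigma and output alphabet size delta (the bit-count formula above).
    static int bits(int n, int sigma, int delta) {
        return n * (ceilLog2(delta) + ceilLog2(n) * sigma);
    }

    // Ceiling of log2(k) for k >= 1.
    static int ceilLog2(int k) {
        return 32 - Integer.numberOfLeadingZeros(k - 1);
    }

    // Concatenate, per state: the output value, then one target state
    // per input symbol, each as a fixed-width binary number.
    static String encode(int[] outputs, int[][] transitions, int delta) {
        int n = outputs.length, stateBits = ceilLog2(n), outBits = ceilLog2(delta);
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < n; j++) {
            sb.append(toBin(outputs[j], outBits));
            for (int target : transitions[j]) sb.append(toBin(target, stateBits));
        }
        return sb.toString();
    }

    static String toBin(int v, int width) {
        String s = Integer.toBinaryString(v);
        while (s.length() < width) s = "0" + s;
        return s;
    }

    public static void main(String[] args) {
        System.out.println(bits(4, 2, 2)); // prints 20, as in the worked example
    }
}
```

Decoding is the reverse walk over the same fixed-width fields; a decoded string may still describe an invalid machine (e.g., a transition to a state index beyond n − 1 when n is not a power of two), which is why not every binary string corresponds to a Moore machine.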

An objective function
We propose an objective function defined on all input-output sequences (pairs {input, output}). The idea is to estimate the proximity between the current and the desired FSMs by computing the distance between output strings.

Distance between strings
Consider a function Δ(a, b), where a and b are symbols in some alphabet, defined as

Δ(a, b) = 1 if a ≠ b,  Δ(a, b) = 0 otherwise.    (18)

That is, if character a is not equal to character b, the function Δ(a, b) returns 1; otherwise it returns 0.
We propose two distance functions between strings x and y. The first is the Hamming distance d_Ham: to compute it, we count the number of differing symbols at the same positions:

d_Ham(x, y) = Σ_{i=1..|x|} Δ(x_i, y_i).    (19)

The second function evaluates the length of the maximal equal prefix, d_LP (i.e., the computation stops at the first difference between the strings):

d_LP(x, y) = max { k : x_i = y_i for all i ≤ k }.    (20)
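The two distance functions are straightforward to implement; the sketch below (names ours) assumes equal-length arguments for the Hamming distance, as in the paper's setting where expected and actual outputs have the same length.

```java
public class StringDistances {
    // Delta(a, b): 1 if the characters differ, 0 otherwise.
    static int delta(char a, char b) { return a == b ? 0 : 1; }

    // Hamming distance: number of positions at which equal-length strings differ.
    static int dHam(String x, String y) {
        int d = 0;
        for (int i = 0; i < x.length(); i++) d += delta(x.charAt(i), y.charAt(i));
        return d;
    }

    // Length of the maximal common prefix: stop at the first difference.
    static int dLP(String x, String y) {
        int k = 0;
        while (k < x.length() && k < y.length() && x.charAt(k) == y.charAt(k)) k++;
        return k;
    }

    public static void main(String[] args) {
        System.out.println(dHam("10110", "10011")); // prints 2
        System.out.println(dLP("10110", "10011"));  // prints 2
    }
}
```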

Evaluation of the objective function
We specify several objective functions for evaluating FSMs based on d_Ham and d_LP. Assume we have our training data represented as a collection of input-output sequences (the size of the collection is n). We also have the output strings produced by an FSM (see Table 1).
Our task is to measure how 'far' the strings generated by the FSM are from the expected strings. Based on the Hamming distance d_Ham, the objective function is defined as

f_Ham = Σ_{i=1..n} (l_i − d_Ham(Out_i^expected, Out_i^actual)),    (21)

where n is the number of given sequences and l_i is the length of Out_i^expected.
In the second case we use d_LP for measuring the distance. The objective function is then

f_LP = Σ_{i=1..n} d_LP(Out_i^expected, Out_i^actual),    (22)

whose maximal value equals the sum of the lengths of all expected sequences.
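The two objective functions can be sketched as follows. This is one plausible reading of the text, not the authors' code: the normalization of the Hamming-based score to a fraction (so that 1.0 means full consistency) is our assumption, chosen to match the percentage values reported in the experiments.

```java
public class FsmObjective {
    static int dHam(String x, String y) {
        int d = 0;
        for (int i = 0; i < x.length(); i++) if (x.charAt(i) != y.charAt(i)) d++;
        return d;
    }

    static int dLP(String x, String y) {
        int k = 0;
        while (k < x.length() && k < y.length() && x.charAt(k) == y.charAt(k)) k++;
        return k;
    }

    // Hamming-based objective: fraction of output characters the candidate
    // FSM reproduces correctly over all n training sequences.
    static double fHam(String[] expected, String[] actual) {
        int match = 0, total = 0;
        for (int i = 0; i < expected.length; i++) {
            total += expected[i].length();
            match += expected[i].length() - dHam(expected[i], actual[i]);
        }
        return (double) match / total;
    }

    // Prefix-based objective: total length of correctly reproduced prefixes;
    // its maximum is the sum of the lengths of all expected sequences.
    static int fLP(String[] expected, String[] actual) {
        int sum = 0;
        for (int i = 0; i < expected.length; i++) sum += dLP(expected[i], actual[i]);
        return sum;
    }

    public static void main(String[] args) {
        String[] exp = {"10110", "000"};
        String[] act = {"10010", "001"};
        System.out.println(fHam(exp, act)); // prints 0.75 ((4 + 2) / 8)
        System.out.println(fLP(exp, act));  // prints 4 (2 + 2)
    }
}
```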

Algorithm description
In this section we focus on the properties of our algorithm. The search space is described by a set of binary position vectors, where each position vector corresponds to an FSM as described in Section 4.1. First, according to the problem under consideration, we set
• the free parameter ε,
• the maximal speed v_max,
• the number of iterations,
• the number of objects,
• the number of states n in the FSM,
• the initial value of the gravitational constant G_0,
• the minimal mass value M_min.
The initial positions are generated randomly from the feasible region, so that each position corresponds to an FSM. To do so, the FSM is first generated in decimal form; the numbers of symbols in the input and output alphabets are recovered from the input data. After generating the FSM in decimal form, it is encoded into the binary representation (see Section 4.1).
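The random generation of a valid FSM in decimal form can be sketched as below (names ours). Drawing every transition target from {0, ..., n − 1} and every output from the output alphabet guarantees that each generated position is feasible, i.e., corresponds to some Moore machine.

```java
import java.util.Random;

public class RandomMoore {
    // Random transition function: for every (state, input symbol) pair,
    // pick a uniformly random target state.
    static int[][] randomTransitions(int n, int sigma, Random rnd) {
        int[][] delta = new int[n][sigma];
        for (int q = 0; q < n; q++)
            for (int s = 0; s < sigma; s++)
                delta[q][s] = rnd.nextInt(n);
        return delta;
    }

    // Random output function: a uniformly random output symbol per state.
    static int[] randomOutputs(int n, int deltaSize, Random rnd) {
        int[] out = new int[n];
        for (int q = 0; q < n; q++) out[q] = rnd.nextInt(deltaSize);
        return out;
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        int[][] t = randomTransitions(4, 2, rnd);
        int[] o = randomOutputs(4, 2, rnd);
        System.out.println(t.length + " states, outputs: "
                + java.util.Arrays.toString(o));
    }
}
```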
The objective function of a candidate solution is computed as described in Subsection 4.2.2. Although in physics the active, passive, and inertial masses are considered to be equivalent (see 3.1), we modified the mass computation laws to improve the search algorithm, so the active mass M_a, the passive mass M_p, and the inertial mass M_i of an object are computed separately from its objective-function value. If M_a is smaller than the defined minimum mass value M_min, then M_a = 0 (i.e., an object with a smaller mass does not create a gravitational field). The forces acting on an object are computed via Eq. (7), where the distance in one dimension is

R_ij^d(t) = |x_i^d(t) − x_j^d(t)|.    (23)

The acceleration vector is computed via Eq. (11) and the velocity vector via Eq. (12). If the velocity is higher than v_max, its value is set to v_max.
The new position is computed from the old position and the velocity vector (see Eq. (15)). The probability function (i.e., the threshold function) S(v_i^d) is taken as S(v_i^d) = |sin(v_i^d)|; in this case v_max = π/2.

IMPLEMENTATION AND EXPERIMENTS
Our approach was implemented in Java (JDK 1.5) and tested on random machines and some 'toy' examples.
Results are compared to the canonical Genetic Algorithm (more details about GA can be found in [10,26]).

Experiments I
The experiments were constructed so that the general parameters, such as the number of iterations and the number of objects, the encoding of the Moore machine, and its initialization algorithm, are the same (see Subsection 4.1). Evaluation of the machine is described in Subsection 4.2; the objective function is based on the Hamming distance. The specific parameters of the algorithm are given for each concrete experiment in the corresponding table.
During the experiments, each algorithm was run 20 times with a different initial set of objects. The results are presented in Tables 2 and 3, where the row 'Init.%' shows the mass value of the best solution at the initial step (randomly generated), the row 'Sol.%' shows the objective value of the best found solution, and the row 'Iter.' shows how many iterations were required to find this solution ('-' means that the best possible solution was not found).

Pattern recognizer
The goal of this experiment was to reconstruct a recognizer of the pattern 'aab' (see Table 2) from the given input-output pairs. As input data we used six pairs, each input string having a length of 12. The number of states n is four, the number of iterations is 100, and the number of objects is 200.
This experiment showed that the BGSA was more frequently able to find 100% solutions than the GA (10/20 compared to 7/20 for the GA) and fewer iterations were required to find them.

Parity checker
The goal of this experiment was to reconstruct a parity checker (see Table 3) from the given input-output pairs. As input data we used seven pairs, each input string having a length of 8. The number of states n is two, the number of iterations is 20, and the number of objects is five.
This experiment showed that the BGSA was more frequently able to find 100% solutions than the GA (14/20 compared to 10/20 for the GA) and fewer iterations were required to find them. In three out of twenty cases the GA was not able to improve the best solution randomly generated in the initial population; for the BGSA this happened only in one case out of twenty.

Experiments II
The goal of these experiments was to compare the BGSA and the GA on the same random initial set of objects. The tasks were taken as in the previous experiments (i.e., 'pattern recognizer' and 'parity checker'). The experiments were constructed in such a way that the initial population was the same for both algorithms; the parameters, such as the number of iterations and the number of objects, were also equal for both algorithms and are described in Subsection 5.1. Each algorithm (BGSA and GA) was executed 10 times with the same initial set of objects as in Experiments I (5.1). The average best-so-far solutions are presented in Fig. 5. According to the results, in the case of the 'pattern recognizer' the BGSA solves the task better than the GA (Fig. 5a). For the second task, the 'parity checker' (Fig. 5b), the BGSA behaves almost like the GA.

CONCLUSIONS AND FUTURE WORK
In this paper we presented a method for the inference of Moore machines based on a gravitationally-inspired search algorithm. A binary representation of FSMs and different types of objective functions were introduced. Parameters and variations of the proposed algorithm were discussed. The proposed approach was implemented and successfully tested on random data and different examples. In the first experiments, our approach gave promising results. To improve its quality, the parameters of the algorithm and their effect on the presented methods will be explored, as will the effect of using different variations of the update laws. In further developments the proposed method will be adjusted to take other types of FSMs into account, for example Mealy machines.

Fig. 3. A Moore machine represented as a transition diagram.