Output Signal Based Combination of Two NLMS Adaptive Filters – Transient Analysis

A combination of two complex normalized least mean square (NLMS) adaptive filters that adapt on the same input signal at the same time is investigated. One of the filters has a large and the other one has a small step size. The outputs of the filters are combined together through a mixing parameter λ. This combination is an interesting new way of achieving simultaneously a fast initial convergence and a small steady state error of an adaptive algorithm. The mixing parameter is computed from the output signals of the individual filters. The expressions characterizing the time evolution of the mean square deviation and the excess mean square error of the combination scheme are derived. The theoretical results are verified by simulations.


INTRODUCTION
When designing an adaptive algorithm, one faces a trade-off between the initial convergence speed and the mean-square error in steady state. In the case of algorithms belonging to the least mean square (LMS) family, this trade-off is controlled by the step-size parameter. A large step size leads to fast initial convergence, but the algorithm also exhibits a large mean-square error in steady state; conversely, a small step size slows down the convergence but results in a small steady-state error [1,2].
Variable step-size adaptive schemes offer a possible solution, achieving both fast initial convergence and low steady-state misadjustment [3][4][5][6][7]. How successful these schemes are depends on how well the algorithm is able to estimate the distance of the adaptive filter weights from the optimal solution. The variable step-size algorithms use different criteria for calculating the proper step size at any given time instant. For example, squared instantaneous errors have been used in [4] and the squared autocorrelation of errors at adjacent time instants has been used in [6]. Paper [5] investigates an algorithm that changes the time-varying convergence parameters in such a way that the change is proportional to the negative of the gradient of the squared estimation error with respect to the convergence parameter. In [7] the norm of the projected weight error vector is used as a criterion to determine how close the adaptive filter is to its optimum performance.
Recently there has been interest in a combination scheme that is able to optimize the trade-off between the convergence speed and the steady-state error [8]. The scheme consists of two adaptive filters that are simultaneously applied to the same inputs, as depicted in Figure 1. One of the filters has a large step size allowing fast convergence and the other one has a small step size for a small steady-state error. The outputs of the filters are combined through a mixing parameter λ. The performance of this scheme has been studied for some parameter update schemes [9][10][11]. Paper [9] uses a convex combination, i.e., λ is constrained to lie between 0 and 1. Paper [10] presents a transient analysis of a slightly modified version of this scheme. In those studies the parameter λ is found by using an LMS-type adaptive scheme and computing the sigmoidal function of the result. Another approach, which computes the mixing parameter for an affine combination, is provided in [11]. That paper uses the ratio of time averages of the instantaneous errors of the filters; the error function of the ratio is computed to obtain λ. In [12] a convex combination of two adaptive filters with different adaptation schemes has been investigated with the aim of improving the steady-state characteristics. One of the adaptive filters uses the LMS algorithm and the other one the Generalized Normalized Gradient Descent algorithm. The combination parameter λ is computed using stochastic gradient adaptation. In [13] the convex combination of two adaptive filters is applied in a variable filter length scheme to gain improvements in low signal-to-noise ratio conditions. In [14] the combination has been used to join two affine projection filters with different regularization parameters. Paper [15] uses the combination on parallel binary structured LMS algorithms. These three works use the LMS-like scheme of [16] to compute λ.
It should be noted that schemes involving two filters have been proposed earlier [17,18]. However, in those early schemes only one of the filters was adaptive, while the other used fixed filter weights. The fixed filter was updated by copying all coefficients from the adaptive filter whenever the adaptive filter was performing better than the fixed one.
In the present paper the mixing parameter λ is computed from the output signals of the individual filters. This way of calculating the mixing parameter is optimal in the sense that it results from minimization of the mean-squared error of the combined filter. The scheme was independently proposed in [19] and [20]; its steady-state performance was investigated in [21] and its tracking performance in [22]. In [23] the output signal based combination was used in the adaptive line enhancer. Those papers investigate the behaviour of the adaptive combination scheme in steady state, i.e., in the situation when discrete time n approaches infinity. In the main body of this paper, on the other hand, a transient analysis of the algorithm is given, i.e., formulae predicting the entire course of adaptation, not only the steady state, are derived.
It will be assumed throughout the paper that the signals are complex-valued and that the combination scheme uses two normalized LMS (NLMS) adaptive filters. Italic, boldface lower-case, and boldface upper-case letters will be used for scalars, column vectors, and matrices, respectively. The superscript T denotes transposition of a matrix and the superscript H Hermitian (conjugate) transposition, the operator E[•] denotes mathematical expectation, and tr[•] stands for the trace of a matrix.

ALGORITHM
Let us consider two adaptive filters, as shown in Figure 1, each of them updated using the NLMS adaptation rule
$$\mathbf{w}_i(n) = \mathbf{w}_i(n-1) + \mu_i \frac{e_i^*(n)\,\mathbf{x}(n)}{\mathbf{x}^H(n)\mathbf{x}(n)}, \tag{1}$$
$$e_i(n) = d(n) - \mathbf{w}_i^H(n-1)\mathbf{x}(n), \tag{2}$$
$$d(n) = \mathbf{w}_o^H\mathbf{x}(n) + v(n). \tag{3}$$
In the above, $\mathbf{w}_i(n)$ is the $N$-dimensional vector of coefficients of the $i$th adaptive filter, with $i = 1, 2$. The vector $\mathbf{w}_o$ is the true weight vector we aim to identify with our adaptive scheme and $\mathbf{x}(n)$ is the known $N$-dimensional input vector, common to both adaptive filters. The input process is assumed to be a zero-mean wide-sense stationary Gaussian process. The desired signal $d(n)$ is the sum of the output of the filter to be identified and the Gaussian, zero-mean, independent and identically distributed (i.i.d.) measurement noise $v(n)$, which is statistically independent of all other signals. The step size of the $i$th adaptive filter is denoted by $\mu_i$. We assume without loss of generality that $\mu_1 > \mu_2$. The case $\mu_1 = \mu_2$ is not interesting because the two filters remain equal and the combination reduces to a single filter.
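As an illustration of the adaptation rule described above, the following is a minimal sketch of a single complex NLMS filter in plain Python. The function names `nlms_step` and `identify`, the small guard constant `eps`, the tapped-delay-line input, and the example system `w_o` are assumptions made for this example only and do not come from the paper.

```python
import random

def nlms_step(w, x, d, mu, eps=1e-12):
    """One complex NLMS update; returns (updated weights, a priori error)."""
    # A priori output y = w^H x (conjugated weights times input).
    y = sum(wk.conjugate() * xk for wk, xk in zip(w, x))
    e = d - y
    # Input energy x^H x; eps is an assumed guard against an all-zero vector.
    energy = sum(abs(xk) ** 2 for xk in x) + eps
    # w(n) = w(n-1) + mu * conj(e(n)) * x(n) / (x^H(n) x(n))
    gain = mu * e.conjugate() / energy
    return [wk + gain * xk for wk, xk in zip(w, x)], e

def identify(w_o, mu, n_samples, noise_std=0.0, seed=0):
    """Run one NLMS filter against the system w_o; return the final weights."""
    rng = random.Random(seed)
    n_taps = len(w_o)
    w = [0j] * n_taps
    x = [0j] * n_taps  # tapped delay line, newest sample first
    for _ in range(n_samples):
        u = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        x = [u] + x[:-1]
        v = noise_std * complex(rng.gauss(0, 1), rng.gauss(0, 1))
        d = sum(wo.conjugate() * xk for wo, xk in zip(w_o, x)) + v
        w, _ = nlms_step(w, x, d, mu)
    return w

# Hypothetical 4-tap system, identified without measurement noise.
w_o = [0.5 + 0.2j, -0.3j, 0.1 + 0j, -0.05 + 0.05j]
w_hat = identify(w_o, mu=0.5, n_samples=500)
deviation = sum(abs(a - b) ** 2 for a, b in zip(w_o, w_hat))
```

In the noise-free case the squared deviation shrinks towards zero; with `noise_std > 0` it settles at a floor that grows with the step size, which is the trade-off the combination scheme is meant to resolve.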
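The outputs of the two adaptive filters are combined according to
$$y(n) = \lambda(n)\,y_1(n) + [1-\lambda(n)]\,y_2(n), \tag{4}$$
where $y_i(n) = \mathbf{w}_i^H(n-1)\mathbf{x}(n)$ and the mixing parameter $\lambda(n)$ can be any real number. We define the a priori system error signal as the difference between the output signal of the true system at time $n$, given by $y_o(n) = \mathbf{w}_o^H\mathbf{x}(n) = d(n) - v(n)$, and the output signal of our adaptive scheme $y(n)$:
$$e_a(n) = y_o(n) - \lambda(n)\,y_1(n) - [1-\lambda(n)]\,y_2(n). \tag{5}$$
Let us now find $\lambda(n)$ by minimizing the mean square of the a priori system error. The derivative of $E[|e_a(n)|^2]$ with respect to $\lambda(n)$ is
$$\frac{\partial E[|e_a(n)|^2]}{\partial \lambda(n)} = -2E\big[\Re\{(y_o(n)-y_2(n))(y_1(n)-y_2(n))^*\}\big] + 2\lambda(n)E\big[|y_1(n)-y_2(n)|^2\big]. \tag{6}$$
Setting the derivative to zero results in
$$\lambda(n) = \frac{E\big[\Re\{(d(n)-y_2(n))(y_1(n)-y_2(n))^*\}\big]}{E\big[|y_1(n)-y_2(n)|^2\big]}, \tag{7}$$
where we have replaced the true system output signal $y_o(n)$ by its observable noisy version $d(n)$. Note, however, that because we have made the standard assumption that the input signal $\mathbf{x}(n)$ and the measurement noise $v(n)$ are independent random processes, the additional term $E[\Re\{v(n)(y_1(n)-y_2(n))^*\}]$ vanishes, so this can be done without introducing any error into our calculations.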
The denominator of equation (7) comprises the expectation of the squared magnitude of the difference of the two filter output signals. This quantity can be very small or even zero, particularly at the beginning of adaptation if the two step sizes are close to each other. Correspondingly, λ computed directly from (7) may be large. To prevent this, we add a small regularization constant ε to the denominator of (7). The constant ε should be selected small compared to $E[\mathbf{x}^H(n)\mathbf{x}(n)]$ but large enough to prevent division by zero in the given arithmetic.
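The regularized ratio can be illustrated with a short sketch in which the expectations are replaced by sample averages over a batch. The helper names `mixing_parameter` and `combined_mse`, the value of `eps`, and the synthetic stand-in signals are assumptions of this example; they only serve to show that the resulting λ minimizes the sample mean squared error of the combined output.

```python
import random

def mixing_parameter(d, y1, y2, eps=1e-8):
    """Sample-average estimate of lambda with a regularized denominator."""
    n = len(d)
    # Numerator: average of Re{(d - y2) * conj(y1 - y2)}.
    num = sum(((dk - b) * (a - b).conjugate()).real
              for dk, a, b in zip(d, y1, y2)) / n
    # Denominator: average of |y1 - y2|^2, plus the small constant eps.
    den = sum(abs(a - b) ** 2 for a, b in zip(y1, y2)) / n
    return num / (den + eps)

def combined_mse(d, y1, y2, lam):
    """Sample mean squared error of lam*y1 + (1 - lam)*y2 against d."""
    return sum(abs(dk - (lam * a + (1 - lam) * b)) ** 2
               for dk, a, b in zip(d, y1, y2)) / len(d)

rng = random.Random(1)

def cgauss(scale):
    """Complex Gaussian sample with the given per-component std."""
    return complex(rng.gauss(0, scale), rng.gauss(0, scale))

# Synthetic stand-ins: y1 mimics a noisier output, y2 a more accurate one.
d = [cgauss(1.0) for _ in range(2000)]
y1 = [dk + cgauss(0.3) for dk in d]
y2 = [dk + cgauss(0.1) for dk in d]
lam = mixing_parameter(d, y1, y2)
```

When the two outputs coincide, both averages are zero and the regularized ratio returns zero instead of dividing by zero, which is the purpose of ε.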

TRANSIENT ANALYSIS
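In this section we are interested in finding expressions that characterize the transient performance of the combined algorithm, i.e., we intend to derive formulae that characterize the entire course of adaptation of the algorithm. Before we can proceed we need, however, to introduce some notation. First, let us denote the weight error vector of the $i$th filter as
$$\tilde{\mathbf{w}}_i(n) = \mathbf{w}_o - \mathbf{w}_i(n). \tag{8}$$
Then the equivalent weight error vector of the combined adaptive filter will be
$$\tilde{\mathbf{w}}(n) = \lambda(n)\tilde{\mathbf{w}}_1(n) + [1-\lambda(n)]\tilde{\mathbf{w}}_2(n). \tag{9}$$
The mean square deviation (MSD) of the combined filter is given by
$$\mathrm{MSD}(n) = E[\tilde{\mathbf{w}}^H(n)\tilde{\mathbf{w}}(n)] = \lambda^2(n)E[\tilde{\mathbf{w}}_1^H(n)\tilde{\mathbf{w}}_1(n)] + 2\lambda(n)[1-\lambda(n)]\Re\{E[\tilde{\mathbf{w}}_1^H(n)\tilde{\mathbf{w}}_2(n)]\} + [1-\lambda(n)]^2E[\tilde{\mathbf{w}}_2^H(n)\tilde{\mathbf{w}}_2(n)]. \tag{10}$$
The a priori estimation error of an individual filter is defined as
$$e_{i,a}(n) = \tilde{\mathbf{w}}_i^H(n-1)\mathbf{x}(n). \tag{11}$$
It follows from (5) that we can express the a priori error of the combination as
$$e_a(n) = \lambda(n)e_{1,a}(n) + [1-\lambda(n)]e_{2,a}(n), \tag{12}$$
and because $\lambda(n)$ is, according to (7), a ratio of mathematical expectations and, hence, deterministic, we have for the excess mean square error (EMSE) of the combination
$$\mathrm{EMSE}(n) = E[|e_a(n)|^2] = \lambda^2(n)E[|e_{1,a}(n)|^2] + 2\lambda(n)[1-\lambda(n)]E[\Re\{e_{1,a}(n)e_{2,a}^*(n)\}] + [1-\lambda(n)]^2E[|e_{2,a}(n)|^2]. \tag{13}$$
As $e_{i,a}(n) = \tilde{\mathbf{w}}_i^H(n-1)\mathbf{x}(n)$, the expression of the EMSE becomes
$$\mathrm{EMSE} = \lambda^2E[\tilde{\mathbf{w}}_1^H\mathbf{x}\mathbf{x}^H\tilde{\mathbf{w}}_1] + 2\lambda(1-\lambda)E[\Re\{\tilde{\mathbf{w}}_1^H\mathbf{x}\mathbf{x}^H\tilde{\mathbf{w}}_2\}] + (1-\lambda)^2E[\tilde{\mathbf{w}}_2^H\mathbf{x}\mathbf{x}^H\tilde{\mathbf{w}}_2]. \tag{14}$$
In what follows we often drop the explicit time index $n$, as we have done in (14), if it is not necessary to avoid confusion. Noting that $y_i(n) = \mathbf{w}_i^H(n-1)\mathbf{x}(n)$, we can rewrite the expression for $\lambda(n)$ in (7) as
$$\lambda = \frac{E[\tilde{\mathbf{w}}_2^H\mathbf{x}\mathbf{x}^H\tilde{\mathbf{w}}_2] - E[\Re\{\tilde{\mathbf{w}}_2^H\mathbf{x}\mathbf{x}^H\tilde{\mathbf{w}}_1\}]}{E[\tilde{\mathbf{w}}_1^H\mathbf{x}\mathbf{x}^H\tilde{\mathbf{w}}_1] - 2E[\Re\{\tilde{\mathbf{w}}_1^H\mathbf{x}\mathbf{x}^H\tilde{\mathbf{w}}_2\}] + E[\tilde{\mathbf{w}}_2^H\mathbf{x}\mathbf{x}^H\tilde{\mathbf{w}}_2]}. \tag{15}$$
We thus need to investigate the evolution of terms of the type $\mathrm{EMSE}_{k,l} = E[\tilde{\mathbf{w}}_k^H(n-1)\mathbf{x}(n)\mathbf{x}^H(n)\tilde{\mathbf{w}}_l(n-1)]$ in order to reveal the time evolution of EMSE($n$) and $\lambda(n)$. To do so, we concentrate first on the mean square deviation defined in (10).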
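For a single NLMS filter we have, after subtraction of (1) from $\mathbf{w}_o$ and expressing $e_i(n)$ through the error of the corresponding Wiener filter $e_o(n) = d(n) - \mathbf{w}_o^H\mathbf{x}(n)$,
$$\tilde{\mathbf{w}}_i(n) = \left(\mathbf{I} - \mu_i\frac{\mathbf{x}(n)\mathbf{x}^H(n)}{\mathbf{x}^H(n)\mathbf{x}(n)}\right)\tilde{\mathbf{w}}_i(n-1) - \mu_i\frac{\mathbf{x}(n)e_o^*(n)}{\mathbf{x}^H(n)\mathbf{x}(n)}. \tag{16}$$
At this point we make two approximations. First, we approximate the outer product of the input signal vectors by the correlation matrix, $\mathbf{x}\mathbf{x}^H \approx \mathbf{R}_x$. Second, we approximate the inner product of the input signal vectors by $N$ times the input power, $\mathbf{x}^H\mathbf{x} \approx N\sigma_x^2$. With those approximations we have
$$\tilde{\mathbf{w}}_i(n) \approx \left(\mathbf{I} - \frac{\mu_i}{N\sigma_x^2}\mathbf{R}_x\right)\tilde{\mathbf{w}}_i(n-1) - \frac{\mu_i}{N\sigma_x^2}\mathbf{x}(n)e_o^*(n). \tag{17}$$
This means, in fact, that we apply the small step-size theory [2] even though the assumption of a small step size is not really true for the fast adapting filter. In our simulation study we will see, however, that the assumption works rather well in practice.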
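Let us now define the eigendecomposition of the correlation matrix as
$$\mathbf{R}_x = \mathbf{Q}\boldsymbol{\Omega}\mathbf{Q}^H, \tag{18}$$
where $\mathbf{Q}$ is a unitary matrix whose columns are the orthogonal eigenvectors of $\mathbf{R}_x$ and $\boldsymbol{\Omega}$ is a diagonal matrix having the eigenvalues associated with the corresponding eigenvectors on its main diagonal. We also define the transformed weight error vector as
$$\mathbf{v}_i(n) = \mathbf{Q}^H\tilde{\mathbf{w}}_i(n) \tag{19}$$
and the transformed last term of equation (17) as
$$\mathbf{p}_i(n) = \frac{\mu_i}{N\sigma_x^2}\mathbf{Q}^H\mathbf{x}(n)e_o^*(n). \tag{20}$$
Then we can rewrite equation (17), after multiplying both sides by $\mathbf{Q}^H$ from the left, as
$$\mathbf{v}_i(n) = \left(\mathbf{I} - \frac{\mu_i}{N\sigma_x^2}\boldsymbol{\Omega}\right)\mathbf{v}_i(n-1) - \mathbf{p}_i(n). \tag{21}$$
We note that the mean of $\mathbf{p}$ is zero by the orthogonality theorem and that its correlation matrix, written for filters $k$ and $l$, equals
$$E[\mathbf{p}_k\mathbf{p}_l^H] = \frac{\mu_k\mu_l}{N^2\sigma_x^4}\mathbf{Q}^HE[\mathbf{x}e_o^*e_o\mathbf{x}^H]\mathbf{Q}. \tag{22}$$
We now invoke the Gaussian moment factoring theorem to write
$$E[\mathbf{x}e_o^*e_o\mathbf{x}^H] = E[\mathbf{x}e_o^*]E[e_o\mathbf{x}^H] + E[\mathbf{x}\mathbf{x}^H]E[|e_o|^2]. \tag{23}$$
The first term in the above is zero due to the principle of orthogonality and the second term equals $\mathbf{R}_xJ_{\min}$.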
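Hence we are left with
$$E[\mathbf{p}_k\mathbf{p}_l^H] = \frac{\mu_k\mu_lJ_{\min}}{N^2\sigma_x^4}\boldsymbol{\Omega}, \tag{24}$$
where $J_{\min} = E[|e_o|^2]$ is the minimum mean square error produced by the corresponding Wiener filter. As the matrices $\mathbf{I}$ and $\boldsymbol{\Omega}$ in (21) are diagonal, it follows that the $m$th element of the vector $\mathbf{v}_k(n)$ is given by
$$v_{k,m}(n) = \left(1 - \frac{\mu_k\omega_m}{N\sigma_x^2}\right)^n v_{k,m}(0) - \sum_{i=0}^{n-1}\left(1 - \frac{\mu_k\omega_m}{N\sigma_x^2}\right)^i p_{k,m}(n-i), \tag{25}$$
where $\omega_m$ is the $m$th eigenvalue of $\mathbf{R}_x$ and $v_{k,m}$ and $p_{k,m}$ are the $m$th components of the vectors $\mathbf{v}_k$ and $\mathbf{p}_k$, respectively.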
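We can now express the MSD and its individual components in (10) through the transformed weight error vectors as
$$\mathrm{MSD}(n) = \lambda^2(n)E[\mathbf{v}_1^H(n)\mathbf{v}_1(n)] + 2\lambda(n)[1-\lambda(n)]\Re\{E[\mathbf{v}_1^H(n)\mathbf{v}_2(n)]\} + [1-\lambda(n)]^2E[\mathbf{v}_2^H(n)\mathbf{v}_2(n)]. \tag{26}$$
Let us concentrate on the $m$th component in the sum above corresponding to the cross term and denote it as
$$\Upsilon_m(n) = E[v_{k,m}^*(n)\,v_{l,m}(n)]. \tag{27}$$
The expressions for the component filters follow as special cases with $k = l$. Substituting (25) into the expression of $\Upsilon_m$ above, taking the mathematical expectation, and noting that the vector $\mathbf{p}$ is independent of $\mathbf{v}(0)$ results in
$$\Upsilon_m(n) = \left(1 - \frac{\mu_k\omega_m}{N\sigma_x^2}\right)^n\left(1 - \frac{\mu_l\omega_m}{N\sigma_x^2}\right)^n E[v_{k,m}^*(0)v_{l,m}(0)] + \sum_{i=0}^{n-1}\sum_{j=0}^{n-1}\left(1 - \frac{\mu_k\omega_m}{N\sigma_x^2}\right)^i\left(1 - \frac{\mu_l\omega_m}{N\sigma_x^2}\right)^j E[p_{k,m}^*(n-i)p_{l,m}(n-j)]. \tag{28}$$
We now note that most likely the two component filters are initialized to the same value, $v_{k,m}(0) = v_{l,m}(0) = v_m(0)$, and that
$$E[p_{k,m}^*(n-i)p_{l,m}(n-j)] = \begin{cases}\dfrac{\mu_k\mu_l\omega_mJ_{\min}}{N^2\sigma_x^4}, & i = j,\\ 0, & i \neq j.\end{cases} \tag{29}$$
We then have for the $m$th component of MSD
$$\Upsilon_m(n) = \left[\left(1 - \frac{\mu_k\omega_m}{N\sigma_x^2}\right)\left(1 - \frac{\mu_l\omega_m}{N\sigma_x^2}\right)\right]^n|v_m(0)|^2 + \frac{\mu_k\mu_l\omega_mJ_{\min}}{N^2\sigma_x^4}\sum_{i=0}^{n-1}\left[\left(1 - \frac{\mu_k\omega_m}{N\sigma_x^2}\right)\left(1 - \frac{\mu_l\omega_m}{N\sigma_x^2}\right)\right]^i. \tag{30}$$
The sum over $i$ above can be recognized as a geometric series with $n$ terms. The first term is equal to 1 and the geometric ratio equals $\left(1 - \frac{\mu_k\omega_m}{N\sigma_x^2}\right)\left(1 - \frac{\mu_l\omega_m}{N\sigma_x^2}\right)$, so that
$$\sum_{i=0}^{n-1}\left[\left(1 - \frac{\mu_k\omega_m}{N\sigma_x^2}\right)\left(1 - \frac{\mu_l\omega_m}{N\sigma_x^2}\right)\right]^i = \frac{1 - \left[\left(1 - \frac{\mu_k\omega_m}{N\sigma_x^2}\right)\left(1 - \frac{\mu_l\omega_m}{N\sigma_x^2}\right)\right]^n}{1 - \left(1 - \frac{\mu_k\omega_m}{N\sigma_x^2}\right)\left(1 - \frac{\mu_l\omega_m}{N\sigma_x^2}\right)}. \tag{31}$$
After substitution of the above into (30) and simplification we are left with
$$\Upsilon_m(n) = \left[\left(1 - \frac{\mu_k\omega_m}{N\sigma_x^2}\right)\left(1 - \frac{\mu_l\omega_m}{N\sigma_x^2}\right)\right]^n\left(|v_m(0)|^2 - \frac{\mu_k\mu_lJ_{\min}}{N\sigma_x^2(\mu_k+\mu_l) - \mu_k\mu_l\omega_m}\right) + \frac{\mu_k\mu_lJ_{\min}}{N\sigma_x^2(\mu_k+\mu_l) - \mu_k\mu_l\omega_m}, \tag{32}$$
which is our result for a single entry of the MSD cross term vector. It is easy to see that for the terms involving a single filter we get expressions that coincide with those available in the literature [2].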
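The geometric-series step can be checked numerically. The sketch below assumes the per-mode model implied by the small step-size approximation: the second moment of one eigen-mode is multiplied at every step by the product of the two decay factors and incremented by a constant noise-driven term. The function names and the example parameter values are hypothetical and chosen only for illustration.

```python
def mode_constants(mu_k, mu_l, omega, n_taps, sigma_x2, j_min):
    """Geometric ratio and noise-driven increment for one eigen-mode."""
    a_k = mu_k * omega / (n_taps * sigma_x2)
    a_l = mu_l * omega / (n_taps * sigma_x2)
    ratio = (1 - a_k) * (1 - a_l)
    drive = mu_k * mu_l * omega * j_min / (n_taps * sigma_x2) ** 2
    return ratio, drive

def cross_term_recursive(n, v0_sq, ratio, drive):
    """Iterate the second-moment recursion n times from |v_m(0)|^2."""
    val = v0_sq
    for _ in range(n):
        val = ratio * val + drive
    return val

def cross_term_closed(n, v0_sq, ratio, drive):
    """Closed form after summing the geometric series with n terms."""
    steady = drive / (1 - ratio)
    return ratio ** n * (v0_sq - steady) + steady

# Assumed example values: fast/slow step sizes, one eigenvalue, 64 taps.
ratio, drive = mode_constants(mu_k=1.0, mu_l=0.1, omega=2.0,
                              n_taps=64, sigma_x2=1.0, j_min=1e-3)
steady_state = cross_term_closed(10 ** 6, 0.01, ratio, drive)
```

For large n the closed form tends to the constant term, which for these values equals mu_k*mu_l*j_min / (n_taps*sigma_x2*(mu_k + mu_l) - mu_k*mu_l*omega).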
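Let us now focus on the cross term appearing in the EMSE equation (14). Due to the independence assumption we can rewrite this by using the properties of the trace operator as
$$E[\tilde{\mathbf{w}}_k^H\mathbf{x}\mathbf{x}^H\tilde{\mathbf{w}}_l] = \mathrm{tr}\{E[\mathbf{x}\mathbf{x}^H\tilde{\mathbf{w}}_l\tilde{\mathbf{w}}_k^H]\} = \mathrm{tr}\{\mathbf{R}_xE[\tilde{\mathbf{w}}_l\tilde{\mathbf{w}}_k^H]\}. \tag{33}$$
Let us now recall that for any of the filters
$$\tilde{\mathbf{w}}_i(n) = \mathbf{Q}\mathbf{v}_i(n). \tag{34}$$
The EMSE of the combined filter can now be computed as
$$\mathrm{EMSE}(n) = \sum_{m=1}^{N}\omega_m\Big\{\lambda^2(n)E[|v_{1,m}(n-1)|^2] + 2\lambda(n)[1-\lambda(n)]\Re\{E[v_{1,m}^*(n-1)v_{2,m}(n-1)]\} + [1-\lambda(n)]^2E[|v_{2,m}(n-1)|^2]\Big\}, \tag{35}$$
where the components of type $E[v_{k,m}^*(n-1)v_{l,m}(n-1)]$ are given by (32). To compute $\lambda(n)$, we use (15), substituting the corresponding components of (35) for its individual terms.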
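A theoretical learning curve of this kind can be evaluated along the following lines. This is a sketch under stated assumptions rather than the authors' code: both filters start from the same initial weights, the per-mode second moments follow the closed form obtained from the geometric series, N times the input power is taken as the sum of the eigenvalues, and a tiny `eps` guards the denominator at the very start. All function names and numerical values are hypothetical.

```python
def second_moment(n, mu_k, mu_l, omega, v0_sq, trace, j_min):
    """Closed-form second moment of one eigen-mode at time n."""
    ratio = (1 - mu_k * omega / trace) * (1 - mu_l * omega / trace)
    steady = mu_k * mu_l * j_min / (trace * (mu_k + mu_l) - mu_k * mu_l * omega)
    return ratio ** n * (v0_sq - steady) + steady

def emse_component(n, mu_k, mu_l, eigvals, v0_sq, j_min):
    """Eigenvalue-weighted sum of per-mode second moments at time n - 1."""
    trace = sum(eigvals)  # N * sigma_x^2 for a stationary input
    return sum(w * second_moment(n - 1, mu_k, mu_l, w, v0, trace, j_min)
               for w, v0 in zip(eigvals, v0_sq))

def theory_point(n, mu1, mu2, eigvals, v0_sq, j_min, eps=1e-12):
    """Theoretical lambda(n) and combined EMSE(n) for n >= 1."""
    e11 = emse_component(n, mu1, mu1, eigvals, v0_sq, j_min)
    e12 = emse_component(n, mu1, mu2, eigvals, v0_sq, j_min)
    e22 = emse_component(n, mu2, mu2, eigvals, v0_sq, j_min)
    lam = (e22 - e12) / (e11 - 2 * e12 + e22 + eps)
    emse = lam ** 2 * e11 + 2 * lam * (1 - lam) * e12 + (1 - lam) ** 2 * e22
    return lam, emse, e11, e22

# Assumed setup: white input (all eigenvalues equal), 64 taps, unit initial error.
eigvals = [1.0] * 64
v0_sq = [1.0 / 64] * 64
lam_early, emse_early, _, _ = theory_point(200, 1.0, 0.1, eigvals, v0_sq, 1e-3)
lam_late, emse_late, e11_late, e22_late = theory_point(20000, 1.0, 0.1, eigvals, v0_sq, 1e-3)
```

With these assumed values, λ sits near one while the fast filter dominates and ends as a small negative number, and the combined EMSE in steady state falls below that of the slow filter alone.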

SIMULATION RESULTS
A simulation study was carried out with the aim of verifying the approximations made in Section 3. In particular we are interested in how well the small step-size theory applies to our combination scheme of two adaptive filters.
We have combined two 64-tap adaptive filters. In order to obtain a practical algorithm, the expectation operators in both the numerator and the denominator of (7) have been replaced by exponential averaging of the type
$$P_u(n) = (1-\gamma)P_u(n-1) + \gamma u(n),$$
where $u(n)$ is the signal to be averaged, $P_u(n)$ is the averaged quantity, and $\gamma = 0.01$. The averaged quantities were then used in (7) to obtain λ. We have selected sample echo path model number one from [24] as the unknown system to identify. The impulse response of the echo path is shown in Figure 2.
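The practical algorithm described here can be sketched as follows, with the expectations in the numerator and denominator replaced by exponentially averaged quantities. This is an illustrative sketch, not the simulation code of the paper: the 8-tap system `w_o`, the white complex input, the step sizes, the noise level, and the constant `eps` are all assumptions made to keep the example small.

```python
import random

def inner(w, x):
    """w^H x for lists of complex numbers."""
    return sum(wk.conjugate() * xk for wk, xk in zip(w, x))

def nlms_update(w, x, e, mu, eps=1e-12):
    """NLMS weight update given the a priori error e."""
    energy = sum(abs(xk) ** 2 for xk in x) + eps
    gain = mu * e.conjugate() / energy
    return [wk + gain * xk for wk, xk in zip(w, x)]

def run_combination(w_o, mu1, mu2, n_samples, noise_std,
                    gamma=0.01, eps=1e-8, seed=0):
    """Two NLMS filters on the same input, mixed by an output-based lambda."""
    rng = random.Random(seed)
    n_taps = len(w_o)
    w1, w2 = [0j] * n_taps, [0j] * n_taps
    x = [0j] * n_taps
    p_num = p_den = 0.0
    lam_hist, err_hist = [], []
    for _ in range(n_samples):
        u = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        x = [u] + x[:-1]
        d = inner(w_o, x) + noise_std * complex(rng.gauss(0, 1), rng.gauss(0, 1))
        y1, y2 = inner(w1, x), inner(w2, x)
        diff = y1 - y2
        # Exponential averages standing in for the two expectations.
        p_num = (1 - gamma) * p_num + gamma * ((d - y2) * diff.conjugate()).real
        p_den = (1 - gamma) * p_den + gamma * abs(diff) ** 2
        lam = p_num / (p_den + eps)
        y = lam * y1 + (1 - lam) * y2
        lam_hist.append(lam)
        err_hist.append(abs(d - y) ** 2)
        w1 = nlms_update(w1, x, d - y1, mu1)
        w2 = nlms_update(w2, x, d - y2, mu2)
    dev1 = sum(abs(a - b) ** 2 for a, b in zip(w_o, w1))
    dev2 = sum(abs(a - b) ** 2 for a, b in zip(w_o, w2))
    return lam_hist, err_hist, dev1, dev2

# Hypothetical 8-tap system; fast filter mu1 = 1.0, slow filter mu2 = 0.05.
w_o = [0.6 + 0j, -0.4 + 0.3j, 0.3j, 0.25 + 0j, -0.2 - 0.1j, 0.15j, 0.1 + 0j, -0.05 + 0j]
lam_hist, err_hist, dev1, dev2 = run_combination(w_o, 1.0, 0.05, 4000, 0.1)
```

Plotting `lam_hist` against the sample index gives a single-run counterpart of the λ trajectory discussed in the text; averaging `err_hist` over several seeds yields a learning curve.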
The curves in Figures 3-6 are averaged over 100 trials. The input signal x is formed from the Gaussian white noise with unity variance by passing it through the filter with the transfer function to get a coloured input signal. The measurement noise is Gaussian white noise, statistically independent of x. The solid lines represent the simulation results and the dashed lines are the theoretical results. The theory predicts the simulation results rather well, and because of that the theoretical curves in Figures 3-6 overlap with the simulation results. As seen from Figure 3, there is a rapid convergence in the beginning, followed by a stabilization period. When the slow adapting filter gets better than the fast one, between sample times 15 000 and 20 000, a second convergence occurs. One can observe a good resemblance of the simulation and theoretical curves.
At the beginning of adaptation λ is close to one, so the fast adapting filter is used as the output of the combination. After a while, when the slow filter catches up with the fast one and becomes better, λ changes towards zero and eventually becomes a small negative number. In this state the slow but more accurate filter determines the combined output. Again, one can see that there is a clear similarity between the lines. The reason for λ becoming negative at the end of adaptation is that we have two adaptive filters working in parallel on the same input signal and the criterion for selecting λ is minimization of the mean square error of the output signal. At the end of adaptation some of the additive noise v(n) will be cancelled out if λ is allowed to become negative, which provides the smallest possible mean square error.
In Figure 5 we have made the difference between the step sizes small. One can see that the characteristic horizontal part of the learning curve has almost disappeared. We have also increased the measurement noise level. The simulation and theoretical curves show a good match. In Figure 6 we have increased the measurement noise level even more. One can see that the theoretical and simulation results agree well.

CONCLUSIONS
We have investigated a combination of two NLMS adaptive filters that are simultaneously applied to the same input signals. The mixing parameter λ was computed using the output signals of the individual filters and the desired signal. The transient behaviour of the algorithm was investigated using the assumption of a small step size, and the expressions for the evolution of EMSE(n) and λ(n) were derived. Finally, it was shown in the simulation study that the derived formulae fit the simulation results well.