{"id":145,"date":"2019-10-14T12:14:21","date_gmt":"2019-10-14T12:14:21","guid":{"rendered":"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/?page_id=145"},"modified":"2020-02-21T08:58:27","modified_gmt":"2020-02-21T08:58:27","slug":"neural-network","status":"publish","type":"page","link":"https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/?page_id=145","title":{"rendered":"Neural Networks"},"content":{"rendered":"<p><strong>COMPARISON AMONG BACKPROPAGATION LEARNING METHODS<\/strong><\/p>\n<p><strong><span style=\"text-decoration: underline;\">Abstract:<\/span><\/strong> This paper analyzes and compares several improvements to the backpropagation method for adjusting the weights of a feed-forward network. To this end, the networks\u2019 behavior is simulated on six specific applications. A method using a variable step is also presented, and its superiority is shown by simulation.<br \/>\n1.<strong><span style=\"text-decoration: underline;\">Introduction <\/span><\/strong>Any feed-forward network has a layered structure. Each layer is made up of units that receive their inputs from the units of the immediately preceding layer and send their outputs to the units of the next layer.\u00a0<img loading=\"lazy\" class=\" wp-image alignright\" src=\"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig1aen.png\" alt=\"\" width=\"339\" height=\"221\" \/>The rule that gives the synaptic weights is called the <strong>backpropagation rule<\/strong>. Although it is possible, it is not necessary to apply this rule to more than one layer of hidden units, because it has been shown <strong>[3]<\/strong> that a single layer of hidden units is enough to approximate, to a given precision, any function with a finite number of discontinuities, provided that the hidden units are activated by non-linear functions. 
Most of the time, applications use a feed-forward network with one layer of hidden units and the sigmoid function to activate the units. For a typical 3-layer network (<strong>figure 1.a<\/strong>)\u00a0 the network state is given by the following equations:<\/p>\n<p><img title=\"y_{m}=\\mathfrak{F}(a_{_m});\\textrm{&amp;space;}a_{_m}=\\sum_{j=1}^{J}q_{mj}\\cdot v_{j}-s_{m},\\textrm{&amp;space;}m=\\overline{1,M}\\textrm{&amp;space;}(1)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?y_{m}=\\mathfrak{F}(a_{_m});\\textrm{&amp;space;}a_{_m}=\\sum_{j=1}^{J}q_{mj}\\cdot&amp;space;v_{j}-s_{m},\\textrm{&amp;space;}m=\\overline{1,M}\\textrm{&amp;space;}(1)\" alt=\"\" \/><img title=\"v_{j}=\\mathfrak{F}(\\overline{a_{j}});\\textrm{ }a_{j}=\\sum_{i=1}^{I}w_{ji}\\cdot x_{i}-c_{j},\\textrm{ }j=\\overline{1,J}\\textrm{ }(2)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?v_{j}=\\mathfrak{F}(\\overline{a_{j}});\\textrm{&amp;space;}a_{j}=\\sum_{i=1}^{I}w_{ji}\\cdot&amp;space;x_{i}-c_{j},\\textrm{&amp;space;}\\textrm{&amp;space;}j=\\overline{1,J}\\textrm{&amp;space;}\\textrm{&amp;space;}\\textrm{&amp;space;}\\textrm{&amp;space;}\\textrm{&amp;space;}\\textrm{&amp;space;}(2)\" alt=\"\" \/><\/p>\n<p>The synaptic weights (w<sub>ji<\/sub> and q<sub>mj<\/sub>) and the bias potentials (s<sub>m<\/sub> and c<sub>j<\/sub>) have to be selected so that the total error:<\/p>\n<p><img title=\"\\mathbf{J_{s}}(\\mathbf{ W,Q,s,c})=\\sum_{\\mu }^{ } \\mathbf{J_{s}^{\\mu }} (3)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\mathbf{J_{s}}(\\mathbf{&amp;space;W,Q,s,c})=\\sum_{\\mu&amp;space;}^{&amp;space;}&amp;space; \\mathbf{J_{s}^{\\mu&amp;space;}}\\textrm{&amp;space;}\\textrm{&amp;space;}\\textrm{&amp;space;}\\textrm{&amp;space;}\\textrm{&amp;space;}(3)\" alt=\"\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>is as small as possible. <img title=\"J_{s}^{\\mu }\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;J_{s}^{\\mu&amp;space;}\" alt=\"\" \/> is the squared error at the output for pattern \u03bc:<br \/>\n<img 
title=\"\\mathbf{J_{s}^{\\mu}}=\\frac{1}{2}\\cdot \\sum_{m=1}^{M}\\left [ d_{m}^{\\mu }-\\mathfrak{F(a_{m}^{\\mu })} \\right ]^{2} (4)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\mathbf{J_{s}^{\\mu}}=\\frac{1}{2}\\cdot&amp;space;\\sum_{m=1}^{M}\\left&amp;space;[&amp;space;d_{m}^{\\mu&amp;space;}-\\mathfrak{F(a_{m}^{\\mu&amp;space;})}&amp;space;\\right&amp;space;]^{2}\\textrm{&amp;space;}(4)\" alt=\"\" \/><\/p>\n<p>where d<sup>\u03bc<\/sup> is the desired output array for class \u03bc. It can be seen that the backpropagation rule is a generalization of the delta rule. Accordingly, we have to calculate the gradient of J with respect to each parameter and then modify the value of the corresponding parameter in proportion to this gradient. First, we consider only the synaptic connections of the output neurons:<br \/>\n<img loading=\"lazy\" class=\"alignnone\" title=\" \\Delta q_{mj}= -\\rho \\cdot \\frac{\\partial J_{s}}{\\partial q_{mj}}=-\\rho \\cdot \\sum_{\\mu}^{ }\\frac{\\partial J_{s}^{\\mu }}{\\partial q_{mj}}\\cdot \\frac{\\partial a_{m}^{\\mu }}{\\partial q_{mj}} =\\rho \\cdot \\sum_{\\mu}^{ }[d_{m}^{\\mu }-\\mathfrak{F(a_{m}^{\\mu })}]\\cdot \\mathfrak{F(a_{m}^{\\mu })}\\cdot \\frac{\\partial a_{m}^{\\mu }}{\\partial q_{mj}}\\equiv \\rho \\cdot \\sum_{\\mu}^{ }\\Delta _{m}^{\\mu }\\cdot V_{j}^{\\mu } (5)\" 
src=\"http:\/\/latex.codecogs.com\/gif.latex?\\Delta&amp;space;q_{mj}=&amp;space;-\\rho&amp;space;\\cdot&amp;space;\\frac{\\partial&amp;space;J_{s}}{\\partial&amp;space;q_{mj}}=-\\rho&amp;space;\\cdot&amp;space;\\sum_{\\mu}^{&amp;space;}\\frac{\\partial&amp;space;J_{s}^{\\mu&amp;space;}}{\\partial&amp;space;q_{mj}}\\cdot&amp;space;\\frac{\\partial&amp;space;a_{m}^{\\mu&amp;space;}}{\\partial&amp;space;q_{mj}}&amp;space;=\\rho&amp;space;\\cdot&amp;space;\\sum_{\\mu}^{&amp;space;}[d_{m}^{\\mu&amp;space;}-\\mathfrak{F(a_{m}^{\\mu&amp;space;})}]\\cdot&amp;space;\\mathfrak{F(a_{m}^{\\mu&amp;space;})}\\cdot&amp;space;\\frac{\\partial&amp;space;a_{m}^{\\mu&amp;space;}}{\\partial&amp;space;q_{mj}}\\equiv&amp;space;\\rho&amp;space;\\cdot&amp;space;\\sum_{\\mu}^{&amp;space;}\\Delta&amp;space;_{m}^{\\mu&amp;space;\\cdot&amp;space;}V_{j}^{\\mu&amp;space;}&amp;space;(5)\" alt=\"\" width=\"593\" height=\"121\" \/><\/p>\n<p>In the next step, we consider the parameters associated to the synaptic connections between the input layer and the hidden one. 
The procedure is alike, only it needs two substitutions:<img loading=\"lazy\" class=\"alignnone\" title=\" \\Delta S_{m}= -\\rho \\cdot \\frac{\\partial J_{s}}{\\partial J_{m}}=\\rho \\cdot \\sum_{\\mu}^{ }[d_{m}^{\\mu }-\\mathfrak{F(a_{m}^{\\mu })}]\\cdot \\mathfrak{F(a_{m}^{\\mu })}\\cdot \\frac{\\partial a_{m}^{\\mu }}{\\partial S_{m}}=- \\rho \\cdot \\sum_{\\mu}^{ }\\Delta _{m}^{\\mu }= \\rho \\cdot \\sum_{\\mu}^{ }\\Delta _{m}^{\\mu }\\cdot (-1)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\Delta&amp;space;S_{m}=&amp;space;-\\rho&amp;space;\\cdot&amp;space;\\frac{\\partial&amp;space;J_{s}}{\\partial&amp;space;J_{m}}=\\rho&amp;space;\\cdot&amp;space;\\sum_{\\mu}^{&amp;space;}[d_{m}^{\\mu&amp;space;}-\\mathfrak{F(a_{m}^{\\mu&amp;space;})}]\\cdot&amp;space;\\mathfrak{F(a_{m}^{\\mu&amp;space;})}\\cdot&amp;space;\\frac{\\partial&amp;space;a_{m}^{\\mu&amp;space;}}{\\partial&amp;space;s_{m}}=-&amp;space;\\rho&amp;space;\\cdot&amp;space;\\sum_{\\mu}^{&amp;space;}\\Delta&amp;space;_{m}^{\\mu&amp;space;}=&amp;space;\\rho&amp;space;\\cdot&amp;space;\\sum_{\\mu}^{&amp;space;}\\Delta&amp;space;_{m}^{\\mu&amp;space;}\\cdot&amp;space;(-1) (6)\" alt=\"\" width=\"592\" height=\"121\" \/><img title=\"\\small \\Delta _{m}^{\\mu }=[d_{m}^{\\mu }-\\mathfrak{F(a_{m}^{\\mu })}]\\cdot \\mathfrak{F(a_{m}^{\\mu })} (7)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;\\small&amp;space;\\Delta&amp;space;_{m}^{\\mu&amp;space;}=[d_{m}^{\\mu&amp;space;}-\\mathfrak{F(a_{m}^{\\mu&amp;space;})}]\\cdot&amp;space;\\mathfrak{F(a_{m}^{\\mu&amp;space;})}&amp;space;(7)\" alt=\"\" \/><img title=\"\\Delta w_{ji}=-\\rho \\cdot \\frac{\\partial J_{s}}{\\partial w_{ji}}=-\\rho \\cdot \\sum_{\\mu }^{ }\\sum_{m=1}^{M}\\frac{\\partial J_{s}^{\\mu }}{\\partial a_{m}^{\\mu }}\\cdot \\frac{{\\partial a_{m}^{\\mu }}}{\\partial v_{j}}\\cdot \\frac{\\partial v_{j}}{\\partial w_{ji}}=\\rho \\cdot \\sum_{\\mu }^{ }\\sum_{m=1}^{M}[d_{m}^{\\mu }-\\mathfrak{F(a_{m}^{\\mu })}]\\cdot 
\\mathfrak{F(a_{m}^{\\mu })}\\cdot\\frac{{\\partial a_{m}^{\\mu }}}{\\partial v_{j}}\\cdot \\frac{\\partial v_{j}}{\\partial w_{ji}}=\\rho \\cdot \\sum_{\\mu }^{ }\\sum_{m=1}^{M}\\Delta _{m}^{\\mu }\\cdot q_{mj}\\cdot \\mathfrak{F}(\\overline{a_{J}^{\\mu }})\\cdot \\frac{\\partial\\overline{ a_{J}}}{\\partial w_{ji}}\\equiv \\rho \\cdot \\sum_{\\mu }^{ } \\overline{\\Delta _{J}^{\\mu }}\\cdot x_{i}^{\\mu } (8)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;\\Delta&amp;space;w_{ji}=-\\rho&amp;space;\\cdot&amp;space;\\frac{\\partial&amp;space;J_{s}}{\\partial&amp;space;w_{ji}}=-\\rho&amp;space;\\cdot&amp;space;\\sum_{\\mu&amp;space;}^{&amp;space;}\\sum_{m=1}^{M}\\frac{\\partial&amp;space;J_{s}^{\\mu&amp;space;}}{\\partial&amp;space;a_{m}^{\\mu&amp;space;}}\\cdot&amp;space;\\frac{{\\partial&amp;space;a_{m}^{\\mu&amp;space;}}}{\\partial&amp;space;v_{j}}\\cdot&amp;space;\\frac{\\partial&amp;space;v_{j}}{\\partial&amp;space;w_{ji}}=\\rho&amp;space;\\cdot&amp;space;\\sum_{\\mu&amp;space;}^{&amp;space;}\\sum_{m=1}^{M}[d_{m}^{\\mu&amp;space;}-\\mathfrak{F(a_{m}^{\\mu&amp;space;})}]\\cdot&amp;space;\\mathfrak{F(a_{m}^{\\mu&amp;space;})}\\cdot\\frac{{\\partial&amp;space;a_{m}^{\\mu&amp;space;}}}{\\partial&amp;space;v_{j}}\\cdot&amp;space;\\frac{\\partial&amp;space;v_{j}}{\\partial&amp;space;w_{ji}}=\\rho&amp;space;\\cdot&amp;space;\\sum_{\\mu&amp;space;}^{&amp;space;}\\sum_{m=1}^{M}\\Delta&amp;space;_{m}^{\\mu&amp;space;}\\cdot&amp;space;q_{mj}\\cdot&amp;space;\\mathfrak{F}(\\overline{a_{J}^{\\mu&amp;space;}})\\cdot&amp;space;\\frac{\\partial\\overline{&amp;space;a_{J}}}{\\partial&amp;space;w_{ji}}\\equiv&amp;space;\\rho&amp;space;\\cdot&amp;space;\\sum_{\\mu&amp;space;}^{&amp;space;}&amp;space;\\overline{\\Delta&amp;space;_{J}^{\\mu&amp;space;}}\\cdot&amp;space;x_{i}^{\\mu&amp;space;}&amp;space;(8)\" alt=\"\" \/><img title=\"\\Delta c_{j}=-\\rho \\cdot \\frac{\\partial J_{s} }{\\partial c_{j}}\\equiv -\\rho\\cdot \\sum_{\\mu }^{ }\\overline{\\Delta 
_{J}^{\\mu }}= \\rho\\cdot \\sum_{\\mu }^{ }\\overline{\\Delta _{J}^{\\mu }}\\cdot (-1) (9)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;\\Delta&amp;space;c_{j}=-\\rho&amp;space;\\cdot&amp;space;\\frac{\\partial&amp;space;J_{s}&amp;space;}{\\partial&amp;space;c_{j}}\\equiv&amp;space;-\\rho\\cdot&amp;space;\\sum_{\\mu&amp;space;}^{&amp;space;}\\overline{\\Delta&amp;space;_{J}^{\\mu&amp;space;}}=&amp;space;\\rho\\cdot&amp;space;\\sum_{\\mu&amp;space;}^{&amp;space;}\\overline{\\Delta&amp;space;_{J}^{\\mu&amp;space;}}\\cdot&amp;space;(-1)&amp;space;(9)\" alt=\"\" \/><\/p>\n<p>where:<\/p>\n<p><img title=\"\\overline{\\Delta _{J}^{\\mu }}=\\left [ \\sum_{m}^{ }\\Delta _{m}^{\\mu }\\cdot q_{mj} \\right ]\\cdot \\mathfrak{F(\\overline{a_{j}^{\\mu }})}(10)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;\\overline{\\Delta&amp;space;_{J}^{\\mu&amp;space;}}=\\left&amp;space;[&amp;space;\\sum_{m}^{&amp;space;}\\Delta&amp;space;_{m}^{\\mu&amp;space;}\\cdot&amp;space;q_{mj}&amp;space;\\right&amp;space;]\\cdot&amp;space;\\mathfrak{F(\\overline{a_{j}^{\\mu&amp;space;}})}(10)\" alt=\"\" \/><\/p>\n<p>It is notable that the weight adjustment equations (8) and (9) resemble equations <strong>(5)<\/strong> and <strong>(6)<\/strong>, the difference being that <img title=\"\\overline{\\Delta _{J}^{\\mu }}\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;\\overline{\\Delta&amp;space;_{J}^{\\mu&amp;space;}}\" alt=\"\" \/> differs from\u00a0<img title=\"\\Delta _{m}^{\\mu }\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;\\Delta&amp;space;_{m}^{\\mu&amp;space;}\" alt=\"\" \/>, from which it can be obtained recursively.<\/p>\n<p>The backpropagation method has also been extended to recurrent networks <strong>[11, 12, 13]</strong>, which have become more popular since.<\/p>\n<p>The error correction scheme works as if the data referring to the deviation from the desired output 
would propagate backward through the network, \u201cagainst the flow\u201d of the synaptic connections. It is doubtful, although not entirely impossible, that such a procedure could be carried out by biological neural networks. What is certain is that the backward error propagation algorithm is well suited to electronic computers, in both hardware and software implementations. Recently, a backpropagation-like rule, based not on minimizing the total mean square error but on maximizing the Kullback information, has been put forward for consideration <strong>[2]<\/strong>.<br \/>\n<img loading=\"lazy\" class=\"aligncenter wp-image size-large\" src=\"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig1ben.png\" alt=\"\" width=\"640\" height=\"287\" \/>The architecture shown in <strong>figure 1.a.<\/strong>, as well as relationships <strong>(5) \u2013 (10)<\/strong>, can be extended to an R-layer feed-forward network (with R-2 layers of hidden units), resulting in the architecture shown in figure 1.b. 
The error criterion that has to be minimized is still the mean square error computed over the whole set of training examples:<br \/>\n<img title=\"J=\\sum_{\\mu }{ }\\sum_{j_{1}}^{H_{1}}(d_{j_{1}}^{\\mu }-v_{j_{1}}^{\\mu })^{2} (11)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;J=\\sum_{\\mu&amp;space;}{&amp;space;}\\sum_{j_{1}}^{H_{1}}(d_{j_{1}}^{\\mu&amp;space;}-v_{j_{1}}^{\\mu&amp;space;})^{2}&amp;space;(11)\" alt=\"\" \/><\/p>\n<p>The network\u2019s output is simply the output of the last neural layer: <img title=\"v _{j_{c}}^{\\mu }=\\mathfrak{F}\\left ( \\sum_{j_{c+1}=1}^{H_{c+1}} w_{j_{c},j_{c+1}}\\cdot v _{j_{c+1}}^{\\mu }-s_{j_{c}} \\right )(12)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;v&amp;space;_{j_{c}}^{\\mu&amp;space;}=\\mathfrak{F}\\left&amp;space;(&amp;space;\\sum_{j_{c+1}=1}^{H_{c+1}}&amp;space;w_{j_{c},j_{c+1}}\\cdot&amp;space;v&amp;space;_{j_{c+1}}^{\\mu&amp;space;}-s_{j_{c}}&amp;space;\\right&amp;space;)(12)\" alt=\"\" \/><\/p>\n<p>Applying the backpropagation rule yields the following relationships:<\/p>\n<p><img title=\"\\Delta w_{j_{c},j_{c+1}}=\\sum_{\\mu }^{ }\\Delta_{j_{c}}^{\\mu } \\cdot v _{j_{c+1}}^{\\mu }(13)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;\\Delta&amp;space;w_{j_{c},j_{c+1}}=\\sum_{\\mu&amp;space;}^{&amp;space;}\\Delta_{j_{c}}^{\\mu&amp;space;}&amp;space;\\cdot&amp;space;\\nu&amp;space;_{j_{c+1}}^{\\mu&amp;space;}(13)\" alt=\"\" \/><\/p>\n<p><img title=\"\\Delta _{j_{c}}^{\\mu }=\\left ( \\sum_{j_{c-1}=1}^{H_{c-1}}\\Delta _{j_{c-1}}\\cdot w_{j_{c-1}j_{c}}\\right )\\cdot \\mathfrak{F}\\left ( v_{j_{c}}^{\\mu } \\right )(14)\" 
src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;\\Delta&amp;space;_{j_{c}}^{\\mu&amp;space;}=\\left&amp;space;(&amp;space;\\sum_{j_{c-1}=1}^{H_{c-1}}\\Delta&amp;space;_{j_{c-1}}\\cdot&amp;space;w_{j_{c-1},j_{c}}\\right&amp;space;)\\cdot&amp;space;\\mathfrak{F}\\left&amp;space;(&amp;space;\\nu_{j_{c}}^{\\mu&amp;space;}&amp;space;\\right&amp;space;)(14)\" alt=\"\" \/><\/p>\n<p><img title=\"\\Delta _{j_{1}}^{\\mu }= \\sum_{j_{1}=1}^{H_{1}} \\left ( d_{j_{1}}^{\\mu }-\\nu_{j_{1}}^{\\mu } \\right )\\cdot \\mathfrak{F}\\left ( v_{j_{1}}^{\\mu } \\right )(15)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;\\Delta&amp;space;_{j_{1}}^{\\mu&amp;space;}=&amp;space;\\sum_{j_{1}=1}^{H_{1}}&amp;space;\\left&amp;space;(&amp;space;d_{j_{1}}^{\\mu&amp;space;}-\\nu_{j_{1}}^{\\mu&amp;space;}&amp;space;\\right&amp;space;)\\cdot&amp;space;\\mathfrak{F}\\left&amp;space;(&amp;space;\\nu_{j_{1}}^{\\mu&amp;space;}&amp;space;\\right&amp;space;)(15)\" alt=\"\" \/><\/p>\n<p><img title=\"\\Delta s_{j_{c}}=\\sum_{\\mu }^{ }\\Delta_{j_{c}}^{\\mu } \\cdot (-1)(16)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;\\Delta&amp;space;S_{j_{c}}=\\sum_{\\mu&amp;space;}^{&amp;space;}\\Delta_{j_{c}}^{\\mu&amp;space;}&amp;space;\\cdot&amp;space;(-1)(16)\" alt=\"\" \/><\/p>\n<p>Being a method that minimizes the mean square error over the whole set of training examples by following the negative gradient, the backpropagation method suffers from the general deficiencies of gradient techniques. There are two main deficiencies:<\/p>\n<ul>\n<li>The learning rate, \u03c1, which gives the step size along the gradient direction, has to be small enough. This is because the gradient is a local measure: if \u03c1 is too large, the error value J may oscillate instead of decreasing, and the array of synaptic coefficients will jump from one edge of the \u201chole\u201d, where the local minimum is to be found, to the other edge. 
On the other hand, a \u03c1 that is too small leads to slow convergence and therefore to a slow learning process. Even if it is possible to determine (with a huge waste of time) a favorable \u03c1, this optimum changes during the learning process; it also differs from problem to problem.<\/li>\n<li>No matter how rapidly the minimum is reached, it will be the nearest local minimum, and the method will be unable to leave it because, being a gradient method, backpropagation exploits only the local properties of the criterion function. Moreover, it seems that the larger the dimension of the search space (the weight space) is (i.e. the more neurons there are in the hidden layers), the larger the number of local minima and, consequently, the \u201cchance\u201d of falling into one of them.<\/li>\n<\/ul>\n<p>With these considerations in mind, we shall compare the classic backpropagation method with some improvements mentioned in the literature and, finally, propose a method that converges more quickly and is able to escape from local minima.<br \/>\nTo this end, we shall simulate the behavior of these networks in solving six typical problems.<\/p>\n<p><strong style=\"font-size: 15px;\"><span style=\"text-decoration: underline;\">2. Problems on which the BP methods were tested<\/span><\/strong><\/p>\n<p>For each problem we specify the number of inputs, outputs and hidden units. 
The number of units in the hidden layer was chosen based on the results of other simulations <strong>[17]<\/strong>, which established that, for classification problems, this number should be approximately equal to the arithmetic mean of the number of input neurons and the number of output neurons, but no less than 10.<br \/>\n2.1.\u00a0 <span style=\"text-decoration: underline;\">BINAR<\/span><\/p>\n<p>This problem consists in determining the parity of a binary 8-bit word (EXCLUSIVE-OR Task)<strong> [6, 9]<\/strong>.<br \/>\nThe network therefore has 8 input units and one output unit, which is 0 for an even number of \u201c1\u201d bits in the word and 1 otherwise. The number of hidden units is 10.<br \/>\nFor this problem we trained the network over the entire set of possible examples, i.e. 256 examples.<\/p>\n<p>2.2.\u00a0 <span style=\"text-decoration: underline;\">COUNTER<\/span><\/p>\n<p>The aim is to design a counter that counts the number of \u201c1\u201d bits in a binary 4-bit word <strong>[6, 9]<\/strong>. We therefore have 4 input neurons and 5 output neurons, and the output array, belonging to the set {(1,0,0,0,0)<sup>T<\/sup>, (0,1,0,0,0)<sup>T<\/sup>, (0,0,1,0,0)<sup>T<\/sup>, (0,0,0,1,0)<sup>T<\/sup>, (0,0,0,0,1)<sup>T<\/sup>}, is equal to the i-th array of the set (i=1,&#8230;,5) if the input array has i-1 \u201c1\u201d bits. The chosen number of hidden units is 10. 
We trained the networks on all possible examples, i.e. 2<sup>4<\/sup>=16 examples.<br \/>\n2.3.\u00a0 <span style=\"text-decoration: underline;\">MULTIPLEXOR<\/span><\/p>\n<p>The neural network that simulates the multiplexor has to learn to assign to the 3-bit input array (3 input neurons) (b<sub>1<\/sub>, b<sub>2<\/sub>, b<sub>3<\/sub>)<sup>T<\/sup> the 8-bit output array (0,..,0,1,0,..,0) with a single \u201c1\u201d bit at position i=b<sub>1<\/sub>\u20222<sup>2<\/sup>+b<sub>2<\/sub>\u20222+b<sub>3<\/sub>. The number of training examples is 2<sup>3<\/sup>=8 <strong>[6]<\/strong>, and the number of hidden units is again equal to 10.<\/p>\n<p>2.4.\u00a0 <span style=\"text-decoration: underline;\">5x5\u00a0 TABLE<\/span><\/p>\n<p>This problem consists in recognizing (classifying) rows and columns in a binary image. The image is a 5&#215;5 matrix (5 rows and 5 columns), i.e. 25 input neurons. Since an image may contain rows and columns to the same extent, we need 2 output neurons; the number of neurons in the hidden layer was chosen equal to 14.<br \/>\nConsidering the image given by the matrix C[n,m] (n,m=1,\u2026,5), the input array x[i] is formed by taking i=5\u2219(n-1)+m. 
We chose a set of 16 examples for the learning simulation; for each of them, as in every supervised learning process, the desired output array d is also given:<\/p>\n<table>\n<tbody>\n<tr>\n<th>x<\/th>\n<th>d<\/th>\n<\/tr>\n<tr>\n<td>11111.00000.00000.00000.00000<\/td>\n<td>1 0<\/td>\n<\/tr>\n<tr>\n<td>00000.11111.00000.00000.00000<\/td>\n<td>1 0<\/td>\n<\/tr>\n<tr>\n<td>00000.00000.11111.00000.00000<\/td>\n<td>1 0<\/td>\n<\/tr>\n<tr>\n<td>00000.00000.00000.11111.00000<\/td>\n<td>1 0<\/td>\n<\/tr>\n<tr>\n<td>00000.00000.00000.00000.11111<\/td>\n<td>1 0<\/td>\n<\/tr>\n<tr>\n<td>10000.10000.10000.10000.10000<\/td>\n<td>0 1<\/td>\n<\/tr>\n<tr>\n<td>01000.01000.01000.01000.01000<\/td>\n<td>0 1<\/td>\n<\/tr>\n<tr>\n<td>00100.00100.00100.00100.00100<\/td>\n<td>0 1<\/td>\n<\/tr>\n<tr>\n<td>00010.00010.00010.00010.00010<\/td>\n<td>0 1<\/td>\n<\/tr>\n<tr>\n<td>00001.00001.00001.00001.00001<\/td>\n<td>0 1<\/td>\n<\/tr>\n<tr>\n<td>11111.11111.00000.00000.00000<\/td>\n<td>1 0<\/td>\n<\/tr>\n<tr>\n<td>00000.00000.00000.11111.11111<\/td>\n<td>1 0<\/td>\n<\/tr>\n<tr>\n<td>11000.11000.11000.11000.11000<\/td>\n<td>0 1<\/td>\n<\/tr>\n<tr>\n<td>00011.00011.00011.00011.00011<\/td>\n<td>0 1<\/td>\n<\/tr>\n<tr>\n<td>10000.01000.00100.00010.00001<\/td>\n<td>1\/2 1\/2<\/td>\n<\/tr>\n<tr>\n<td>00001.00010.00100.01000.10000<\/td>\n<td>1\/2 1\/2<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>2.5. <span style=\"text-decoration: underline;\">ASSOCIATIVE MEMORY<\/span><\/p>\n<p>This problem tests the \u201cfeature extraction\u201d capability of neural networks, i.e. the possibility of memorizing the input \u201cshapes\u201d in an array of smaller dimension. To this end, the network is trained to reproduce at its output the same 16-bit array that is presented at its input <strong>[9, 10]<\/strong>. The difficulty lies in the fact that the hidden layer has only 4 neurons. 
We trained the networks over a set of 16 examples: the arrays with the i-th component equal to 1 (i=1,&#8230;,16) and the remaining components equal to 0.<\/p>\n<p>2.6.\u00a0 <span style=\"text-decoration: underline;\">FUNCTION<\/span><\/p>\n<p>Besides recurrent networks (which are able to represent time series), feed-forward networks are also used in process emulation, connected as shown in the architecture in figure 2.a.<strong> [14,15]<\/strong>.<\/p>\n<p><img loading=\"lazy\" class=\"alignnone wp-image size-full\" src=\"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/figura2aen.png\" alt=\"\" width=\"1005\" height=\"593\" \/><\/p>\n<p>Given p and q, a process can be emulated with a neural network with m=p+q+1 inputs and one output. Denoting the mapping realized by the emulator by \u03c6<sub>E<\/sub>(\u2022) and its output by y\u2019, we have:<\/p>\n<p><img title=\"y'=\\varphi_{E} (X_{E})(17)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;y'=\\varphi_{E}&amp;space;(X_{E})(17)\" alt=\"\" \/><\/p>\n<p>where:<\/p>\n<p><img title=\"x_{E}(k)=[y(k),y(k-1),...,y(k-p+1),u(k),u(k-1),...,u(k-q)]^{T}(18)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;x_{E}(k)=[y(k),y(k-1),...,y(k-p+1),u(k),u(k-1),...,u(k-q)]^{T}(18)\" alt=\"\" \/><\/p>\n<p>The emulator is trained to minimize the emulation error y(k+1)-y\u2019. In figure 2.a., z^(-1) denotes the delay operator.<br \/>\nFor testing the learning methods, we chose a second-order, strongly non-linear process, which was used in [5] and is taken from <strong>[8]<\/strong>. 
The system is described by the equation:<\/p>\n<p><img title=\"y(k)=\\frac{y(k-1)\\cdot y(k-2)\\cdot [y(k-1)+2.5]}{1+y^{2}(k-1)+y^{2}(k-2)}+u(k)(19)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;y(k)=\\frac{y(k-1)\\cdot&amp;space;y(k-2)\\cdot&amp;space;[y(k-1)+2.5]}{1+y^{2}(k-1)+y^{2}(k-2)}+u(k)(19)\" alt=\"\" \/><\/p>\n<p>where u(k) is a sinusoidal signal:<\/p>\n<p><img title=\"u(k)=sin(0.08\\cdot \\pi \\cdot k)(20)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;u(k)=sin(0.08\\cdot&amp;space;\\pi&amp;space;\\cdot&amp;space;k)(20)\" alt=\"\" \/><\/p>\n<p>We generated 80 pairs (u(k), y(k)) starting from y(-1)=0 and y(0)=1 and used them for the <strong>off-line<\/strong> training of the networks.<br \/>\nAs the general procedure stipulates, we assumed the order of the SISO (Single Input Single Output) process to be known and, accordingly, chose a network with 3 input units, one output unit and 10 hidden units. The input units are connected as shown in <strong>figure 2.b.<\/strong>: at moment k, the three input units x are connected, one each, to y(k-2), y(k-1) and u(k), while the desired output d is y(k). Since the output of any neuron always lies between 0 and 1, the values of y have to be converted to this interval. We chose a linear transformation of the range of y into the interval (0.15, 0.85), so that the network can also adapt to signals that, after the learning period, leave the range of the training values. 
To this end, we compute the mean Medy of the values of y over the training set and then MaxDy, the maximum absolute deviation of these values from the computed mean, which leads to the following:<\/p>\n<p><img title=\"d(k)=\\frac{1}{2}+0.7\\cdot \\frac{|y(k)-Medy|}{MaxDy}(21)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;d(k)=\\frac{1}{2}+0.7\\cdot&amp;space;\\frac{|y(k)-Medy|}{MaxDy}(21)\" alt=\"\" \/><br \/>\n<img loading=\"lazy\" class=\"alignnone wp-image size-full\" src=\"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/figura2ben.png\" alt=\"\" width=\"891\" height=\"500\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Unlike other authors, we do not apply a similar transformation to the inputs: such an operation would probably ease the adaptation process, but it would complicate the training architecture, which would have to distinguish between input types (those coming from the process\u2019s inputs and those coming from its outputs).<br \/>\n3. <span style=\"text-decoration: underline;\"><strong>LEARNING METHODS<\/strong><\/span><br \/>\n3.1. <span style=\"text-decoration: underline;\">Classic Backpropagation<strong> (BP)<\/strong><\/span><br \/>\nThe classic method lets the user choose the learning rate, i.e. the step size along the gradient of the criterion function J. As shown in section 1, this value should be neither too large nor too small. So far, no general method has been proposed for choosing <strong>\u03c1<\/strong> for a given problem. It is recommended that \u03c1 be smaller than 1 or, possibly, that it decrease as the number of iterations grows.<br \/>\nIt has been shown that the rate at which the criterion function decreases depends strongly on the chosen value of \u03c1. Generally speaking, it is recommended to test the evolution of the network for several values of \u03c1 and then to pick a suitable value. 
One could start with a relatively small value of \u03c1 and then increase it for as long as this decreases J. When \u03c1 exceeds the optimum and becomes too large, oscillations of J appear, which sometimes grow instead of diminishing.<br \/>\nWe tested the network behavior for several <strong>\u03c1<\/strong> values and chose the optimum.<br \/>\nHowever, \u03c1 is not the only unknown, as it may seem at first sight: the initial values of the network weights are unknown as well. These are chosen randomly within [-1,1]. Consequently, it would be wrong to choose \u03c1 for one particular set of weights. That is why, for each \u03c1, we simulate the behavior of each network 10 times; whatever the learning method and the variable parameters, we always use<strong> the same<\/strong> 10 sets of random initial weights.<br \/>\nEach method was run for the same amount of time. Because iterations do not all take equally long, even for the same method, the abscissa of <strong>figure 3<\/strong> shows the running time (2 minutes on an IBM PC 486 at 50 MHz, although this is unimportant for the comparison of the methods).<br \/>\nOn the ordinate, we represented the decimal logarithm of the criterion function J over the whole set of examples. We preferred the logarithmic representation because the errors reach values much smaller than the initial ones anyway, yet their ranking at different moments of time still matters.<br \/>\nEach row of\u00a03 graphics summarizes the simulation results for one problem. The left graphic presents <img title=\"lg \\underset{i=1,10}{\\underline{max}}J_{i}(t;\\rho )\" src=\"http:\/\/latex.codecogs.com\/gif.latex?lg&amp;space;\\underset{i=1,10}{\\underline{max}}J_{i}(t;\\rho&amp;space;)\" alt=\"\" \/>, where i denotes the i-th set of initial weights, for a given \u03c1, at any moment of time t. 
The right graphic presents the\u00a0<img title=\"lg \\underset{i=1,10}{\\underline{min}}J_{i}(t;\\rho )\" src=\"http:\/\/latex.codecogs.com\/gif.latex?lg&amp;space;\\underset{i=1,10}{\\underline{min}}J_{i}(t;\\rho&amp;space;)\" alt=\"\" \/> functions, and the middle one presents\u00a0<img title=\"\\lg \\left [ \\frac{1}{10}\\cdot \\sum_{i=1}^{10}J_{i}(t;\\rho ) \\right ]\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\lg&amp;space;\\left&amp;space;[&amp;space;\\frac{1}{10}\\cdot&amp;space;\\sum_{i=1}^{10}J_{i}(t;\\rho&amp;space;)&amp;space;\\right&amp;space;]\" alt=\"\" \/>, that is, the logarithm of the mean of the criterion functions for a given \u03c1.<\/p>\n<p>In order to select the most favorable value of \u03c1, we compare the curves of the logarithm of the mean (the middle graphic); when several values give close results, we also consider the other graphics. For the sake of clarity, we have not plotted the curves for all the values of \u03c1 for which we simulated the network behavior.<br \/>\nThe following simulations were run:<br \/>\n1).<strong>BINAR:<\/strong> \u03c1=0.1 (curves 1); \u03c1=0.18; \u03c1=0.26 (curves 2); \u03c1=0.3; \u03c1=0.34 (curves 3);<br \/>\n\u03c1=0.38; \u03c1=0.42 (curves 4); \u03c1=0.5.<br \/>\nThe best result: \u03c1=0.34 (curves 3).<br \/>\n2).<strong>COUNTER:<\/strong> \u03c1=0.1; \u03c1=0.18 (curves 1); \u03c1=0.26; \u03c1=0.34 (curves 2); \u03c1=0.42; \u03c1=0.5 (curves 3); \u03c1=0.58; \u03c1=0.66 (curves 4).<br \/>\nThe best result: \u03c1=0.5 (curves 3).<br \/>\n3).<strong>MULTIPLEXOR:<\/strong> \u03c1=0.1; \u03c1=0.18; \u03c1=0.26; \u03c1=0.34; \u03c1=0.42 (curves 1); \u03c1=0.5; \u03c1=0.58; \u03c1=0.66 (curves 2); \u03c1=0.74; \u03c1=0.82; \u03c1=0.9 (curves 3); \u03c1=0.98; \u03c1=1.06; \u03c1=1.14; \u03c1=1.22 (curves 4); \u03c1=1.3.<br \/>\nThe best result: \u03c1=1.22 (curves 4).<br \/>\n4).<strong>5\u00d75 TABLE:<\/strong> \u03c1=0.1 (curves 1); \u03c1=0.18 (curves 2); \u03c1=0.26; 
\u03c1=0.34 (curves 3); \u03c1=0.42; \u03c1=0.5; \u03c1=0.58 (curves 4); \u03c1=0.66.<br \/>\nThe best result: \u03c1=0.58 (curves 4).<br \/>\n5).<strong>ASSOCIATIVE MEMORY:<\/strong> \u03c1=0.1 (curves 1); \u03c1=0.18 (curves 2); \u03c1=0.26; \u03c1=0.34 (curves 3); \u03c1=0.42; \u03c1=0.5 (curves 4); \u03c1=0.58.<br \/>\nThe best result: \u03c1=0.5 (curves 4).<br \/>\n6).<strong>FUNCTION:<\/strong> \u03c1=0.01 (curves 1); \u03c1=0.02 (curves 2); \u03c1=0.04; \u03c1=0.06 (curves 3); \u03c1=0.1 (curves 4); \u03c1=0.18.<br \/>\nThe best result: \u03c1=0.02 (curves 2).<\/p>\n<p>3.2. <span style=\"text-decoration: underline;\">BackPropagation with Proportion Term <strong>(BPTP)<\/strong><\/span><br \/>\nBecause the backpropagation rule generally converges slowly, various improvements to the weight adjustment algorithm have been suggested. In the backpropagation algorithm, the learning procedure requires a weight modification proportional to\u00a0<img title=\"\\frac{ \\partial J_{s}^{\\mu} }{\\partial w}\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\frac{&amp;space;\\partial&amp;space;J_{s}^{\\mu}&amp;space;}{\\partial&amp;space;w}\" alt=\"\" \/>.<br \/>\nThe negative gradient method strictly requires infinitesimal steps, the proportionality constant being the learning rate, \u03c1. 
For practical reasons, namely faster convergence, we select a learning rate as large as possible without causing oscillations.<br \/>\nOne way of avoiding oscillations at large values of \u03c1 is to make the weight change depend on the preceding change, by adding a proportion term:<\/p>\n<p><img title=\"\\Delta w_{ji}(k+1)=\\rho \\cdot \\Delta _{j}^{\\mu}\\cdot x_{i}^{\\mu }+\\beta \\cdot \\Delta w_{ji}(k))(22)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\Delta&amp;space;w_{ji}(k+1)=\\rho&amp;space;\\cdot&amp;space;\\Delta&amp;space;_{j}^{\\mu}\\cdot&amp;space;x_{i}^{\\mu&amp;space;}+\\beta&amp;space;\\cdot&amp;space;\\Delta&amp;space;w_{ji}(k))(22)\" alt=\"\" \/><br \/>\nwhere k is the iteration number and \u03b2 is a non-negative constant.<br \/>\nThe literature asserts that, with the proportion term added, the minimum is reached more rapidly because larger learning rates are allowed without causing oscillations. It is also recommended <strong>[7]<\/strong> that \u03b2=\u03c1\/k.<\/p>\n<p>3.3. <span style=\"text-decoration: underline;\">BackPropagation with Proportion Term and Restart (<strong>BPTPR<\/strong>)<\/span><\/p>\n<p>In order to ensure the convergence of the <strong>BPTP<\/strong> algorithm, we took \u03b2=\u03c1\/k; after a large number of iterations, however, \u03b2 becomes so small that the proportion term no longer contributes. The remedy is the <strong>restart<\/strong>, meaning the periodic \u201crestarting\u201d of the weight modification direction to the gradient direction. 
Thus:<\/p>\n<p><img title=\"\\beta _{k}=\\left\\{\\begin{matrix} \\rho \/i \\: \\: for \\: i\\neq 0\\\\0\\: \\: \\: \\; \\: \\: for \\: i=0 \\end{matrix}\\right.\\; \\; where\\ i=k\\: mod\\: I\\; (23)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\beta&amp;space;_{k}=\\left\\{\\begin{matrix}&amp;space;\\rho&amp;space;\/i&amp;space;\\:&amp;space;\\:&amp;space;for&amp;space;\\:&amp;space;i\\neq&amp;space;0\\\\0\\:&amp;space;\\:&amp;space;\\:&amp;space;\\;&amp;space;\\:&amp;space;\\:&amp;space;for&amp;space;\\:&amp;space;i=0&amp;space;\\end{matrix}\\right.\\;&amp;space;\\;&amp;space;where\\&amp;space;i=k\\:&amp;space;mod\\:&amp;space;I\\;&amp;space;(23)\" alt=\"\" \/><\/p>\n<p>For I=1, i is always 0, so \u03b2<sub>k<\/sub>=0 at every iteration and the method reduces to classic BP; when no restart occurs (I larger than the number of iterations), it coincides with the BPTP method. It is recommended that I take values between 2 and 10.<\/p>\n<p>This time we have two variables: \u03c1 and I. Assuming that the proper value of \u03c1 does not change much from the previous method, we first search for the most favorable value of I, holding \u03c1=\u03c1<sub>optimum BPTP<\/sub> for each problem; after that, we hold I fixed and look for the optimum \u03c1 around the value \u03c1=\u03c1<sub>optimum BPTP<\/sub>.<\/p>\n<p>3.4. <span style=\"text-decoration: underline;\">Conjugate Gradient BackPropagation (<strong>CGBP<\/strong>)<\/span><\/p>\n<p>Another idea was to apply more advanced techniques from optimization theory. In fact, BPTPR is clearly a simplified variant of the <strong>conjugate gradient with restart method<\/strong>. This method differs from BPTPR only in the choice of \u03b2, which here is computed in a more elaborate way from the gradient norms at the current iteration, k, and at the preceding iteration, k-1, using the following formula:<br \/>\n<img title=\"\\beta _{k}=\\left\\{\\begin{matrix} \\frac{g_{i}}{g_{i-1}}\\cdot \\frac{g_{i}-g_{i-1}}{g_{i-1}}\\; for \\: i\\neq 0\\\\ 0 \\; \\;\\; \\; \\; \\; \\; \\; \\; \\; \\; \\; \\; \\; \\; \\; \\; for \\: i=0 \\end{matrix}\\right. 
where\\; i = k\\: mod \\: I\\: \\; (24)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\beta&amp;space;_{k}=\\left\\{\\begin{matrix}&amp;space;\\frac{g_{i}}{g_{i-1}}\\cdot&amp;space;\\frac{g_{i}-g_{i-1}}{g_{i-1}}\\;&amp;space;for&amp;space;\\:&amp;space;i\\neq&amp;space;0\\\\&amp;space;0&amp;space;\\;&amp;space;\\;\\;&amp;space;\\;&amp;space;\\;&amp;space;\\;&amp;space;\\;&amp;space;\\;&amp;space;\\;&amp;space;\\;&amp;space;\\;&amp;space;\\;&amp;space;\\;&amp;space;\\;&amp;space;\\;&amp;space;\\;&amp;space;\\;&amp;space;for&amp;space;\\:&amp;space;i=0&amp;space;\\end{matrix}\\right.&amp;space;where\\;&amp;space;i&amp;space;=&amp;space;k\\:&amp;space;mod&amp;space;\\:&amp;space;I\\:&amp;space;\\;&amp;space;(24)\" alt=\"\" \/><br \/>\nHowever, this method also requires choosing the coefficient \u03c1 and the restart index I. Therefore, we reran the simulations, holding \u03c1 and I constant in turn. First we hold \u03c1 constant (at \u03c1<sub>optimum BPTPR<\/sub>) and search for the most favourable value of I around I<sub>optimum BPTPR<\/sub>. 
Then we hold I fixed at I<sub>optimum CGBP<\/sub> and search for \u03c1<sub>optimum CGBP<\/sub> around \u03c1<sub>optimum BPTPR<\/sub>.<br \/>\nThe following simulations were run (figure 4):<br \/>\n1).<strong>BINAR:<\/strong> \u03c1<sub>BPTPR<\/sub>=0.42 and I=5 (curves 1); \u03c1<sub>BPTPR<\/sub>=0.42 and I<sub>BPTPR<\/sub>=6; \u03c1<sub>BPTPR<\/sub>=0.42 and I=7;<br \/>\n\u03c1<sub>BPTPR<\/sub>=0.42 and I=8 (curves 2);<br \/>\n\u03c1<sub>BPTPR<\/sub>=0.42 and I=9; \u03c1<sub>BPTPR<\/sub>=0.42 and I=10; \u03c1<sub>BPTPR<\/sub>=0.42 and I=11 (curves 3);<br \/>\nI=11 and \u03c1=0.34; I=11 and \u03c1=0.5 (curves 4).<br \/>\nThe best result: \u03c1=0.42 and I=11 (curves 3).<br \/>\n2).<strong>COUNTER:<\/strong> \u03c1<sub>BPTPR<\/sub>=0.42 and I=3 (curves 1);<br \/>\n\u03c1<sub>BPTPR<\/sub>=0.42 and I<sub>BPTPR<\/sub>=4 (curves 2);<br \/>\n\u03c1<sub>BPTPR<\/sub>=0.42 and I=5 (curves 3);<br \/>\n\u03c1<sub>BPTPR<\/sub>=0.42 and I=6; \u03c1<sub>BPTPR<\/sub>=0.42 and I=7; I=4 and \u03c1=0.5 (curves 4);<br \/>\nI=4 and \u03c1=0.58.<br \/>\nThe best result: \u03c1=0.5 and I=4 (curves 4).<br \/>\n3).<strong>MULTIPLEXOR:<\/strong> \u03c1<sub>BPTPR<\/sub>=0.98 and I=4 (curves 1);<br \/>\n\u03c1<sub>BPTPR<\/sub>=0.98 and I=5 (curves 2);<br \/>\n\u03c1<sub>BPTPR<\/sub>=0.98 and I=6 (curves 3);<br \/>\nI=5 and \u03c1=1.06; I=5 and \u03c1=1.14; I=5 and \u03c1=1.22; I=5 and \u03c1=1.3 (curves 4);<br \/>\nI=5 and \u03c1=1.38.<br \/>\nThe best result: \u03c1=1.3 and I=5 (curves 4).<br \/>\n4).<strong>5\u00d75 TABLE:<\/strong> \u03c1<sub>BPTPR<\/sub>=0.74 and I=2 (curves 1);<br \/>\n\u03c1<sub>BPTPR<\/sub>=0.74 and I=3 (curves 2);<br \/>\n\u03c1<sub>BPTPR<\/sub>=0.74 and I=4 (curves 3);<br \/>\n\u03c1<sub>BPTPR<\/sub>=0.74 and I<sub>BPTPR<\/sub>=5; \u03c1<sub>BPTPR<\/sub>=0.74 and I=6; I=3 and \u03c1=0.66; I=3 and \u03c1=0.82 (curves 4).<br \/>\nThe best result: \u03c1=0.74 and I=3 (curves 2).<br \/>\n5).<strong>ASSOCIATIVE MEMORY:<\/strong> \u03c1<sub>BPTPR<\/sub>=0.42 and I=4; \u03c1<sub>BPTPR<\/sub>=0.42 and I=5 
(curves 1);<br \/>\n\u03c1<sub>BPTPR<\/sub>=0.42 and I=6; \u03c1<sub>BPTPR<\/sub>=0.42 and I=7 (curves 2);<br \/>\n\u03c1<sub>BPTPR<\/sub>=0.42 and I=8; \u03c1<sub>BPTPR<\/sub>=0.42 and I=9; \u03c1<sub>BPTPR<\/sub>=0.42 and I=10 (curves 3);<br \/>\n\u03c1<sub>BPTPR<\/sub>=0.42 and I=11; I=10 and \u03c1=0.34; I=11 and \u03c1=0.5 (curves 4).<br \/>\nThe best result: \u03c1=0.42 and I=10 (curves 3).<br \/>\n6).<strong>FUNCTION:<\/strong> \u03c1<sub>BPTPR<\/sub>=0.02 and I=2 (curves 1);<br \/>\n\u03c1<sub>BPTPR<\/sub>=0.02 and I<sub>BPTPR<\/sub>=3; I=2 and \u03c1=0.06; I=2 and \u03c1=0.1 (curves 2);<br \/>\nI=2 and \u03c1=0.18; \u03c1=0.1 and I=3 (curves 3);<br \/>\n\u03c1=0.1 and I=4; \u03c1=0.1 and I=5; \u03c1=0.1 and I=6 (curves 4); \u03c1=0.1 and I=7.<br \/>\nThe best result: \u03c1=0.1 and I=6 (curves 4).<\/p>\n<p><span style=\"text-decoration: underline;\">Remark:<\/span> Comparing the best results obtained with each of these 4 learning methods, figure 5 (curve 1: BP; curve 2: BPTP; curve 3: BPTPR; curve 4: CGBP) shows the clear superiority of the CGBP method. The situation sometimes encountered (for example, in the COUNTER problem), where quite satisfactory results are obtained for some initial weights but results weaker than those of other methods for others, is due to the following two reasons:<br \/>\na) The higher computational complexity means that fewer learning iterations are executed in the same running time;<br \/>\nb) The running time may have been too short for the decreasing trend of the CGBP algorithm to bring the criterion function down to smaller values.<\/p>\n<p><span style=\"text-decoration: underline;\">3.5. BP and CGBP using the error\u2019s absolute value minimization criterion<\/span><\/p>\n<p>In all the examples presented so far, the weights were computed by minimizing the mean square error. The following question arises: how would the network behave if, instead of the square error, one considered the absolute value of the output error? 
We studied this problem for two learning methods, BP and CGBP, which proved to be the best.<br \/>\nThe total error is computed using the same formula <strong>(3)<\/strong>, but <img title=\"J_{s}^{\\mu }\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;J_{s}^{\\mu&amp;space;}\" alt=\"\" \/>, the absolute output error for pattern \u03bc, is now:<\/p>\n<p><img title=\"J_{s}^{\\mu }=\\frac{1}{2}\\cdot \\sum_{m=1}^{M}\\left | d_{m}^{\\mu }- \\mathfrak{F}\\left ( a_{m}^{\\mu } \\right ) \\right | (25)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?J_{s}^{\\mu&amp;space;}=\\frac{1}{2}\\cdot&amp;space;\\sum_{m=1}^{M}\\left&amp;space;|&amp;space;d_{m}^{\\mu&amp;space;}-&amp;space;\\mathfrak{F}\\left&amp;space;(&amp;space;a_{m}^{\\mu&amp;space;}&amp;space;\\right&amp;space;)&amp;space;\\right&amp;space;|&amp;space;(25)\" alt=\"\" \/><br \/>\nwhere, as in the preceding case, d<sup>\u03bc<\/sup> is the desired output array for the class \u03bc.<br \/>\nFollowing exactly the same reasoning as the one presented in detail in relations <strong>(5)-(10)<\/strong>, we obtain the relations to be applied in this case:<\/p>\n<p><img title=\"\\Delta q_{mj}=\\frac{\\rho }{2}\\cdot \\sum_{\\mu }\\sigma \\left [ d_{m}^{\\mu}-\\mathfrak{F(a_{m}^{\\mu}}) \\right ]\\cdot\\mathfrak{F(a_{m}^{\\mu})}\\cdot\\frac{\\partial a_{m}^{\\mu}}{\\partial q_{mj}}=\\frac{\\rho }{2}\\cdot \\sum_{\\mu }\\Delta _{m}^{\\mu }\\cdot v_{j}^{\\mu } (26)\" 
src=\"http:\/\/latex.codecogs.com\/gif.latex?\\Delta&amp;space;q_{mj}=\\frac{\\rho&amp;space;}{2}\\cdot&amp;space;\\sum_{\\mu&amp;space;}\\sigma&amp;space;\\left&amp;space;[&amp;space;d_{m}^{\\mu}-\\mathfrak{F(a_{m}^{\\mu}})&amp;space;\\right&amp;space;]\\cdot\\mathfrak{F(a_{m}^{\\mu})}\\cdot\\frac{\\partial&amp;space;a_{m}^{\\mu}}{\\partial&amp;space;q_{mj}}=\\frac{\\rho&amp;space;}{2}\\cdot&amp;space;\\sum_{\\mu&amp;space;}\\Delta&amp;space;_{m}^{\\mu&amp;space;}\\cdot&amp;space;\\nu_{j}^{\\mu&amp;space;}&amp;space;(26)\" alt=\"\" \/><\/p>\n<p><img title=\"\\Delta s_{m}= -\\frac{\\rho }{2}\\sum_{\\mu }\\Delta _{m}^{\\mu } (27)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\Delta&amp;space;S_{m}=&amp;space;-\\frac{\\rho&amp;space;}{2}\\sum_{\\mu&amp;space;}\\Delta&amp;space;_{m}^{\\mu&amp;space;}&amp;space;(27)\" alt=\"\" \/><\/p>\n<p><img title=\"\\Delta _{m}^{\\mu }=\\sigma\\left[d_{m}^{\\mu}-\\mathfrak{F(a_{m}^{\\mu}})\\right]\\cdot\\mathfrak{F(a_{m}^{\\mu})}(28)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\Delta&amp;space;_{m}^{\\mu&amp;space;}=\\sigma\\left[d_{m}^{\\mu}-\\mathfrak{F(a_{m}^{\\mu}})\\right]\\cdot\\mathfrak{F(a_{m}^{\\mu})}(28)\" alt=\"\" \/><\/p>\n<p>Where, by \u03c3[expression] we considered the function that return the sign of the given expression.<br \/>\nLikewise, will be done for the parameters assigned to the synaptic connections between the input and the hidden layer:<\/p>\n<p><img title=\"\\Delta w_{ji}=\\frac{\\rho }{2}\\cdot \\sum_{\\mu }\\overline{\\Delta}_{j}^{\\mu }\\cdot x_{i}^{\\mu }(29)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\Delta&amp;space;w_{ji}=\\frac{\\rho&amp;space;}{2}\\cdot&amp;space;\\sum_{\\mu&amp;space;}\\overline{\\Delta}_{j}^{\\mu&amp;space;}\\cdot&amp;space;x_{i}^{\\mu&amp;space;}(29)\" alt=\"\" \/><\/p>\n<p><img title=\"\\Delta c_{j}=\\frac{\\rho }{2}\\cdot \\sum_{\\mu }\\overline{\\Delta }_{j}^{\\mu } (30)\" 
src=\"http:\/\/latex.codecogs.com\/gif.latex?\\Delta&amp;space;c_{j}=\\frac{\\rho&amp;space;}{2}\\cdot&amp;space;\\sum_{\\mu&amp;space;}\\overline{\\Delta&amp;space;}_{j}^{\\mu&amp;space;}&amp;space;(30)\" alt=\"\" \/><\/p>\n<p><img title=\"\\overline{\\Delta _{J}^{\\mu }}=\\left [ \\sum_{m}^{ }\\Delta _{m}^{\\mu }\\cdot q_{mj} \\right ]\\cdot \\mathfrak{F(\\overline{a_{j}^{\\mu }})}(10)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?\\dpi{120}&amp;space;\\overline{\\Delta&amp;space;_{J}^{\\mu&amp;space;}}=\\left&amp;space;[&amp;space;\\sum_{m}^{&amp;space;}\\Delta&amp;space;_{m}^{\\mu&amp;space;}\\cdot&amp;space;q_{mj}&amp;space;\\right&amp;space;]\\cdot&amp;space;\\mathfrak{F(\\overline{a_{j}^{\\mu&amp;space;}})}(31)\" alt=\"\" \/><\/p>\n<p>Extending to a feedforward R-layers network, the error criterion that should be minimized is the absolute error value determined over the set of all training examples:<\/p>\n<p><img title=\"J_{p}=\\frac{1}{2}\\cdot \\sum_{\\mu }\\sum_{j_{1}=1}^{H_{1}}\\left | d_{j_{1}}^{\\mu }-v_{j_{1}}^{\\mu } \\right | (32)\" src=\"http:\/\/latex.codecogs.com\/gif.latex?J_{p}=\\frac{1}{2}\\cdot&amp;space;\\sum_{\\mu&amp;space;}\\sum_{j_{1}=1}^{H_{1}}\\left&amp;space;|&amp;space;d_{j_{1}}^{\\mu&amp;space;}-\\nu_{j_{1}}^{\\mu&amp;space;}&amp;space;\\right&amp;space;|&amp;space;(32)\" alt=\"\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>The expression (13-16) can be easily altered for the present case; the major modification is linked to the formula (15) which becomes:<\/p>\n<p><img title=\"\\Delta _{j_{1}}^{\\mu }=\\sum_{j_{1}=1}^{H_{1}}\\sigma \\left [ d_{j_{1}}^{\\mu }-\\nu _{j_{1}}^{\\mu } \\right ]\\cdot \\mathfrak{F}\\left ( \\nu _{j_{1}}^{\\mu } \\right )(33))\" 
src=\"http:\/\/latex.codecogs.com\/gif.latex?\\Delta&amp;space;_{j_{1}}^{\\mu&amp;space;}=\\sum_{j_{1}=1}^{H_{1}}\\sigma&amp;space;\\left&amp;space;[&amp;space;d_{j_{1}}^{\\mu&amp;space;}-\\nu&amp;space;_{j_{1}}^{\\mu&amp;space;}&amp;space;\\right&amp;space;]\\cdot&amp;space;\\mathfrak{F}\\left&amp;space;(&amp;space;\\nu&amp;space;_{j_{1}}^{\\mu&amp;space;}&amp;space;\\right&amp;space;)(33)\" alt=\"\" \/><\/p>\n<p>For the same problems for which the previous learning methods were tested, we have run simulation programs and their results are represented on graphics, together with the corresponding best result obtained in the case of the square error minimization error. Different from the previous simulations, here we used only one set of initial weights and we kept for graphic representation only the best result. This fact does not influence the final conclusions on the efficiency comparison of the two methods: it can be seen, in most cases, the superiority of the method using the square error minimization criterion over the second method.<br \/>\nOn each of the 6 graphics (<strong>figure 6a-6f<\/strong>), the curves 1 and 2 represent the best result obtained with the <strong>BP<\/strong>, accordingly the <strong>CGBP<\/strong> method, using the second error minimization criterion, and the curves 3 and 4 represent the best result after applying the same learning methods (<strong>BP<\/strong>, acc. 
<strong>CGBP<\/strong>) but for the first error minimization criterion (the square error minimization).<\/p>\n<p><a rel=\"attachment wp-att-488\" href=\"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/?attachment_id=488\"><img loading=\"lazy\" class=\"alignnone size-medium wp-image-488\" title=\"fig6a\" src=\"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6a1-300x186.png\" alt=\"\" width=\"300\" height=\"186\" srcset=\"https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6a1-300x186.png 300w, https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6a1-1024x636.png 1024w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><a rel=\"attachment wp-att-489\" href=\"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/?attachment_id=489\"><img loading=\"lazy\" class=\"alignnone size-medium wp-image-489\" title=\"fig6b\" src=\"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6b1-300x179.png\" alt=\"\" width=\"300\" height=\"179\" srcset=\"https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6b1-300x179.png 300w, https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6b1-1024x612.png 1024w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><a rel=\"attachment wp-att-490\" href=\"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/?attachment_id=490\"><img loading=\"lazy\" class=\"alignnone size-medium wp-image-490\" title=\"fig6c\" src=\"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6c1-300x191.png\" alt=\"\" width=\"300\" height=\"191\" srcset=\"https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6c1-300x191.png 300w, https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6c1-1024x652.png 1024w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><a rel=\"attachment wp-att-491\" 
href=\"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/?attachment_id=491\"><img loading=\"lazy\" class=\"alignnone size-medium wp-image-491\" title=\"fig6d\" src=\"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6d1-300x179.png\" alt=\"\" width=\"300\" height=\"179\" srcset=\"https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6d1-300x179.png 300w, https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6d1-1024x613.png 1024w, https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6d1.png 2028w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><a rel=\"attachment wp-att-492\" href=\"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/?attachment_id=492\"><img loading=\"lazy\" class=\"alignnone size-medium wp-image-492\" title=\"fig6e\" src=\"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6e1-300x171.png\" alt=\"\" width=\"300\" height=\"171\" srcset=\"https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6e1-300x171.png 300w, https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6e1-1024x585.png 1024w, https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6e1.png 2028w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><a rel=\"attachment wp-att-493\" href=\"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/?attachment_id=493\"><img loading=\"lazy\" class=\"alignnone size-medium wp-image-493\" title=\"fig6f\" src=\"http:\/\/webspace.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6f1-300x180.png\" alt=\"\" width=\"300\" height=\"180\" srcset=\"https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6f1-300x180.png 300w, https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6f1-1024x617.png 1024w, https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/wp-content\/uploads\/fig6f1.png 1996w\" sizes=\"(max-width: 300px) 
100vw, 300px\" \/><\/a><br \/>\nFor the <strong>BP<\/strong> method (curves 1), we started initially from the most favorable value for \u03c1, for which we obtained the best network\u2019s behavior as shown in paragraph 3. We varied \u03c1 until the best behavior for the absolute error value criterion has been reached. This is represented by the curves 1 and these must be compared to the curves 3 (<strong>BP<\/strong> for the square mean error).<\/p>\n<p>For the <strong>CGBP <\/strong>method (curves 2), the initial parameters were: optimum \u03c1 found before and the optimum restart index I<sub>optimum CGBP,<\/sub> but using the square error method. The \u03c1 coefficient was kept constant and the index I was varied; then, after choosing an optimum I, this was kept constant and \u03c1 was varied until the most suitable was reached. With these two best parameters, for each application has been represented on graphic, curves 2 which must be compared with curves 4.<\/p>\n<p>The following simulates were run:<br \/>\n1).<strong>BINAR:<\/strong> BP: \u03c1=0.66 (curve 1); CGBP: \u03c1=0.82, I=11 (curve 2);<br \/>\nBP: \u03c1=0.34 (curve 3); CGBP: \u03c1=0.42, I=11 (curve 4).<br \/>\n2).<strong>COUNTER:<\/strong> BP: \u03c1=0.5 (curve 1); CGBP: \u03c1=0.5, I=4 (curve 2);<br \/>\nBP: \u03c1=0.5 (curve 3); CGBP: \u03c1=0.42, I=4 (curve 4).<br \/>\n3).<strong>MULTIPLEXOR:<\/strong> BP: \u03c1=1.22 (curve 1); CGBP: \u03c1=1.22, I=5 (curve 2);<br \/>\nBP: \u03c1=1.22 (curve 3); CGBP: \u03c1=1.3, I=5 (curve 4).<br \/>\n4).<strong>5X5 TABLE:<\/strong> BP: \u03c1=0.58 (curve 1); CGBP: \u03c1=0.58, I=4 (curve 2);<br \/>\nBP: \u03c1=0.58 (curve 3); CGBP: \u03c1=0.74, I=3 (curve 4).<br \/>\n5).<strong>ASSOCIATIVE MEMORY:<\/strong> \u03c1=0.5 (curve 1); CGBP: \u03c1=0.5, I=10 (curve 2);<br \/>\nBP: \u03c1=0.5 (curve 3); CGBP: \u03c1=0.42, I=10 (curve 4).<br \/>\n6).<strong>FUNCTION:<\/strong> BP: \u03c1=0.0005 (curve 1); CGBP: \u03c1=0.5, I=10 (curve 2);<br \/>\nBP: \u03c1=0.02 
(curve 3); CGBP: \u03c1=0.1, I=6 (curve 4).<\/p>\n<p><span style=\"text-decoration: underline;\">Conclusions<\/span><br \/>\nThe usual BP methods need preliminary tests in order to find the optimum value of \u03c1, a value that depends on the problem to be learned. The more refined methods also require determining I.<br \/>\nHence, besides its faster convergence, the proposed method also has the advantage of relieving the user from performing preliminary tests to determine \u03c1 and I. It should be noted that the changes made to the simulation programs when passing from one method to another were kept as small as possible, so that the running times differ only because of the increased computational complexity.<br \/>\nAlthough neural networks are typical parallel structures, the simulation was carried out on a computer using a sequential algorithm. For an implementation on a parallel structure, the comparison should be made not for the same running time, but for the same number of iterations. In such a case the superiority of the proposed NBP method would be even greater.<\/p>\n<p><span style=\"text-decoration: underline;\">References<\/span><\/p>\n<ol>\n<li>Battiti, R.; Masulli, F. \u2013 \u201cBFGS Optimization for faster and automated supervised learning\u201d \u2013 in \u201cProc. of the ICANN\u201d \u2013 Espoo, Finland, June, 1991.<\/li>\n<li>Benaim, M.; Tomasini, L. \u2013 \u201cCompetitive and self-organizing algorithms based on the minimisation of an information criteria\u201d \u2013 in \u201cArtificial Neural Networks\u201d \/ Kohonen, T.; M\u00e4kisara, K.; Simula, O.; Kangas, J. (Eds.) \u2013 Elsevier, North-Holland, 1991.<\/li>\n<li>Hornik, K.; Stinchcombe, M.; White, H. \u2013 \u201cMultilayer feedforward networks are universal approximators\u201d \u2013 Neural Networks \u2013 vol. 2, 1989 \u2013 pp. 359-366.<\/li>\n<li>Hou, T.-H.; Lin, L. 
\u2013 \u201cManufacturing process monitoring using neural networks\u201d \u2013 Computers &amp; Elect. Engineering \u2013 Vol. 19, No. 2, 1993 \u2013 pp. 129-141.<\/li>\n<li>Hush, D.; Abdallah, C.; Horne, B. \u2013 \u201cThe recursive neural network and its applications in control theory\u201d \u2013 Computers &amp; Elect. Engineering \u2013 Vol. 19, No. 4, 1993 \u2013 pp. 333-341.<\/li>\n<li>Jacobs, R.A. \u2013 \u201cIncreased rates of convergence through learning rate adaptation\u201d \u2013 Neural Networks \u2013 vol. 1, 1988 \u2013 pp. 295-307.<\/li>\n<li>Kr\u00f6se, B.J.A.; Smagt, P.P. van der \u2013 \u201cAn Introduction to Neural Networks\u201d \u2013 The University of Amsterdam, 1993.<\/li>\n<li>Narendra, K.; Parthasarathy, K. \u2013 \u201cIdentification and control of dynamical systems using neural networks\u201d \u2013 IEEE Trans. on Neural Networks \u2013 Vol. 1, No. 1, 1990 \u2013 pp. 4-27.<\/li>\n<li>Ooyen, A. van; Nienhuis, B. \u2013 \u201cImproving the convergence of the back-propagation algorithm\u201d \u2013 Neural Networks \u2013 vol. 5, 1992 \u2013 pp. 465-471.<\/li>\n<li>Philips, S.; Wiles, J. \u2013 \u201cExponential generalization from a polynomial number of examples in a combinatorial domain\u201d \u2013 in \u201cProceedings of the International Joint Conference on Neural Networks: IJCNN\u201993\u201d \u2013 Nagoya, Japan, October 1993, pp. 505-508.<\/li>\n<li>Pineda, F.J. \u2013 \u201cGeneralization of backpropagation to recurrent and higher order neural networks\u201d \u2013 in \u201cProceedings of IEEE Conference on Neural Information Processing Systems\u201d \/ Anderson, D.Z. (Ed.) \u2013 Denver, CO \u2013 November, 1987.<\/li>\n<li>Pineda, F.J. \u2013 \u201cDynamics and architecture for neural computation\u201d \u2013 Journal of Complexity \u2013 No. 4, 1988 \u2013 pp. 216-245.<\/li>\n<li>Pineda, F.J. 
\u2013 \u201cRecurrent backpropagation and the dynamical approach to adaptive neural computation\u201d \u2013 Neural Computation \u2013 No.1, 1989 \u2013 pp.161-172.<\/li>\n<li>Pourboghrat, F. \u2013 \u201cAdaptive neural controller design for robots\u201d \u2013 Computers &amp; Elect. Engineering \u2013 Vol.19, No. 4, 1993 \u2013 pp. 277-288.<\/li>\n<li>Ribar, S.; Koruga, D. \u2013 \u201cNeural networks controller simulation\u201d \u2013 in \u201cProc. of the ICANN\u201d \u2013 Espoo, Finland, June, 1991.<\/li>\n<li>Silva, F.M.; Almeida, L.B. \u2013 \u201cSpeeding up backpropagation\u201d \u2013 in \u201cAdvanced neural computers\u201d\/Eckmiller, R (Ed.) \u2013 North-Holland, 1990 \u2013 pp. 151-160.<\/li>\n<li>Volovici, D. \u2013 \u201cThe reliability of flexible technological processes\u201d (Ph. D. Thesis) \u2013 Politehnica University of Bucharest, Faculty of Electronics and Telecommunications, 1994.<\/li>\n<li>Williams, R.J.; Zipser, D. \u2013 \u201cA learning algorithm for continually running fully recurrent neural networks\u201d \u2013 Neural Computation \u2013 No.1; 1989, MIT \u2013 pp.270-280.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>COMPARISON AMONG BACKPROPAGATION LEARNING METHODS Abstract: This paper analyzes and compares several improvements to the backpropagation method for weights adjustments for a feed-forward network. Therefore, the networks\u2019 behavior is simulated on six certain specific applications. 
It is also presented a &hellip; <a href=\"https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/?page_id=145\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/index.php?rest_route=\/wp\/v2\/pages\/145"}],"collection":[{"href":"https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=145"}],"version-history":[{"count":29,"href":"https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/index.php?rest_route=\/wp\/v2\/pages\/145\/revisions"}],"predecessor-version":[{"id":507,"href":"https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/index.php?rest_route=\/wp\/v2\/pages\/145\/revisions\/507"}],"wp:attachment":[{"href":"https:\/\/web.ulbsibiu.ro\/daniel.volovici\/html\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=145"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}