Trajectory optimization using learning from demonstration with meta-heuristic grey wolf algorithm

Nowadays, most robotic systems perform their tasks in an environment that is generally known. Thus, the robot's trajectory can be planned in advance for a given task. However, as part of modern manufacturing systems, which must deliver high product variety, mobile robots should be flexible enough to adapt to changing and diverse environments and needs. In such scenarios, a modification of the task or a change in the environment forces the operator to modify the robot's trajectory. Such modification is usually expensive and time-consuming, as experienced engineers must be involved to program the robot's movements. This paper presents a solution to this problem by simplifying the process of teaching the robot a new trajectory. The proposed method generates a trajectory based on an initial raw demonstration of its shape. The new trajectory is generated in such a way that the errors between the actual and target end positions and orientations of the robot are minimized. To minimize those errors, the grey wolf optimization (GWO) algorithm is applied. The proposed approach is demonstrated for a two-wheeled mobile robot. Simulation and experimental results confirm the high accuracy of the generated trajectories. This is an open access article under the CC BY-SA license.

INTRODUCTION
Nowadays, the use of mobile autonomous platforms in manufacturing applications is becoming increasingly popular. Employing mobile platforms in such applications creates a strong demand for solving several crucial problems, one of which is proper, optimal path planning. An interesting example is a production process in which the workpiece is placed inside a strictly defined workspace in a completely random way, so that its position and orientation vary and the path has to be continuously adapted. Such cases can occur, for example, in welding [1], [2], painting [3], [4], cleaning [5], and transporting [6].
Several techniques can be used for optimal robot path planning in manufacturing processes. In [7] the authors present an original modification of the grey wolf optimization (GWO) algorithm that finds an optimal path in a multi-robot case while maintaining proper distance from other robots and obstacles. GWO is a meta-heuristic algorithm inspired by grey wolves. Benchmarks presented in [8] show that the GWO algorithm performs better in minimizing test functions than other algorithms such as particle swarm optimization (PSO), gravitational search algorithm (GSA), differential evolution (DE), and fast evolutionary programming (FEP). A solution presented in [9] utilizes the fish swarm algorithm to find a path that avoids obstacles in the environment surrounding the robot. In the final step the authors apply Bessel curve theory to obtain a smoothed trajectory. Evolutionary algorithms are also frequently used for path optimization, in many different modifications. As an example, the mutated cuckoo optimization algorithm [10] can be mentioned, used as the global path planner together with a genetic algorithm. Bioinspired algorithms are now widely implemented in both 2D and 3D trajectory planning. In [11], the authors used GWO and the golden eagle optimizer (GEO), together with their modifications, to plan the path of an unmanned aerial vehicle (UAV) carrying out inspection works. When cooperation of several robots is required to process or manufacture a given element, artificial potential fields may be used to maintain proper structure and distances between the platforms [12], [13], [14].
If the process of path planning must be sped up, particularly when the pose of the manufactured element varies, the learning from demonstration (LfD) approach [15], [16] can be used instead of building the entire path from scratch. In the first step of LfD, the operator remotely moves the robot around the processed object, roughly driving through the work spots and focusing on proper avoidance of the obstacles [17]. Next, LfD uses the demonstrated trajectory in an optimization algorithm to achieve satisfactory trajectory indicators. There are different implementations of robot learning algorithms. The most popular of them are neural networks, optimization algorithms, and dynamic movement primitives (DMP) [18]. Metaheuristic optimization methods, such as genetic algorithms, are gaining popularity due to their speed and their efficiency in finding an optimal solution. Neural networks, on the other hand, adapt better to changing conditions but consume large computing resources. Another method used in LfD is DMP, which can be applied to model the robot's motion based on a demonstration [19]. However, it requires the implementation of complex mathematical operations.
Navigational systems are used to increase precision and to close the feedback loop of the robot position controller. To verify the performance of different navigational algorithms and to evaluate the accuracy of the generated trajectory, the position of the mobile platform must be measured. In indoor applications, global navigation satellite systems (GNSS) may not be effective. Therefore, other navigational systems are briefly discussed below.
Lateration-based positioning systems utilize, apart from the anchor positions, the distances between the tag and the anchors. Those systems can be further classified by the way the distance is measured or by the measurement technology used [20]. The most accurate ways to determine the distance are time of flight (TOF) and time difference of arrival (TDOA). Besides those two, there are also phase of arrival (POA) methods and methods that use a received signal strength indicator (RSSI). Lateration-based positioning systems utilizing the following measurement technologies were built and successfully evaluated: magnetic field, ultrasonic, infrared, radio frequency (Wi-Fi, Bluetooth), and vision, which can also be used for obstacle avoidance applications [21], [22]. Popular ultrasonic-based systems are characterized by high accuracy (up to 1 cm), yet they suffer from low range and significant interference from wind and other sound sources [23]. A promising radio frequency (RF) technology for indoor positioning purposes is the ultra-wide band (UWB) technique [24]. The accuracy of UWB-based positioning systems is typically in the range of 10 to 30 cm; however, UWB signals can propagate through thin obstacles and are resistant to most electromagnetic interference.
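The lateration principle mentioned above can be illustrated with a minimal sketch. It assumes ideal 2D range measurements and linearizes the range equations by subtracting the equation of the first anchor, then solves the resulting least-squares problem in closed form. The function name `laterate` and the anchor layout in the usage note are hypothetical and are not taken from the positioning system used in this work.

```python
import math

def laterate(anchors, distances):
    """Estimate the 2D tag position from anchor positions and measured ranges.

    Each range equation (x - xi)^2 + (y - yi)^2 = di^2 is subtracted from the
    equation of anchor 0, which leaves a linear system in (x, y). The system is
    solved in the least-squares sense through its 2x2 normal equations.
    """
    (x0, y0), d0 = anchors[0], distances[0]
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        a1 = 2.0 * (xi - x0)
        a2 = 2.0 * (yi - y0)
        b = d0 ** 2 - di ** 2 + xi ** 2 - x0 ** 2 + yi ** 2 - y0 ** 2
        s11 += a1 * a1
        s12 += a1 * a2
        s22 += a2 * a2
        t1 += a1 * b
        t2 += a2 * b
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear or too few; position is ambiguous")
    x = (s22 * t1 - s12 * t2) / det
    y = (s11 * t2 - s12 * t1) / det
    return x, y
```

For example, with four anchors at the corners of a 4 m by 5 m rectangle and exact ranges to a tag at (1, 2), `laterate` recovers that position up to rounding. With noisy ranges the same call returns a least-squares estimate.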
This paper considers the problem of mobile robot path planning in which processed elements are placed sequentially and randomly in a predefined area. The task is carried out by a mobile platform in the form of a two-wheeled robot, equipped with a proper work tool and operating on a planar surface. The result of the proposed algorithm is not a path in the form of successive robot positions, but the values controlling the operation of the robot's drives. The objective of the path optimization algorithm is to reach the predefined work spots with the best possible accuracy in robot position and orientation.
The efficiency of the proposed path planning algorithm is demonstrated with a practical application in which the mobile two-wheeled platform travels to certain base points in order to perform tasks that are specific to the manufacturing process. In the considered case, the main problem is the varying position and orientation of the processed elements. By using the LfD technique together with the GWO algorithm, the controller can generate control signals for the drives that move the platform to the required base points along the learned, optimized trajectory. The proposed path planning algorithm can be used in manufacturing processes such as welding, painting, or cleaning of elements.
A novel contribution of the proposed approach is that the shape of the required mobile robot trajectory is generated and stored in the robot controller's memory only once, during the LfD phase. The learned trajectory can then be reused many times by the robot's controller, which recreates its shape for different positions and orientations of the processed element. There is no need to teach the controller this trajectory again for those new positions and orientations.

EXAMPLE MANUFACTURING PROCESS
In the considered manufacturing process, a processed element is placed inside a specified area (Figure 1). There might be two or more work spots on the element that have to be processed by a mobile platform equipped with specialized equipment or a manipulator. For the purpose of illustration, a painting process is considered in which only two work spots are included. The accuracy of element placement is limited, as its position and orientation may vary due to the inaccuracy of the transporting machines. Therefore, it is impossible to define one universal trajectory of the mobile platform. The processed element (4) is transported to the assembly hall in one part (Figure 1). The object is placed in the working area by the crane (2). Due to the large dimensions and weight of the object, the crane does not operate precisely, and thus the actual position and orientation of the object relative to the local coordinate system xy can vary. Therefore, the position x, y and orientation φ of the object are detected by the positioning system (3) utilizing two markers placed at points M1, M2. The locations of base points B1 and B2 are determined by shifting the positions of markers M1, M2 by a constant offset value. The entire trajectory of the mobile platform (1) consists of three parts: travel trajectory T1, intermediate trajectory T2, and return trajectory T3. The task of the mobile platform is to adjust its motion trajectory to the changing position and orientation of the processed element. This feature can be seen in Figure 2, where trajectories T1p and T3p differ from trajectories T1, T3 in Figure 1.
The position and orientation of the intermediate trajectory T2 depend on the positions of base points B1 and B2, which are strictly related to the pose of the processed element. However, the shape of the trajectory remains the same regardless of the position and orientation of the painted element. It is required that trajectory T2 ends exactly at base point B2. To fulfill this requirement, the two-stage LfD algorithm (section 3.3) is used to generate a trajectory that avoids the painted elements and ends exactly at point B2. In general, trajectory T2 may cover many base points, depending on the number of work spots on the processed element.
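The marker-based pose detection and the construction of base points can be sketched as follows. This is only an illustration under stated assumptions: the object position is taken as the midpoint of M1 and M2, the orientation as the direction from M1 to M2, the constant offset is expressed in the object frame, and the base point inherits the object orientation. The function names are hypothetical.

```python
import math

def object_pose(m1, m2):
    """Pose (x, y, phi) of the processed element from markers M1 and M2.

    Assumption: position is the midpoint of the markers and orientation is
    the direction of the vector from M1 to M2.
    """
    x = 0.5 * (m1[0] + m2[0])
    y = 0.5 * (m1[1] + m2[1])
    phi = math.atan2(m2[1] - m1[1], m2[0] - m1[0])
    return x, y, phi

def base_point(marker, phi, offset):
    """Shift a marker by a constant offset given in the object frame.

    The offset is rotated by the object orientation phi, so the base point
    keeps the same location relative to the element for any element pose.
    """
    dx, dy = offset
    bx = marker[0] + dx * math.cos(phi) - dy * math.sin(phi)
    by = marker[1] + dx * math.sin(phi) + dy * math.cos(phi)
    return bx, by, phi
```

Because the offset is rotated together with the element, the relative placement of B1 and B2 stays fixed, which is what allows a single learned shape of T2 to be reused for different element poses.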

Travel/return trajectory
Travel trajectory T1 is generated between base points B0, B1 assuming that the travel path is free of obstacles (Figure 1). Trajectory T1 is obtained as a result of the operation of the position and orientation controller presented in section 3.1. The controller uses the UWB system and the digital compass to measure the position and orientation of the moving platform and generates control values for the drive units. At the end of travel trajectory T1, the robot reaches base point B1, defined by $(x_{B1}, y_{B1}, \varphi_{B1})$. Due to friction and wheel slip, this point is reached with some position and orientation errors. Then, the robot is ready to recreate the learned intermediate trajectory T2. After completing the task at base points B1, B2 (i.e. after traveling the intermediate trajectory T2), the platform returns to the starting point B0. This time the platform travels along trajectory T3, which is obtained in the same way as trajectory T1.

Intermediate trajectory
Intermediate trajectory T2 is the most important part of the entire robot path. Its shape is determined based on the poses of base points located around the processed element. In general the intermediate trajectory may contain any number of base points. The trajectory is obtained as a result of the proposed LfD algorithm presented in section 3.3.

METHOD
The following section summarizes the methods used to control the robot. The main focus is put on the control algorithm for travel/return trajectory T1, T3, and the learning from demonstration algorithm for the intermediate trajectory T2. The proposed methods can be easily implemented in the mobile platform controller.

Control algorithm
The simplest controller that may be used to drive the robot along a desired trajectory is the proportional controller (Figure 3). Such a controller is used to drive the mobile platform over the travel/return trajectories T1, T3. The control loop consists of two separate parts. The first one is the orientation controller, which can be described by (1):

$$\omega(t) = K_{\varphi}\, e_{\varphi}(t) \tag{1}$$

where $\omega(t)$ is the desired angular velocity of the robot, $K_{\varphi}$ is the controller gain, $e_{\varphi}(t) = \varphi_{ref} - \varphi(t)$ is the actual orientation error at time $t$, $\varphi_{ref}$ is the desired orientation, and $\varphi(t)$ is the actual orientation at time $t$. The second part of the control loop is the position controller, in which $x(t)$, $y(t)$ are the actual positions of the robot at time $t$.
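A minimal sketch of the proportional orientation law is given below, with a saturation step of the kind applied by the limiter in the control loop. The wrapping of the orientation error to $[-\pi, \pi)$ and the parameter name `omega_max` are assumptions of this sketch, not details stated for the controller used here.

```python
import math

def wrap_angle(a):
    """Wrap an angle to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi

def orientation_controller(phi_ref, phi, k_phi, omega_max):
    """Proportional orientation control followed by a symmetric limiter.

    phi_ref   desired orientation [rad]
    phi       measured orientation [rad]
    k_phi     controller gain
    omega_max assumed limit of the angular velocity [rad/s]
    """
    e_phi = wrap_angle(phi_ref - phi)          # orientation error
    omega = k_phi * e_phi                      # proportional law
    return max(-omega_max, min(omega_max, omega))  # limiter
```

Wrapping the error makes the robot turn through the shorter angle when the reference and the measured orientation lie on opposite sides of $\pm\pi$.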
In real applications, both linear and angular velocities are limited to an acceptable level by the limiter. This level is determined by the mechanical and electrical parameters of the robot drive unit. Those limitations are included in the robot control system, as presented in Figure 3. Different algorithms can be used to generate travel/return trajectories T1, T3. One method is to control the robot to a destination point by using an intermediate direction [25]. The intermediate direction is obtained from a right-angled triangle construction.

Model of the differential drive
The algorithm for trajectory optimization in section 3.3.2 evaluates the performance of trajectories based on the estimated final position of the robot. Hence, it is required to simulate the robot's trajectory based on the angular velocities of its wheels. This section presents the relation between control outputs $\omega_R(t)$, $\omega_L(t)$ and the position of the robot. The main geometric parameters needed to describe the vehicle kinematics in the differential drive model are the axle length $L$ and the wheel radius $r$. The pose of the robot in the 2D plane is described by position coordinates $x$, $y$ and orientation angle $\varphi$. Considering the robot's motion in the local coordinate system, the robot's linear speed $v$ is referenced to point K in the center of the driving axle (Figure 5). Assuming the relation between linear and angular speeds of the wheels is $v_R = r\omega_R$, $v_L = r\omega_L$, the linear speed $v(t)$ of the robot may be expressed as:

$$v(t) = \frac{v_R(t) + v_L(t)}{2} = \frac{r\,\bigl(\omega_R(t) + \omega_L(t)\bigr)}{2} \tag{6}$$

The rotational speed of the platform $\omega(t)$ can be presented as:

$$\omega(t) = \frac{v_R(t) - v_L(t)}{L} = \frac{r\,\bigl(\omega_R(t) - \omega_L(t)\bigr)}{L} \tag{7}$$

When the robot turns, it moves around a circle whose center point is referred to as the ICR (instant center of rotation). The radius $R(t)$ of that circle can be calculated as:

$$R(t) = \frac{v(t)}{\omega(t)} = \frac{L}{2}\,\frac{\omega_R(t) + \omega_L(t)}{\omega_R(t) - \omega_L(t)} \tag{8}$$

The forward kinematics of the robot with a differential drive in the local coordinate system is expressed as:

$$\dot{x}(t) = v(t)\cos\varphi(t), \qquad \dot{y}(t) = v(t)\sin\varphi(t), \qquad \dot{\varphi}(t) = \omega(t) \tag{9}$$

Using trapezoidal approximation, (9) can be discretized into (10)-(12), where $T$ is the time step. Equations (10)-(12) are used to simulate the robot's trajectory during the optimization of control variables $\omega_{Ropt}$, $\omega_{Lopt}$. The presented simplified model of the robot's forward and inverse kinematics assumes ideal ambient conditions. Its formulation does not include model inaccuracies due to friction, wheel slip, and loading forces.
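The simulation of the robot pose from wheel speeds can be sketched as below, using the standard differential-drive relations $v = r(\omega_R + \omega_L)/2$ and $\omega = r(\omega_R - \omega_L)/L$. The discrete update evaluates the heading at the middle of each time step, which is one common second-order scheme; it is an assumption of this sketch and may differ in detail from the discretization used in (10)-(12). Function names are hypothetical.

```python
import math

def wheel_to_body(omega_r, omega_l, r, L):
    """Convert wheel angular speeds [rad/s] to body speeds (v, omega)."""
    v = r * (omega_r + omega_l) / 2.0
    omega = r * (omega_r - omega_l) / L
    return v, omega

def step(pose, omega_r, omega_l, r, L, T):
    """Advance the pose (x, y, phi) by one time step T."""
    x, y, phi = pose
    v, omega = wheel_to_body(omega_r, omega_l, r, L)
    phi_mid = phi + 0.5 * omega * T  # heading at the middle of the step
    return (x + v * T * math.cos(phi_mid),
            y + v * T * math.sin(phi_mid),
            phi + omega * T)

def simulate(pose0, omegas_r, omegas_l, r, L, T):
    """Return the list of poses visited for given wheel speed sequences."""
    pose = pose0
    path = [pose]
    for w_r, w_l in zip(omegas_r, omegas_l):
        pose = step(pose, w_r, w_l, r, L, T)
        path.append(pose)
    return path
```

As in the model above, the sketch ignores friction, wheel slip, and loading forces, so it is suitable only for predicting the nominal end pose during optimization.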

Learning from demonstration
Learning from demonstration technique can be divided into two stages. During the first stage, the trajectory between the base points is demonstrated by the operator. Data obtained from the demonstration run is processed with the use of an optimization algorithm in the second stage of the LfD. The second stage is used to increase the precision of the final position and orientation of the robot in the base points. The optimization algorithm used during the second stage is the grey wolf optimization algorithm.

Demonstration stage
At the beginning of trajectory T2, it is assumed that the platform is at base point B1 $(x_{B1}, y_{B1}, \varphi_{B1})$, which has been reached using the previously mentioned travel trajectory T1. The platform learns how to generate control signals for the intermediate trajectory T2 during the demonstration process. During the process, the operator takes control of the platform by switching it to manual mode. Then, using the wireless control unit, the operator drives the robot around the manufactured object from base point B1 to base point B2. The operator takes into account obstacles in the robot's path as well as the shape and size of the processed element. While controlling the robot, the operator tries to move the platform accurately into the close vicinity of point B2. During this process, the values of the control signals delivered to the left ($\omega_{Ldem}$) and right ($\omega_{Rdem}$) wheel motors are stored in the system memory. The operator makes several demonstration trips (Figure 6) in order to collect data that will be used during the optimization process.

Optimization stage - GWO algorithm
In the next step, the demonstrated trajectory is optimized with the GWO algorithm. The grey wolf optimizer (GWO) is a meta-heuristic algorithm that imitates the leadership hierarchy and hunting mechanism of grey wolves in nature. Four types of grey wolves (alpha, beta, delta, and omega) are employed to simulate the leadership hierarchy. The algorithm includes four main steps: hunting, searching for prey, encircling prey, and attacking prey [8].
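For orientation, a compact sketch of the standard GWO update is given below. It follows the commonly published form of the algorithm: the three best solutions found so far act as alpha, beta, and delta, and every wolf moves to the average of three positions computed relative to these leaders while the coefficient `a` decreases linearly from 2 to 0. The population size, iteration count, bound handling by clipping, and the function name `gwo` are assumptions of this sketch and do not reflect the settings used in the presented research.

```python
import random

def gwo(cost, dim, lower, upper, n_wolves=20, n_iter=200, seed=0):
    """Minimize cost(position) over the box [lower, upper]^dim with GWO.

    Requires n_wolves >= 3. Returns (best_position, best_cost).
    """
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_wolves)]
    leaders = []  # best-so-far (cost, position) of alpha, beta, delta
    for it in range(n_iter + 1):
        # Rank the pack and keep the three best solutions found so far.
        scored = [(cost(w), list(w)) for w in wolves]
        leaders = sorted(leaders + scored, key=lambda s: s[0])[:3]
        if it == n_iter:
            break
        a = 2.0 * (1.0 - it / n_iter)  # decreases linearly from 2 to 0
        for w in wolves:
            for j in range(dim):
                estimate = 0.0
                for _, leader in leaders:
                    A = 2.0 * a * rng.random() - a   # |A| > 1 favors search
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - w[j])    # distance to the leader
                    estimate += leader[j] - A * D    # encircling move
                # New coordinate: mean of the three estimates, kept in bounds.
                w[j] = min(upper, max(lower, estimate / 3.0))
    best_cost, best_pos = leaders[0]
    return best_pos, best_cost
```

A call such as `gwo(lambda p: sum(x * x for x in p), dim=3, lower=-5.0, upper=5.0)` minimizes a simple quadratic test function. In the trajectory problem, the position vector would hold the modifiers of the wheel control values and the cost would be the goal function of the final pose.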
In this step, the best trajectory from the demonstration set is selected as the template for further optimization. The criterion for the best demonstration trajectory is the final accuracy with which the mobile platform reaches point B2. For the selected trajectory template, the vector of control outputs is considered during the optimization. In each optimization step, the new values of the control signals for the left and right wheels are calculated as

$$\omega_{Lopt}(k) = \omega_{Ldem}(k) + \Delta\omega_{L}(k), \qquad \omega_{Ropt}(k) = \omega_{Rdem}(k) + \Delta\omega_{R}(k), \qquad k = 1, \dots, N \tag{13}$$

where $N$ is the number of sampling steps of the demonstration run, $\omega_{Lopt}$ is the optimized left wheel control value, $\omega_{Ldem}$ is the left wheel control value from the demonstration, and $\Delta\omega_{L}(k)$ is the $k$-th modifier of the control value returned by the GWO algorithm. The right wheel quantities are defined analogously.

In the next step, the control values of the left and right wheels are converted to the linear and angular speed of the robot using (6)-(7). Then, using odometry (10)-(12), the linear and angular velocities are converted to the position of the robot. The values of the robot's end position and orientation angle are used to calculate the goal function (14), where $x_{T2}$, $y_{T2}$ are the final planar position coordinates of the robot, $\varphi_{T2}$ is the final orientation of the robot, and $x_{des}$, $y_{des}$, $\varphi_{des}$ are the desired values of the final position coordinates and orientation of the robot. Next, the optimized control values $\omega_{Ropt}$, $\omega_{Lopt}$ are verified experimentally. For that purpose, the robot uses the optimized control values to perform trajectory T2 starting from base point B1. When the robot finishes its drive, the final position and orientation are used to evaluate the trajectory generated during the optimization stage. Apart from evaluating the resulting position and orientation accuracy, the operator also supervises the robot's maneuvers to avoid any physical collision of the robot with the processed object. If the operator is satisfied with the achieved results, the optimized trajectory is approved for further use. The complete teaching/control routine is shown in Figure 7.
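The evaluation of one candidate solution can be sketched as follows. Two assumptions are made and should be read as such: the modifiers returned by the optimizer are added to the demonstrated wheel speeds, and the goal function is an unweighted sum of squared errors of the final position and orientation, which may differ from the exact form of (14). The kinematic update uses the heading at the middle of each step, and all function names are hypothetical.

```python
import math

def final_pose(omegas_r, omegas_l, r, L, T, pose0=(0.0, 0.0, 0.0)):
    """Integrate the differential-drive model and return the end pose."""
    x, y, phi = pose0
    for w_r, w_l in zip(omegas_r, omegas_l):
        v = r * (w_r + w_l) / 2.0        # linear speed of the robot
        w = r * (w_r - w_l) / L          # angular speed of the robot
        x += v * T * math.cos(phi + 0.5 * w * T)
        y += v * T * math.sin(phi + 0.5 * w * T)
        phi += w * T
    return x, y, phi

def goal(delta, dem_r, dem_l, target, r, L, T):
    """Assumed goal function: squared error of the simulated end pose.

    delta holds 2N modifiers: the first N for the right wheel, the last N
    for the left wheel (an ordering chosen for this sketch).
    """
    n = len(dem_r)
    opt_r = [dem_r[k] + delta[k] for k in range(n)]
    opt_l = [dem_l[k] + delta[n + k] for k in range(n)]
    x, y, phi = final_pose(opt_r, opt_l, r, L, T)
    x_des, y_des, phi_des = target
    return (x - x_des) ** 2 + (y - y_des) ** 2 + (phi - phi_des) ** 2
```

In this form `goal` can be passed to any box-constrained minimizer over the `2N` modifiers, with bounds on the modifiers keeping the optimized trajectory close to the demonstrated shape.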

Simulation results
Evaluation of the presented trajectory optimization algorithm started with simulations. The differential drive model presented in section 3.2 was utilized throughout simulations to calculate subsequent positions of the robot according to control inputs ω R , ω L . Simulation process can be divided into two separate stages.
During the first stage of the simulation, the assumed demonstration trajectories are used in the GWO algorithm in order to obtain the optimized intermediate trajectory T2. The presented research examines two different cases of workplace conditions in which the autonomous robot performs painting tasks. The first case deals with a single element that has to be avoided. The second case deals with two painted elements. For each of these conditions, a demonstration trajectory that avoids the specified painted elements is established. Thus, cosine-based (15) and sine-based (16) functions are assumed, representing the avoidance of one and two obstacles, respectively.
In (15) and (16), $y_1$, $y_2$, and $x$ are coordinates in the 2D plane. The cosine demonstration path (15) is shown in Figure 8, while the sine path (16) is shown in Figure 9. Optimization is performed directly on control signals $\omega_R$, $\omega_L$ and focuses on minimizing the goal function specified in (14). On the basis of a single demonstration trajectory, optimization is performed several times in order to obtain a set of different solution candidates. Several solutions acquired after the optimization stage for the demonstration trajectories $y_1$ and $y_2$ are shown in Figures 8 and 9. The optimized trajectories are slightly different, but all of them converge to the desired point B2. The corresponding results are listed in Table 1.
In the second stage of the simulation, the entire painting problem, including the travel T1, intermediate T2, and return T3 trajectories, is considered. Since the intermediate trajectory is optimized during the first stage of the simulation, it remains to determine the shape of trajectories T1 and T3. During the travel trajectory T1, the robot begins to move from the starting position B0, defined by the coordinates vector $q = [0\ 0\ 0]^T$, and stops at point B1. Using (1)-(5), the control algorithm described in section 3.1 calculates control signals to drive the robot from the starting position B0 to the first base point B1.
Then, the actual travel around the painted object is started along the optimized trajectory T2 determined in the first part of the simulation. When the robot finishes its movement along the optimized trajectory T2, it is located at base point B2. The robot stays at points B1 and B2 for the time required to finish the painting tasks.
Finally, the platform returns to the starting position B0 using the return trajectory T3, generated with the same equations as the travel trajectory T1. Simulation results of trajectory generation for the painting problem described through the demonstration trajectory $y_1$ are shown in Figure 10, where two different alignments of the painted object are analyzed.

Experimental results
The experiment focused only on the optimization of trajectory T2 between base points B1 and B2 along the processed object. In the experimental stage, both demonstration functions described by (15) and (16) were evaluated. The robot moved inside a test area of approximately 4×5 m. The experiment was conducted on a board that was specially designed to include supporting localization lines (Figure 13). The platform used in the experiment had a differential two-wheel drive. The platform's motors were driven by a PID unit coupled with optical encoders located on the motor shafts to control the rotational speed of the wheels. The UWB positioning system was used to record the robot's position. The UWB system used during the test consisted of seven anchors positioned at the edges of the test area and one tag placed on the robot. The average positioning error achieved by the UWB navigational system in this setup was 0.017 m along the x axis and 0.034 m along the y axis. The components of the UWB system are labeled A0-A6 and marked with red circles in Figure 13. The orientation of the platform was measured with the SPARTON AHRS-8 digital compass.
The experiment started with the demonstration run. As in the simulation, sine and cosine trajectories were recreated. For each of them, subsequent values of the rotational speeds of the right ($\omega_R$) and left ($\omega_L$) wheels were registered (Figures 14 and 15). Then the obtained data was optimized using the GWO algorithm in order to reach the desired target pose B2. The control values optimized in that way were loaded into the robot, and the drive was performed for each of the desired poses B2. The trajectories recorded for different target positions B2 are shown in Figure 16.

CONCLUSION
The article presents a new approach to trajectory generation for an autonomous platform in modern manufacturing tasks. The main advantage of the discussed solution is the possibility of optimizing the trajectory demonstrated by the operator in order to improve the accuracy of reaching a desired pose. Involving the operator in the demonstration of the intermediate trajectory ensures that all obstacles are avoided at the preliminary stage. Since the presented solution includes a control algorithm to navigate through the travel and return trajectories, it is independent of the position and orientation of the elements that have to be avoided. This is achieved by combining the positioning system with the position controller in a closed loop. Hence, the presented solution does not require qualified employees to operate and can be used by a person without experience or knowledge in the field of robotics. The teaching runs do not have to be precise; it is enough for the robot to avoid obstacles and reach the base position roughly. The algorithm is capable of generating reproducible control values. Simulations and experimental research showed that the obtained results are satisfactory and confirm the usefulness of the GWO-based trajectory optimization in practical applications. The time required to obtain a solution with the GWO algorithm depends mainly on the length of the intermediate trajectory and the settings adopted during the optimization. The GWO algorithm used in the presented research usually needed around 30 s to calculate the optimized trajectory on an Intel i3 based laptop. The system is not yet fully adapted to practical use, because obstacles that may appear on the robot's optimized trajectory are not taken into account. In simulation studies, the obtained accuracy of the final position was 10 times higher than in the experiments. This deterioration in accuracy was caused by the use of a simple differential drive model in the real experiment and by the limitations of the measuring devices. In further work, implementing a more complex robot model that is closer to reality may give more precise results. Moreover, it is important to improve the reliability of the generated trajectories by taking into account, during the GWO optimization, the positions and shapes of the obstacles that have to be avoided.