^{1}Master in Mathematics Education, angelo.soto@unipamplona.edu.co, ORCID 0000-0001-5093-0183, Universidad de Pamplona, Pamplona, Colombia.

^{2}Master in Biomedical Engineering, luis.mendoza@unipamplona.edu.co, ORCID 0000-0002-2012-9448, Universidad de Pamplona, Pamplona, Colombia.

^{3*}Doctor of Science, byronmedina@ufps.edu.co, ORCID 0000-0003-0754-8629, Universidad Francisco de Paula Santander, Cúcuta, Colombia.

**How to cite:**

A. Soto-Vergel, L. Mendoza and B. Medina-Delgado, “Analysis of energy and major components in chromatographic signals for the diagnosis of prostate cancer”. Respuestas, vol. 24, no. 1, pp. 76-85, 2019.

Received on May 25, 2018; Approved on October 15, 2018

The prostate exam is an early detection tool for preventing prostate cancer, and the main diagnostic tools for obtaining evidence are generally invasive. This article analyzes chromatographic signals from the urine of prostate cancer patients and control patients as a proposal for a non-invasive examination. To this end, urine samples are taken, digitized as chromatograms, treated with mathematical techniques, and classified. The mathematical techniques are time normalization, dead time elimination, baseline correction, noise elimination, and peak alignment. The classification techniques analyze the energy, in the time and frequency domains, and the principal components in scree plots and score plots. As a result, the chromatographic signal is characterized and the characteristic curve that represents the signal of prostate cancer patients and control patients is identified. The data structure shows a cluster distribution containing 88.88% of the vectors for the control patients. For prostate cancer patients, the data are distributed in clusters around the area defined by the control patients. This characterization demarcates signal classification regions for diagnosing possible prostate cancer patients, validating the relationship between the chromatographic signal and cancer.

**Keywords:** Energy analysis, Principal component analysis, Prostate cancer, Chromatography, Signal processing.

El examen de próstata es una herramienta de detección temprana para prevenir el cáncer de próstata y los principales instrumentos diagnósticos para obtener indicios son generalmente invasivos. Este artículo analiza señales cromatográficas provenientes de la orina de pacientes con cáncer de próstata y pacientes control como propuesta de examen no invasivo. Para tal efecto, metodológicamente, se toman muestras de orina, se digitalizan en cromatogramas, se tratan con técnicas matemáticas y se clasifican. Las técnicas matemáticas son normalización de tiempo, eliminación del tiempo muerto, corrección de línea base, eliminación de ruido y alineación de picos. Las técnicas de clasificación analizan la energía, en el dominio del tiempo y frecuencia, y los componentes principales en gráficas de sedimentación y puntuaciones. Como resultado se caracteriza la señal cromatográfica e identifica la curva característica que representa la señal de los pacientes con cáncer de próstata y pacientes control. La estructura de los datos muestra una distribución de conglomerado, del 88,88 % de los vectores, para los pacientes control. Para el caso de los pacientes con cáncer de próstata la distribución de los datos es en conglomerados alrededor de la zona delimitada por los pacientes control. Esta caracterización demarca regiones de clasificación de señales para diagnosticar posibles pacientes con cáncer de próstata, validando la relación existente entre la señal cromatográfica y el cáncer.

**Keywords:** Análisis de energía, Análisis de componentes principales, Cáncer de próstata, Cromatografía, Procesamiento de señales.

Prostate cancer is one of the cancers that most affects men today; more than 5% of every million people are affected by this disease. In addition, the early detection tools available to prevent it and the main diagnostic instruments for obtaining evidence are generally invasive, the best known being the digital rectal examination and the serum concentration of prostate-specific antigen. In this regard, [1] identified factors that may be related to men not undergoing the exam, such as fear of cancer, shame, discomfort, pain, low educational level, misinformation about the exam, distrust of medical professionals, and concern that the rectal examination may affect masculinity. This research is expected to mitigate these factors by taking advantage of the increasing use of new technologies, with which applications have been developed to improve health conditions worldwide [2], seeking to make the procedures as effective and as minimally invasive as possible.

Computer-assisted diagnostic systems that use signal processing techniques have been widely used to diagnose conditions such as upper limb sarcopenia [3], cardiovascular diseases [4], [5], and Parkinson's disease [6] - [8], to mention a few. Likewise, attempts have been made to diagnose prostate cancer using image processing techniques based on the chemical treatment of a biopsy [9], [10]; others have used machine learning techniques to improve the validity of the diagnosis [11], [12]. However, the method remains invasive because of the way the sample is obtained for analysis.

However, it is possible to obtain information on prostate cancer non-invasively through chromatography, a method by which the chemical components of a sample are separated. The result is represented by a one-dimensional signal from which delay times, energy, or concentration can be analyzed, allowing the qualitative and quantitative identification of chemical components based on their distribution for characterization [13].

Accordingly, this article processes urine samples through chromatography to obtain one-dimensional signals, analyzes their differentiating characteristics by applying signal processing techniques, and identifies whether each signal corresponds to a patient with prostate cancer or a control patient (no prostate cancer).

Processing techniques for the characterization of chromatographic signals include time normalization, dead time elimination, baseline correction, noise elimination, signal alignment, energy analysis for feature extraction and principal component analysis for classification.

This document presents the materials and methods, describing the methodology implemented, and then presents the results obtained with their respective analyses.

Figure 1 shows the research methodology, which comprises the stages of sampling, database consolidation, signal conditioning using mathematical processing techniques, and chromatogram classification.

The consolidation of the database contemplates the digitization of the one-dimensional signals in text files, one per patient, whose data correspond to the intensities of the sample in millivolts. Each patient has an associated chromatogram, which is constructed iteratively until the appropriate resolution is obtained [13], [15].

Figure 2 graphically depicts the text file of a chromatogram with its attributes, where the x-axis represents the time in minutes and the y-axis the intensity of the sample in millivolts. The recording time of each chromatogram is set to seven minutes.
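The following minimal sketch shows how such a text file could be turned into intensity and time vectors. It assumes, hypothetically, one intensity value per line and evenly spaced samples over the seven-minute recording; the function names and the sample text are illustrative only.

```python
def parse_chromatogram(text):
    """Parse one intensity value (millivolts) per non-empty line (assumed layout)."""
    return [float(line) for line in text.splitlines() if line.strip()]


def time_axis(n_samples, duration_min=7.0):
    """Evenly spaced time stamps, in minutes, spanning the whole recording."""
    step = duration_min / (n_samples - 1)
    return [i * step for i in range(n_samples)]


# Hypothetical file content: eight samples, blank lines are ignored.
raw_text = "0.0\n1.5\n\n3.0\n2.0\n0.5\n0.0\n0.0\n0.0\n"
intensity_mv = parse_chromatogram(raw_text)
minutes = time_axis(len(intensity_mv))
```

With eight samples over seven minutes, the time step in this toy example is one minute.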

Signal conditioning adapts the data, through mathematical processing techniques, so that the characteristics of the signal can be correctly classified and validated. To do this, it uses the following processes: time normalization, dead time elimination, baseline correction, elimination of high- and low-frequency noise, and peak alignment [16] - [18].

The chromatogram classification extracts the energy of each peak, in the time and frequency domains, to form characteristic vectors for each signal, which are then classified with principal component analysis. This analysis is commonly used to validate whether the chosen characteristics of a signal, in this case the energy, are suitable for classification [19] - [22].

This section follows the methodology of Figure 1: it presents the results of processing the chromatographic signals from urine samples and discusses their analysis.

Dead Time Elimination. Determines the time span that contains the chromatogram information by identifying the instants at which the first valley of the first peak and the second valley of the last peak of the signal appear. Applied to the project chromatograms, this process yielded a minimum signal time of 2.3 minutes and a maximum signal time of 3.0 minutes. For this reason, the maximum signal time of 3.0 minutes is chosen to avoid loss of information.
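As a rough illustration of this step, the sketch below crops a chromatogram to a fixed 3.0-minute window that starts where the signal first rises. The threshold-based start detection is a simplifying assumption that stands in for the valley detection described above, and the sample period, threshold, and function names are hypothetical.

```python
def find_signal_start(intensity, threshold):
    """Index of the sample just before the signal first exceeds the threshold."""
    for i, value in enumerate(intensity):
        if value > threshold:
            return max(i - 1, 0)
    return 0


def remove_dead_time(intensity, sample_period_min, window_min=3.0, threshold=1.0):
    """Keep a fixed-length window that starts at the detected signal onset."""
    start = find_signal_start(intensity, threshold)
    n_keep = int(round(window_min / sample_period_min)) + 1
    return intensity[start:start + n_keep]


# Toy chromatogram: 15 samples, one every 0.5 minutes (7 minutes in total).
intensity = [0, 0, 0, 0, 2, 5, 2, 0, 3, 1, 0, 0, 0, 0, 0]
cropped = remove_dead_time(intensity, sample_period_min=0.5)
```

Using the same window length for every chromatogram keeps the resulting vectors comparable, which is the reason the longest observed signal time is used.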

Baseline Correction. Corrects the offset of the signal with respect to the zero axis so that the sample intensities are read correctly. To do so, an algorithm is applied that smooths the signal with a weighted moving average filter using (1), identifies the valleys with the second-derivative criterion using (2) to form a vector of these values, interpolates a curve between each pair of consecutive valleys to estimate the baseline between them using a second-order spline approximation given by (3), and subtracts the baseline values of each peak to correct the signal using (4).

In (1) to (4), $y''$ is the second derivative of the sample and $p(v)$ is the set of quadratic polynomials of the baseline of each peak.
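The four baseline-correction steps can be sketched as follows. This is an illustrative approximation, not the authors' implementation: the (1, 2, 1) filter weights, the use of the signal end points as extra knots, and the particular quadratic-spline construction (first slope taken from the first segment) are all assumptions.

```python
def weighted_moving_average(y, weights=(1, 2, 1)):
    """Smooth y with a centered weighted average; end points are left unchanged."""
    half = len(weights) // 2
    total = sum(weights)
    smoothed = list(y)
    for i in range(half, len(y) - half):
        acc = sum(w * y[i - half + k] for k, w in enumerate(weights))
        smoothed[i] = acc / total
    return smoothed


def find_valleys(s):
    """Indices of local minima with a positive discrete second derivative."""
    valleys = []
    for i in range(1, len(s) - 1):
        second_diff = s[i - 1] - 2 * s[i] + s[i + 1]
        is_minimum = s[i] - s[i - 1] <= 0 and s[i + 1] - s[i] > 0
        if is_minimum and second_diff > 0:
            valleys.append(i)
    return valleys


def quadratic_spline(knots_x, knots_y, n):
    """Piecewise quadratic through the knots with a continuous first derivative."""
    baseline = [0.0] * n
    slope = (knots_y[1] - knots_y[0]) / (knots_x[1] - knots_x[0])
    for k in range(len(knots_x) - 1):
        x0, x1 = knots_x[k], knots_x[k + 1]
        y0, y1 = knots_y[k], knots_y[k + 1]
        h = x1 - x0
        next_slope = 2 * (y1 - y0) / h - slope
        for x in range(x0, x1 + 1):
            d = x - x0
            baseline[x] = y0 + slope * d + (next_slope - slope) / (2 * h) * d * d
        slope = next_slope
    return baseline


def correct_baseline(y):
    """Smooth, locate valleys, interpolate the baseline, and subtract it."""
    s = weighted_moving_average(y)
    knots_x = sorted(set([0] + find_valleys(s) + [len(y) - 1]))
    knots_y = [s[i] for i in knots_x]
    baseline = quadratic_spline(knots_x, knots_y, len(y))
    return [yi - bi for yi, bi in zip(y, baseline)]


# Toy signal: two peaks riding on a constant 10 mV offset.
signal = [10, 10, 10, 10, 12, 16, 20, 16, 12, 10, 10,
          10, 11, 14, 18, 14, 11, 10, 10, 10, 10]
corrected = correct_baseline(signal)
```

In this toy case the valleys are found at the foot of each rising edge, the estimated baseline is the constant offset, and the corrected signal keeps only the peaks.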

Noise Elimination. The first algorithm implemented is the rectangular smooth, or boxcar, an unweighted smoothing algorithm that replaces each point in the signal with the average of m adjacent points, where m is a positive odd integer, so that the smoothing coefficients are balanced around the central point and the x-axis position of the peaks and other features is preserved in the smoothed signal. This project defines m = 3, resulting in (5).

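$$S_j = \frac{y_{j-1} + y_j + y_{j+1}}{3} \qquad (5)$$

where $S_j$ is the j-th point of the smoothed signal and $y_{j-1}$, $y_j$, $y_{j+1}$ are the adjacent points of the original signal.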

Another algorithm used is the triangular smooth, which implements a weighted smoothing function. For this project, m = 5 is defined, resulting in (6).
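Assuming the standard five-point triangular smooth described by O'Haver [23], the weighted smoothing for m = 5 can be written as follows, where $S_j$ is the j-th point of the smoothed signal and $y_j$ the j-th point of the original signal:

```latex
S_j = \frac{y_{j-2} + 2\,y_{j-1} + 3\,y_j + 2\,y_{j+1} + y_{j+2}}{9}
```

The weights 1, 2, 3, 2, 1 sum to 9 and are symmetric around the central point, so the position of the peaks is preserved.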

The pseudo-Gaussian smooth and a four-pass smooth of width w are also used; they apply the three-point rectangular smooth three and four times, respectively.

Table I shows the signal-to-noise ratio for the smoothing algorithms applied; the best results were obtained by applying four passes of a three-point rectangular smooth.
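A minimal sketch of how these smoothing variants and their signal-to-noise ratios might be compared is shown below. The signal-to-noise definition used here (maximum intensity divided by the standard deviation of a peak-free region) is an assumption, since the article does not state its formula, and the toy data and names are hypothetical.

```python
from statistics import pstdev


def rectangular_smooth(y, passes=1):
    """Apply a three-point unweighted moving average `passes` times."""
    out = list(y)
    for _ in range(passes):
        prev = out
        out = list(prev)  # end points are copied unchanged
        for j in range(1, len(prev) - 1):
            out[j] = (prev[j - 1] + prev[j] + prev[j + 1]) / 3
    return out


def triangular_smooth(y):
    """Five-point weighted smooth with weights 1, 2, 3, 2, 1."""
    out = list(y)
    for j in range(2, len(y) - 2):
        out[j] = (y[j - 2] + 2 * y[j - 1] + 3 * y[j] + 2 * y[j + 1] + y[j + 2]) / 9
    return out


def signal_to_noise(y, noise_region):
    """Assumed definition: peak height over the noise standard deviation."""
    return max(y) / pstdev(y[noise_region])


def improvement_percent(before, after):
    """Relative change of the signal-to-noise ratio, in percent."""
    return 100 * (after - before) / before


# Toy signal: alternating +/-1 noise plus one peak centered at index 20.
noise = [1.0 if i % 2 == 0 else -1.0 for i in range(40)]
peak = {18: 20.0, 19: 60.0, 20: 100.0, 21: 60.0, 22: 20.0}
raw = [noise[i] + peak.get(i, 0.0) for i in range(40)]

baseline_region = slice(3, 13)  # peak-free samples used to estimate the noise
snr_raw = signal_to_noise(raw, baseline_region)
snr_four_pass = signal_to_noise(rectangular_smooth(raw, passes=4), baseline_region)
gain = improvement_percent(snr_raw, snr_four_pass)
```

Away from the end points, two passes of the three-point rectangular smooth give the same result as the five-point triangular smooth, which is why the multi-pass variants behave like progressively wider weighted filters.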

Figure 7 graphically depicts the effect on the chromatographic signals of applying one iteration of the icoshift algorithm. The figure on the left shows the non-aligned chromatograms and the figure on the right shows the aligned chromatograms; the alignment places the representative peaks of each component at the same retention time (location).
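The idea behind correlation-based alignment can be illustrated with the simplified sketch below. It is not the icoshift tool itself: it shifts the whole signal by the integer lag that maximizes its cross-correlation with a reference, whereas icoshift works on intervals and uses a faster correlation computation. The function names and toy signals are hypothetical.

```python
def best_shift(reference, signal, max_shift):
    """Integer shift of `signal` that maximizes its correlation with `reference`."""
    n = len(reference)
    best, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        score = 0.0
        for i in range(n):
            j = i - shift  # sample of `signal` that lands on position i
            if 0 <= j < n:
                score += reference[i] * signal[j]
        if score > best_score:
            best, best_score = shift, score
    return best


def apply_shift(signal, shift, fill=0.0):
    """Move every sample by `shift` positions, padding the gap with `fill`."""
    n = len(signal)
    out = [fill] * n
    for j, value in enumerate(signal):
        i = j + shift
        if 0 <= i < n:
            out[i] = value
    return out


reference = [0, 0, 0, 1, 5, 1, 0, 0, 0, 0]
delayed = [0, 0, 0, 0, 0, 1, 5, 1, 0, 0]  # same peak, two samples later
shift = best_shift(reference, delayed, max_shift=4)
aligned = apply_shift(delayed, shift)
```

Here the delayed peak is moved two samples to the left so that it coincides with the reference peak.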

The sharpening technique improves the resolution using (7) and the symmetry using (8); it contributes to the precision of the measured areas, helping to identify the peaks and valleys and to delimit the area of each peak with the perpendicular drop method.

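$$R_j = y_j - k_2\,y''_j + k_4\,y''''_j \qquad (7)$$

where $R_j$ is the resolution-enhanced signal, $y_j$ is the original signal, $y''$ is the second derivative of the original signal, $y''''$ is the fourth derivative of the original signal, and $k_2$ and $k_4$ are the weighting factors of the derivatives.

$$S_j = y_j + k\,y'_j \qquad (8)$$

where $S_j$ is the symmetrized signal, $y_j$ is the original signal, $y'$ is the first derivative of the original signal, and $k$ is the weighting factor of the derivative.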
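The sketch below illustrates derivative-based sharpening under the assumption that the forms described by O'Haver [23] are used: the resolution-enhanced signal is y - k2·y'' + k4·y'''' and the symmetrized signal is y + k·y'. Derivatives are approximated with central differences at unit sample spacing, and the weighting factors and toy peaks are hypothetical values chosen for illustration.

```python
def first_derivative(y):
    """Central-difference first derivative; end points are set to zero."""
    d = [0.0] * len(y)
    for j in range(1, len(y) - 1):
        d[j] = (y[j + 1] - y[j - 1]) / 2
    return d


def second_derivative(y):
    """Central-difference second derivative; end points are set to zero."""
    d = [0.0] * len(y)
    for j in range(1, len(y) - 1):
        d[j] = y[j - 1] - 2 * y[j] + y[j + 1]
    return d


def enhance_resolution(y, k2, k4):
    """Subtract the weighted second derivative and add the weighted fourth."""
    d2 = second_derivative(y)
    d4 = second_derivative(d2)
    return [yj - k2 * a + k4 * b for yj, a, b in zip(y, d2, d4)]


def symmetrize(y, k):
    """Add the weighted first derivative to reduce tailing on the right side."""
    d1 = first_derivative(y)
    return [yj + k * a for yj, a in zip(y, d1)]


peak = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
sharpened = enhance_resolution(peak, k2=0.5, k4=0.25)

tailing_peak = [0.0, 0.0, 4.0, 2.0, 1.0, 0.5, 0.0, 0.0]
symmetrized = symmetrize(tailing_peak, k=1.0)
```

In the toy example the sharpened peak is taller and narrower while its total area is unchanged, which is the property that makes the technique compatible with area measurements such as the perpendicular drop method.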

As a result, three characteristic matrices are obtained, one for each analysis, formed by the energy of each peak while preserving its position. When a component is absent, its position in the feature vector is filled with zero.
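A feature vector of this kind might be assembled as in the following sketch. It assumes the energy of a peak is the sum of its squared samples between two valleys and computes the frequency-domain energy from a direct discrete Fourier transform, which by Parseval's theorem gives the same value; the peak boundaries, position indices, and names are hypothetical.

```python
import cmath


def time_energy(segment):
    """Energy in the time domain: sum of squared samples."""
    return sum(v * v for v in segment)


def frequency_energy(segment):
    """Energy from the DFT coefficients, scaled by 1/N (Parseval's theorem)."""
    n = len(segment)
    total = 0.0
    for k in range(n):
        coeff = sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, v in enumerate(segment))
        total += abs(coeff) ** 2
    return total / n


def feature_vector(signal, peak_bounds, n_positions):
    """Energy per peak position; positions without a peak stay at zero.

    peak_bounds maps a peak position to its (start, end) valley indices.
    """
    features = [0.0] * n_positions
    for position, (start, end) in peak_bounds.items():
        features[position] = time_energy(signal[start:end + 1])
    return features


signal = [0, 1, 2, 1, 0, 0, 3, 0]
peak_bounds = {0: (0, 4), 2: (5, 7)}  # no component at position 1
features = feature_vector(signal, peak_bounds, n_positions=3)
```

Stacking one such vector per patient produces a characteristic matrix whose columns correspond to peak positions.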

Figure 9 shows the principal component analysis as a scree plot of the energy data calculated in the time domain, with a unique characteristic curve whose classification shows no overlap between patients. Principal components 1 and 2 contain the most information about the characteristic chromatograms, followed by component 3.
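The quantities behind a scree plot and a score plot can be sketched as follows. This is a minimal illustration and not the authors' tooling: it computes the covariance matrix of the feature vectors, extracts the leading components by power iteration with deflation, and returns the explained-variance ratios (scree plot) and the projected scores (score plot). The toy data and all names are hypothetical.

```python
def pca(rows, n_components=2, iterations=200):
    """Return (explained variance ratios, scores) for the leading components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    total_variance = sum(cov[j][j] for j in range(d))

    components, variances = [], []
    for _ in range(n_components):
        v = [1.0 / (j + 1) for j in range(d)]  # deterministic start vector
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _step in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                         for a in range(d))
        components.append(v)
        variances.append(eigenvalue)
        # Deflation: remove the component just found from the covariance.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)]
               for a in range(d)]

    explained = [ev / total_variance for ev in variances]
    scores = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
              for c in centered]
    return explained, scores


# Toy feature matrix: one row per chromatogram, one column per peak energy.
rows = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
explained, scores = pca(rows)
```

Plotting `explained` against the component number gives the scree plot, and plotting the first two columns of `scores` gives the score plot in which clusters of patients can be inspected.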

The sequence of mathematical signal processing techniques applied to the chromatograms improved the signal-to-noise ratio by 37.67% for control patients and by 57.55% for patients with prostate cancer. This improvement contributes to the accuracy of peak and valley identification and of the energy and principal component analyses.

The scree plot shows a distinct behavior of the principal components for control patients and for prostate cancer patients, validating the energy of the peaks of each signal in the time domain as a differentiating factor.

In the score plot, the structure of the data shows a cluster distribution containing 88.88% of the vectors for the control patients. The remaining 11.11% of the data is considered atypical and points to an error in the inclusion of the chromatogram in the control group, which may have originated in the urine sample. In the case of prostate cancer patients, the data are distributed uniformly in three groups, each with 33.33% of the vectors, around the area defined by the control patient vectors. This representation delimits signal classification regions for diagnosing possible prostate cancer patients.

The results support applying the extraction of significant peaks as a pattern extraction technique and searching for other characteristics that differentiate the chromatograms of prostate cancer patients and control patients and sharpen their classification.

[1] Á. Fajardo-Zapata and G. Jaimes-Monroy, “Conocimiento, percepción y disposición sobre el examen de próstata en hombres mayores de 40 años,” Investig. Orig., vol. 64, no. 2, pp. 223–228, 2016.

[2] D. Glujovsky, A. Bardach, S. García-Martí, D. Comandé, and A. Ciapponi, “PRM2 EROS: A New Software For Early Stage Of Systematic REVIEWS,” Value Heal., vol. 14, no. 7, p. A564, Nov. 2011.

[3] I. Rivera et al., “Diseño de dispositivos para el diagnóstico de sarcopenia en miembro superior,” Memorias del Congr. Nac. Ing. Biomédica, vol. 2, no. 1, pp. 174–177, 2017.

[4] L. Garrido-Martínez and R. I. González-Fernández, “Revista cubana de informática médica,” Rev. Cuba. Informática Médica, vol. 15, no. 2, pp. 153–164, 2015.

[5] E. Dugarte-Dugarte et al., “Algoritmo de bajo costo de procesamiento para la detección de potenciales tardíos ventriculares (PTV),” CLIC Conoc. Libr. y Licenciamiento, vol. 8, no. 15, pp. 73–93, 2017.

[6] P. A. Stack-Sánchez, G. Dorantes-Méndez, and A. R. M. Rodríguez, “Caracterización del temblor Parkinsoniano mediante dimensión fractal en señales de acelerometría,” Memorias del Congr. Nac. Ing. Biomédica, vol. 5, no. 1, pp. 190–193, Oct. 2018.

[7] M. E. Bedoya-Vargas, J. C. Vásquez-Correa, and J. R. Orozco-Arroyave, “Time-frequency representations from inertial sensors to characterize the gait in Parkinson’s disease,” TecnoLógicas, vol. 21, no. 43, pp. 53–69, Sep. 2018.

[8] I. G. Bravo, P. A. S. Sánchez, G. D. Méndez, and A. R. M. Rodriguez, “Evaluación del movimiento a través de acelerometría en pacientes con enfermedad de Parkinson,” Memorias del Congr. Nac. Ing. Biomédica, vol. 4, no. 1, pp. 138–141, Sep. 2017.

[9] E. Payá-Bosch, “Desarrollo de un sistema de extracción avanzada de características en imagen histológica para la identificación automática del cáncer de próstata,” Universidad Politécnica de Valencia, 2017.

[10] B. Zapote-Hernández, J. Cruz-Santiago, E. González-Vargas, and A. Jaramillo-Núñez, “Concordancia diagnóstica entre los métodos visual e informático en la detección de metástasis por gammagrafía ósea en cáncer de próstata,” An. Radiol. México, vol. 15, no. 2, pp. 111–119, 2016.

[11] L. Hussain et al., “Prostate cancer detection using machine learning techniques by employing combination of features extracting strategies,” Cancer Biomarkers, vol. 21, no. 2, pp. 393–413, Feb. 2018.

[12] J. Wang, C.-J. Wu, M.-L. Bao, J. Zhang, X.-N. Wang, and Y.-D. Zhang, “Machine learning-based analysis of MR radiomics can help to improve the diagnostic performance of PI-RADS v2 in clinically relevant prostate cancer,” Eur. Radiol., vol. 27, no. 10, pp. 4082–4090, Oct. 2017.

[13] B. Patiño-Domínguez, “Determinación de parámetros operacionales necesarios en el empaquetado de columnas de cromatografía,” Universidad Da Coruña, 2016.

[14] R. Majors, Sample preparation fundamentals for chromatography. Canada: Agilent Technologies, 2013.

[15] J. Cazes, Encyclopedia of Chromatography, 3rd ed. New York, 2009.

[16] A. Medina-Santiado, “Sistema de diagnóstico de señales biomédicas con redes neuronales artificiales,” Chiapas, 2015.

[17] J. A. Navarro-Acosta and J. P. Nieto-González, “Detección y diagnóstico de fallas para la dinámica lateral de un automóvil utilizando máquinas de soporte vectorial multiclase,” Res. Comput. Sci., vol. 73, pp. 167–179, 2014.

[18] M. A. Melara-Estrada, “Introducción a la transformada Wavelet y la teoría de análisis de señales,” Universidad de El Salvador, 2015.

[19] A. Sheinker and M. B. Moldwin, “Magnetic anomaly detection (MAD) of ferromagnetic pipelines using principal component analysis (PCA),” Meas. Sci. Technol., vol. 27, no. 4, p. 045104, Apr. 2016.

[20] Z. Chen, Q. Zhu, Y. C. Soh, and L. Zhang, “Robust Human Activity Recognition Using Smartphone Sensor via CT-PCA and Online SVM,” IEEE Trans. Ind. Informatics, 2017.

[21] J. G. Rueda-Bayona, C. J. Elles-Pérez, E. H. Sánchez-Cotte, Á. L. González-Ariza, and G. D. Rivillas-Ospina, “Identificación de patrones de variabilidad climática a partir de análisis de componentes principales, Fourier y clúster k-medias,” Tecnura, vol. 20, no. 50, pp. 55–68, 2016.

[22] P. Arroyo, I. Suárez, J. Lozano, J. Herrero, and P. Carmona, “Nariz electrónica personal para la detección de contaminantes en el aire,” Actas las XXXIX Jornadas Automática, pp. 894–899, 2018.

[23] T. O’Haver, “A Pragmatic Introduction to Signal Processing with applications in scientific measurement,” University of Maryland at College Park, 2018.

[24] F. Savorani, G. Tomasi, and S. B. Engelsen, “Alignment of 1D NMR Data using the iCoshift Tool: A Tutorial,” in Magnetic Resonance in Food Science: Food for Thought, 2013, pp. 14–24.

[25] A. Kassambara, Practical guide to principal component methods in R : PCA, (M)CA, FAMD, MFA, HCPC, factoextra. STHDA, 2017.

Creative Commons Attribution-NonCommercial 4.0 International license