Prediction of Caffeine and Protein of Arabica Coffee Beans Using Near Infrared Spectroscopy (NIRS)

Article


INTRODUCTION
Indonesia is known as a producer and exporter of coffee beans. Coffee exports in 2019 reached 359,053 tons of coffee beans, with a foreign exchange value of US$ 883,123 million. In 2020, West Sumatra produced 29,539 tons of coffee from a planting area of 29,646.70 ha (BPS, 2020). Solok Regency is one of the coffee-producing areas in West Sumatra and is known for Minang Solok Radjo coffee. Solok Radjo coffee has received an identification certificate as a specialty-grade arabica coffee. Its flavor, favored by coffee connoisseurs, is not found in other areas because the aroma and taste are very strong and distinctive (Ifmalinda et al., 2019).
The expansion of the coffee production area in West Sumatra has not been matched by uniform bean quality. The quality of coffee beans in West Sumatra needs to be improved so that they gain added value and can compete with products from other provinces, allowing coffee exports to increase. Coffee beans produced in West Sumatra currently need to be brought in line with Indonesian national standards so that they can be marketed easily. For farmers to market coffee without losing money, product quality must be guaranteed, and the supply must meet market needs and remain available on a sustained basis.
Post-harvest handling is the key to improving the quality of coffee beans today, especially in West Sumatra (Yuwita et al., 2019). Coffee bean quality needs to be monitored, because the stability of product quality in Indonesia is still lacking and needs to be improved. Methods for determining coffee quality quickly and precisely are needed to produce coffee that meets the quality required by the destination country. Currently, coffee quality is generally still estimated destructively and manually. Manual testing takes a lot of time and labor and is expensive. It is therefore not suitable for industry, which requires a fast, accurate, stable, and non-destructive method for testing the content and quality of coffee. Rapid and efficient quality detection can be achieved with near infrared spectroscopy (NIRS). NIRS can analyze materials quickly and efficiently without damaging them (non-destructive) (Wang & Lim, 2015). NIRS has several advantages over manual testing. It can measure samples in powder, liquid, seed, and solid form. In addition, sample preparation is simple, and the testing process is fast, precise, and environmentally friendly (Ribeiro et al., 2021).
NIRS has previously been widely applied to coffee. Zhu et al. (2021) determined coffee bean fat and protein using NIRS with PLS regression. Their study used Orthogonal Signal Correction (OSC) together with other pretreatments such as MSC and SNV to improve model quality. The results showed that pretreatment improved the PLS predictions, especially with the OSC-PLS method, for both protein and lipid. These results show that NIR with PLS regression can be applied to determine proteins and lipids in coffee beans.
Caffeine and protein are among the main components of coffee, and their levels must be maintained to preserve coffee quality. The protein content of coffee contributes to the bitter taste of the coffee beans produced (Sasmita et al., 2023). The caffeine and protein content of Solok Radjo coffee has never been predicted using NIRS. NIRS research on Solok Radjo coffee therefore needs to be developed so that the chemical content of the beans can be determined quickly and accurately without damaging them. The aim of this research is to develop a prediction model for the caffeine and protein content of Solok Radjo coffee beans using NIRS.

Tools and Materials
This study used an FT-IR IPTEK T-1516 NIRS instrument covering wavelengths of 1000-2500 nm at an interval of 0.4 nm. The sample material was 1 kg of sun-dried coffee beans picked directly from the farmer's garden. The beans were Solok Radjo arabica coffee harvested directly from the garden on Jl. Nagari Aie Winter, Jorong Data, Lembah Gumanti District, Solok Regency, West Sumatra. The location is at an elevation of 1,500 meters above sea level. Geographically, it is located at 00°32'14" to 01°46'45" South Latitude and 100°25'00" to 100°41'41" East Longitude. The coffee cherries were harvested ripe (red picking). The Solok Radjo arabica coffee beans were semi-wet processed and dried to a moisture content of 12%, then roasted at a medium roast level. Thirty samples were prepared, of which 67% were used as calibration data and 33% as validation data. Spectra were then acquired from the roasted coffee beans using the FT-IR IPTEK T-1516 NIRS instrument, and the data were processed using the Unscramble® software.
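The 67%/33% division of the 30 samples can be illustrated with a short sketch. The text does not say how samples were assigned to each set, so the random shuffle, the seed, and the function name `split_samples` below are assumptions made purely for illustration.

```python
import random

def split_samples(sample_ids, calibration_fraction=0.67, seed=0):
    """Randomly split sample IDs into calibration and validation sets.

    Assumption: a random assignment; the study does not state its procedure.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_calibration = round(len(ids) * calibration_fraction)
    return sorted(ids[:n_calibration]), sorted(ids[n_calibration:])

calibration_ids, validation_ids = split_samples(range(1, 31))
print(len(calibration_ids), len(validation_ids))  # 20 10
```

With 30 samples, a 67% share rounds to 20 calibration samples, leaving 10 for validation.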

Methods
This research was carried out in four stages: (1) preparing the sample materials, (2) collecting NIRS absorbance spectra of roasted Solok Radjo coffee beans, (3) measuring caffeine and protein levels destructively, and (4) building a calibration model relating the NIRS spectra to the actual caffeine and protein values.

NIRS Spectrum Data Acquisition
The material used in this research was Solok Radjo arabica coffee beans roasted at medium dark at a temperature of 210°C for 15 minutes. The roaster was a Vina Nha Trang, more commonly called VNT, a machine originating from Vietnam with a minimum capacity of 0.5 kg. Each sample from which spectra were taken weighed 6 g. Spectra were measured on roasted coffee beans stacked in a 20 cc tube (pot bottle) with a height of 3.5 cm and a diameter of 3 cm. Spectral acquisition was repeated 5 times for each sample.
Spectra were measured with one NIRS unit over the wavelength range 1000-2500 nm. Light from a tungsten halogen lamp was directed at the coffee beans in absorption mode, then collected by a fiber optic cable and transmitted to the spectrometer. Spectrum acquisition, processing, and calibration were carried out with the Unscramble X.1 software. The spectra were then transformed using the correction (pretreatment) methods. The quantity measured was the absorbance of the coffee beans. After measurement, the coffee beans were stored in a plastic zip bag for subsequent chemical analysis in the laboratory.

Caffeine Measurement
The caffeine content of Solok Radjo coffee beans was tested according to SNI No. 01-3542-2004 for ground coffee (BSN, 2004).

a. Making Standard Solutions
Before the standard solutions were made, a 1000 mg/L stock solution was prepared by dissolving 250 mg of pure caffeine in 250 mL of distilled water. Standard solutions were then made by taking aliquots of 0.05, 0.1, 0.15, 0.2, 0.25, and 0.3 mL and diluting them again to obtain concentrations of 1, 2, 3, 4, 5, 6, 7, and 8 mg/L, which were measured at a wavelength of 275 nm.
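The dilution arithmetic and the use of a standard curve can be sketched as follows. The final volume of each standard is not given in the text; a 50 mL flask is assumed here because it reproduces 1 to 6 mg/L from the listed aliquots. The absorbance values are hypothetical and serve only to show how a calibration line is fitted and inverted.

```python
def diluted_concentration(stock_mg_per_l, aliquot_ml, final_volume_ml):
    """C2 = C1 * V1 / V2: solute mass is conserved on dilution."""
    return stock_mg_per_l * aliquot_ml / final_volume_ml

def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

STOCK_MG_PER_L = 250 / 0.250   # 250 mg in 250 mL gives 1000 mg/L
FINAL_VOLUME_ML = 50.0         # assumption: not stated in the text
aliquots_ml = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3]
standards = [diluted_concentration(STOCK_MG_PER_L, v, FINAL_VOLUME_ML)
             for v in aliquots_ml]

# Hypothetical absorbances at 275 nm (illustrative only, not measured data).
absorbances = [0.05 * c + 0.01 for c in standards]
slope, intercept = fit_line(standards, absorbances)

def concentration_from_absorbance(absorbance):
    """Invert the standard curve to estimate concentration in mg/L."""
    return (absorbance - intercept) / slope
```

A sample reading is converted to concentration by inverting the fitted line, which is valid only within the concentration range spanned by the standards.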
b. Caffeine Quantitative Test
The coffee beans were ground to powder using a grinder and sieved through a 60 mesh sieve, then 1 g of ground coffee was weighed. The ground coffee was placed in an Erlenmeyer flask, and 150 mL of heated distilled water was added while stirring until homogeneous. The mixture was filtered through filter paper, 1.5 g of CaCO3 was added, and the solution was stirred until homogeneous. The solution was transferred to a separating funnel and 25 mL of chloroform was added. The solution was shaken and allowed to stand until the layers separated. Extraction was carried out 4 times. The clear solution was put into an Erlenmeyer flask and rotated for 10 minutes or until the chloroform had completely evaporated. The solvent-free caffeine extract was transferred to a 100 mL volumetric flask, diluted with distilled water to the mark, and sonicated until homogeneous, then the caffeine content was read using a UV-Vis spectrophotometer at a wavelength of 275 nm (Sudarmadji, 2010).

Measurement of Protein Content
The protein content of Solok Radjo coffee beans was measured using the Kjeldahl method. A 1 g sample was weighed into a Kjeldahl flask, then 15 mL of concentrated H2SO4, 1 g of selenium, and some boiling stones were added, and the mixture was heated in a fume hood until it became clear and light green. It was then diluted with distilled water up to the mark in a 100 mL volumetric flask. A 10 mL aliquot of the solution was transferred into the Kjeldahl distillation apparatus and 20 mL of 50% NaOH was added. The distillate was collected in 10 mL of boric acid with 3 drops of Conway indicator. Distillation was carried out until the receiver reached 100 mL. The distillate was then titrated with 0.02 N HCl until a pink color formed. The same procedure was carried out for the blank. Protein content was then calculated from the titration results following Sudarmadji (2010).
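The Kjeldahl titration can be converted to a protein percentage with a calculation of the following kind. This is a minimal sketch of the conventional form of the calculation, not necessarily the exact expression used in the study. The dilution factor of 10 follows from the 10 mL aliquot taken from 100 mL; the nitrogen-to-protein conversion factor of 6.25 and the titration volumes in the example are assumptions.

```python
NITROGEN_MG_PER_MEQ = 14.007  # atomic mass of nitrogen

def kjeldahl_protein_percent(v_sample_ml, v_blank_ml, hcl_normality,
                             sample_mass_g, dilution_factor=10.0,
                             conversion_factor=6.25):
    """Protein (%) from Kjeldahl titration volumes.

    dilution_factor: 100 mL digest / 10 mL aliquot = 10 (from the procedure).
    conversion_factor: assumed general nitrogen-to-protein factor of 6.25.
    """
    milliequivalents = (v_sample_ml - v_blank_ml) * hcl_normality
    nitrogen_mg = milliequivalents * NITROGEN_MG_PER_MEQ * dilution_factor
    nitrogen_percent = nitrogen_mg / (sample_mass_g * 1000.0) * 100.0
    return nitrogen_percent * conversion_factor

# Hypothetical titration volumes for a 1 g sample titrated with 0.02 N HCl.
protein = kjeldahl_protein_percent(v_sample_ml=8.2, v_blank_ml=0.2,
                                   hcl_normality=0.02, sample_mass_g=1.0)
```

Subtracting the blank volume removes the contribution of nitrogen from the reagents, so only nitrogen originating from the sample is converted to protein.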

Building a Model Using the PLS Method
NIRS spectrum data were processed using PLS (Partial Least Squares), and the spectral data were then compared with the actual data obtained in the laboratory. PLS is a linear regression approach suitable for calibration on a set of samples, combining chemical data measured destructively in the laboratory with NIRS data (Williams & Norris, 1987). Because PLS is a full-spectrum method, information from the entire spectral range can be used for calibration (Rosita et al., 2016).
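To make the PLS step concrete, the following is a minimal pure-Python sketch of single-response PLS regression using the NIPALS algorithm. It is not the implementation used by the software in the study; the function names `fit_pls1` and `predict_pls1` and the small synthetic data set are assumptions for illustration only.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_pls1(X, y, n_components):
    """Fit a single-response PLS model (NIPALS). X: list of spectra, y: lab values."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered spectra
    f = [v - y_mean for v in y]                                 # centered lab values
    components = []
    for _ in range(n_components):
        w = [dot([E[i][j] for i in range(n)], f) for j in range(m)]  # weights ~ X'y
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:      # no covariance left to model
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                               # scores
        tt = dot(t, t)
        if tt < 1e-12:
            break
        p = [dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]  # X loadings
        q = dot(f, t) / tt                                           # y loading
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                      # deflate y
        components.append((w, p, q))
    return x_mean, y_mean, components

def predict_pls1(model, x):
    """Predict the response for one spectrum by replaying the deflation steps."""
    x_mean, y_mean, components = model
    e = [v - mu for v, mu in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [v - t * pj for v, pj in zip(e, p)]
    return y_hat

# Synthetic example (not study data): the response is an exact linear function of X.
X_demo = [[1, 2, 3], [2, 1, 0], [3, 4, 1], [4, 3, 5], [5, 6, 2]]
y_demo = [0.5 * a - 0.2 * b + 0.1 * c + 1 for a, b, c in X_demo]
model = fit_pls1(X_demo, y_demo, n_components=3)
predictions = [predict_pls1(model, row) for row in X_demo]
```

In practice the number of components is chosen by validation rather than set to the full rank, since NIRS spectra contain far more wavelengths than samples.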
In this study, the NIRS spectra were corrected using several pretreatments to improve the resulting calibration model. The pretreatments used were standard normal variate (SNV), multiplicative scatter correction (MSC), and mean normalization (MN) (Cen & He, 2007). SNV is commonly used to remove spectral variation caused by scattering due to differences in particle size. It removes these confounding effects by centering and scaling each spectrum individually, transforming every value in the spectrum. MSC was applied to all spectra before the calibration and prediction procedures, using the full MSC option (removal of both the common amplification and the offset). Mean normalization is a form of normalization used to bring the sample data onto the same scale (Mardiantono et al., 2022).
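The three pretreatments can be sketched in a few lines each. This is an illustrative sketch and not the software's implementation: SNV is assumed to use the sample standard deviation, MSC is assumed to use the mean spectrum as reference, and mean normalization is assumed to mean dividing each spectrum by its own mean absorbance.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    mu, sd = mean(spectrum), stdev(spectrum)
    return [(a - mu) / sd for a in spectrum]

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum is regressed on the reference (spectrum = offset + slope * reference),
    then corrected as (spectrum - offset) / slope.
    """
    n_points = len(spectra[0])
    reference = [mean(s[j] for s in spectra) for j in range(n_points)]
    ref_mean = mean(reference)
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        slope = sum((r - ref_mean) * (a - s_mean)
                    for r, a in zip(reference, s)) / sxx
        offset = s_mean - slope * ref_mean
        corrected.append([(a - offset) / slope for a in s])
    return corrected

def mean_normalize(spectrum):
    """Divide a spectrum by its mean absorbance (assumed definition)."""
    mu = mean(spectrum)
    return [a / mu for a in spectrum]

# Synthetic absorbances: the second spectrum is the first with added gain and offset.
base = [0.2, 0.5, 0.9, 0.4]
scattered = [2 * a + 1 for a in base]
msc_result = msc([base, scattered])
snv_result = snv(base)
mn_result = mean_normalize(base)
```

In the synthetic example the two spectra differ only by a gain and an offset, so MSC maps both onto the same corrected spectrum, which is the scatter-removal behavior the method is designed for.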
The correction method was selected by trial and error: the data were processed with the various correction methods, and the treatment that best improved the calibration and validation models was adopted in this study. The accuracy and precision of the NIRS method in predicting coffee bean content were assessed from the coefficient of determination (R²) (Munawar, 2014). The higher the value of R², the better the resulting model (Mouazen et al., 2005). The coefficient of determination (R²) was calculated using the following equation:

$$R^2 = \frac{\left[\sum_{n}\left(X_n - \bar{X}\right)\left(Y_n - \bar{Y}\right)\right]^2}{\sum_{n}\left(X_n - \bar{X}\right)^2 \, \sum_{n}\left(Y_n - \bar{Y}\right)^2}$$

where $X_n$ is the actual data (laboratory test), $\bar{X}$ is the average actual data (laboratory test), $Y_n$ is the predicted chemical data (NIRS estimate), and $\bar{Y}$ is the average predicted chemical data (NIRS estimate).
The success of a model was judged from its validation values, which test the accuracy of the calibration. The standard error of the calibration set (SEC), the standard error of the validation set (SEP), and the bias were used as success parameters for the model predictions (Mouazen et al., 2005). SEC, SEP, and bias are each calculated from the differences between the NIRS-predicted and laboratory values, where n is the number of samples in the corresponding set.
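These error statistics can be sketched as follows. The definitions used here are common chemometrics conventions and are assumptions, since the exact expressions of the study are not reproduced: bias is the mean residual, the standard error (SEC on the calibration set, SEP on the validation set) is the bias-corrected spread of the residuals with n - 1 in the denominator, and RMSE is the root mean square of the raw residuals.

```python
from math import sqrt

def bias(actual, predicted):
    """Mean difference between predicted and actual values."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)

def standard_error(actual, predicted):
    """Bias-corrected standard error: SEC on calibration data, SEP on validation data."""
    b = bias(actual, predicted)
    n = len(actual)
    return sqrt(sum((p - a - b) ** 2 for a, p in zip(actual, predicted)) / (n - 1))

def rmse(actual, predicted):
    """Root mean square error: RMSEC on calibration data, RMSEP on validation data."""
    n = len(actual)
    return sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)

# Synthetic values (not study data): every prediction is 0.1 too high.
lab_values = [1.0, 2.0, 3.0, 4.0]
nirs_values = [1.1, 2.1, 3.1, 4.1]
print(round(bias(lab_values, nirs_values), 3))            # 0.1
print(round(standard_error(lab_values, nirs_values), 3))  # 0.0
print(round(rmse(lab_values, nirs_values), 3))            # 0.1
```

The example shows why both kinds of statistic are reported: a constant offset appears in the bias and the RMSE but not in the bias-corrected standard error.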
The ratio between the standard deviation (SD) and the root mean square error of prediction (RMSEP) is called the RPD (residual predictive deviation) and was also used to determine the success of a model. According to Nicolaï et al. (2007), an RPD of 1.5-1.9 indicates that the model lacks precision and requires improved calibration. A value of 2-2.5 indicates that the model is good enough, while an RPD of 2.5 to 3 or above implies that model accuracy is very good and acceptable. The RPD was calculated as follows:

$$\mathrm{RPD} = \frac{SD}{RMSEP}$$
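The RPD calculation and its interpretation can be sketched as below. The category boundaries follow the ranges quoted in the text, with two assumptions: values between 1.9 and 2 are grouped with the "lacks precision" class, and values below 1.5 are treated as unusable, following the threshold attributed to Mouazen et al. (2005) in the discussion of results. The sample standard deviation (n - 1) is also an assumption.

```python
from statistics import stdev

def rpd(reference_values, rmsep):
    """RPD = SD of the laboratory reference values / RMSEP."""
    return stdev(reference_values) / rmsep

def interpret_rpd(value):
    """Map an RPD value to the qualitative classes used in the text."""
    if value < 1.5:
        return "not usable"
    if value < 2.0:
        return "lacks precision, needs improvement"
    if value < 2.5:
        return "good enough"
    return "very good"

# RPD values reported in this study for raw and MSC-corrected spectra.
for reported in (1.618, 11.869, 0.164, 19.943):
    print(reported, interpret_rpd(reported))
```

Because RPD scales the prediction error by the spread of the reference data, it allows models for constituents with different ranges, such as caffeine and protein, to be compared on one scale.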

Solok Radjo Coffee Bean Sample Data
Chemical analysis data for Solok Radjo coffee beans can be seen in Table 1. Based on Table 1, the caffeine content of the coffee beans is above the range of 0.9-2% set by the 2014 SNI, reaching 2.75%. Likewise, the protein content obtained is above the range for roasted coffee beans, 13-15% (Clarke & Macrae, 1987). Arabica coffee parameters that do not meet standards are caused by temperature, extraction time, and the volume of solvent used (Purwasih et al., 2022). Rahayuningsih (2014) stated that the chemical content of coffee is influenced by several factors such as variety, type, growing environment, maturity level, storage conditions, season length, rainfall, pests, and the processing of the coffee beans. Indonesia has many types and varieties of coffee, so the chemical content differs from region to region. Arwangga et al. (2016) stated that caffeine levels differ between coffees because the caffeine in coffee is still bound to other organic compounds.

Raw NIRS Spectrum and Chemical Content
The raw NIRS spectrum is the spectrum as acquired, without any treatment or pretreatment. The raw spectrum contains noise (interference) as well as information on the chemical content of the coffee beans, which appears as peaks and valleys at particular wavelengths. The spectrum of Solok Radjo coffee beans over the wavelength range 1000-2500 nm can be seen in Figure 1.

Figure 1. Raw spectrum of NIRS Solok Radjo Coffee
The raw spectrum of coffee beans shown in Figure 1 has peaks and valleys at certain wavelengths, each carrying information about different chemical constituents. This agrees with Blanco & Villarroya (2002), who state that the chemical components of the material being tested determine the presence of spectral peaks and valleys. The peaks and valleys of the spectrum are also influenced by the physical properties of the material being analyzed. Caffeine (C8H10N4O2) in the NIRS absorbance spectrum of Solok Radjo arabica coffee beans lies at wavelengths of 1256-1475 nm and 1937-1974 nm. Madi et al. (2018) previously reported that caffeine in Bondowoso arabica coffee lies at wavelengths of 1128, 1298, 1672, 1726, and 1934 nm. The caffeine wavelengths of Solok Radjo coffee are therefore not much different from those of Bondowoso arabica coffee. The protein (CHON) of Solok Radjo coffee lies at wavelengths of 1455-1475 nm and 1935-1974 nm. Abbas et al. (2020) state that predicting chemical content using NIRS is very efficient, economical, and environmentally friendly because it does not require large amounts of chemicals.

Estimation of Chemical Content Using the PLS Method
Developing a prediction model is a major part of applying NIRS to determine the quality of agricultural products, and it is done by regressing spectral data against chemical reference data measured using laboratory methods (Munawar et al., 2022). The Solok Radjo coffee model was calibrated and validated in the Unscramble X.1 software using Partial Least Squares (PLS) regression. The prediction model describes the relationship between the NIRS-predicted values and the actual chemical data of Solok Radjo coffee. The best correction method was MSC pretreatment, which gave the calibration equation Y = 0.9962x + 0.0092 for caffeine and Y = 0.9997x + 0.0039 for protein. The predicted caffeine and protein values for Solok Radjo coffee under all correction methods can be seen in Table 2.

Table 2. Prediction of caffeine and protein of Solok Radjo coffee beans
Table 2 shows that the PLS prediction of caffeine in Solok Radjo coffee beans without pretreatment (raw) is not yet acceptable. This can be seen from the high SEC and SEP error values and the low RPD value. SEC indicates the accuracy of the calibration equation, while SEP indicates the accuracy of the validation model. According to Buchi (2013), SEP and SEC values are acceptable if they are close to 0. Munawar (2014) stated that the smaller the SEC value, the lower the prediction error, and that a small SEP value likewise indicates a good model. The caffeine prediction model produced without pretreatment is therefore not acceptable, and several treatments are needed to improve it. Mouazen et al. (2005) stated that a model's predictions can be considered good if the resulting R² is in the range 0.82-0.90. In Table 2, the R² of the raw caffeine prediction model is still below 0.90 and cannot be considered good. Several pretreatments were therefore applied to increase the predictive ability of the PLS model. The pretreatments improved the resulting caffeine prediction model, and MSC in particular had a large effect on PLS performance.
MSC pretreatment increased the R² of the caffeine calibration by 89.713%, from 0.525 to 0.996. MSC also reduced the SEC by 99.009%, from 0.202% to 0.002%. The validation R² for caffeine increased by 136.038%, from 0.419 to 0.989, and the SEP decreased by 87.790%, from 0.172% to 0.021%. The resulting RPD increased from 1.618 to 11.869. An RPD of 2.5 to 3 or above indicates a very good level of accuracy (Nicolaï et al., 2007). MSC pretreatment thus improved the performance of PLS in predicting the caffeine content of Solok Radjo coffee beans. Hayati et al. (2021) stated that spectral data need to be corrected to produce good and accurate predictions. The predictions of the caffeine model using the best pretreatment can be seen in Figure 2.
The results of processing the Solok Radjo coffee bean protein data using PLS are presented in Table 2. Without pretreatment, the prediction results for the raw spectra are still very low. The calibration values are R² = 0.294, SEC = 0.240%, and RMSEC = 0.234%, while the validation values are R² = 0.021, SEP = 0.94%, RMSEP = 1.705%, and RPD = 0.164. The PLS predictions are not accurate, as shown by the small coefficient of determination, an SEP close to 1, a fairly high prediction error, and a low RPD. According to Buchi (2013), SEP and SEC values are acceptable if they are close to 0. An RPD of less than 1.5 indicates that the calibration model cannot be used (Mouazen et al., 2005). Pretreatment is therefore needed to improve the prediction of coffee bean protein. The results obtained with pretreatment are presented in Table 2.
MSC pretreatment improved the performance of PLS in predicting Solok Radjo coffee bean protein. With MSC, the calibration R² increased by 239.79%, from 0.294 to 0.999. MSC also reduced the SEC by 98.29%, from 0.234% to 0.004%. The R² for protein validation increased markedly from 0.021 to 0.999, and the SEP decreased by 99.17%, from 1.705% to 0.010%. The resulting RPD increased from 0.164 to 19.943, meaning the model can be considered very good at predicting Solok Radjo coffee bean protein. Aditama (2019) succeeded in predicting Gayo arabica protein content using MSC+dg1 pretreatment; because protein is one of the more abundant constituents of coffee beans, it can be predicted easily with good accuracy. Figure 3 shows the prediction results of the protein model using the best pretreatment, MSC. The most precise PLS predictions of the caffeine and protein content of Solok Radjo coffee beans were obtained with MSC pretreatment; the pretreated model can be related to the chemical properties of the material, which affect the analysis of the model (Makky et al., 2019). This agrees with Ribeiro et al. (2021), who state that the PLS method can be used to build a prediction model for the chemical content of coffee.
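The percentage improvements quoted for the MSC models are relative changes with respect to the raw-spectrum value. A minimal sketch of that arithmetic, using the caffeine validation figures from the text (the function name `relative_change_percent` is hypothetical):

```python
def relative_change_percent(before, after):
    """Size of the change from `before` to `after`, as a percentage of `before`."""
    return abs(after - before) / before * 100.0

# Caffeine validation R2: raw 0.419, with MSC 0.989.
print(round(relative_change_percent(0.419, 0.989), 3))  # 136.038

# Caffeine SEP: raw 0.172, with MSC 0.021 (reported as a decrease of 87.790%).
sep_decrease = relative_change_percent(0.172, 0.021)
```

Because the change is expressed relative to the starting value, an increase can exceed 100% when the raw value is small, whereas a decrease of a positive quantity such as SEP is bounded by 100%.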

CONCLUSION
This research shows that the chemical content of Solok Radjo coffee beans can be predicted from NIRS spectra. Caffeine (C8H10N4O2) in the NIRS absorbance spectrum of Solok Radjo arabica coffee beans lies at wavelengths of 1456-1475 nm and 1937-1974 nm. Protein (CHON) lies at wavelengths of 1455-1475 nm and 1935-1974 nm. MSC pretreatment improved the performance of PLS in predicting caffeine and protein in Solok Radjo coffee beans very well. The calibration equation is Y = 0.9962x + 0.0092 for caffeine and Y = 0.9997x + 0.0039 for protein.
For caffeine, the calibration R² increased by 89.713% and the SEC decreased by 99.009%. The validation R² for caffeine increased by 136.038%, the SEP decreased by 87.790%, and the resulting RPD increased from 1.618 to 11.869. For protein, the calibration R² increased by 239.79% and the SEC decreased by 98.29%; for validation, the R² increased from 0.021 to 0.999, the SEP decreased by 99.17%, and the resulting RPD increased from 0.164 to 19.943. In the future, the model from this study should be developed into an application for determining the content of Solok Radjo coffee beans, and further research should use a larger number of samples.

Figure 2. Calibration (left) and validation (right) of the caffeine content of Solok Radjo coffee beans using MSC pretreatment

Figure 3. Calibration (left) and validation (right) of the protein content of Solok Radjo coffee beans using MSC pretreatment

The protein content obtained is not much different from the results of Hairatun et al. (2017).