Summary:
This section complements the analogous section in the paper and provides a short description of the well-known metrics considered, in order to ensure clear linguistic semantics. In the following two subsections, we briefly introduce the Gm3m and Rmi indexes, devoted to assessing the preservation of the initial linguistic concepts (proximity to the fully interpretable, equidistributed strong fuzzy partition) and the consistency of the rules (absence of contradiction), respectively.
Formally, the Gm3m index (Gacto et al.) is defined as the geometric mean of three metrics that quantify how far the tuned membership functions depart from their original definitions.
These metrics were defined to measure the interpretability when the original definitions of the membership functions need to be modified, which is essential for learning accurate/trustworthy models. Thanks to the use of the geometric mean, if any one of the metrics takes a very small value (low interpretability), Gm3m also takes a small value. Gm3m takes values in the range [0,1], with 0 being the lowest level of interpretability and 1 being the highest.
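As a small illustration of this aggregation behaviour (not the exact Gm3m formulation, whose three component metrics are defined in the referenced works), the following Python sketch shows how the geometric mean of three values in [0, 1] is dragged down by a single low value:

```python
from math import prod

def geometric_mean_3(metrics):
    """Geometric mean of three interpretability metrics, each in [0, 1]."""
    assert len(metrics) == 3 and all(0.0 <= m <= 1.0 for m in metrics)
    return prod(metrics) ** (1.0 / 3.0)

# A single very low metric (0.05) drags the index down, as intended.
print(geometric_mean_3([0.90, 0.90, 0.05]))  # ~0.34
print(geometric_mean_3([0.90, 0.90, 0.85]))  # ~0.88
```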
The complete description, formulation and some examples of how to compute these metrics can be found in (Gacto et al.).
Rmi (Rule Meaning Index) (Galende et al.)
For a given linguistic FRBS, Rmi is computed as the worst case over the individual values Rmi(Ri) of each rule Ri in the whole RB. Individually, i.e. for each rule, the goal of Rmi(Ri) is to evaluate the degree of reliability of rule Ri with respect to the global output that the whole model would infer in the activation zone of this rule (in its core or, to a certain degree, its α-cut). Therefore, this index also takes into account the particular inference system used by the FRBS through the inferred output (which is important since it could also affect the semantic interpretability of the RB).
The complete procedure to calculate each Rmi(Ri) can be found in the referenced work (Galende et al.).
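As a purely illustrative sketch (our own simplification under the interpretation given above, not the exact formulation of the referenced work), the idea can be summarized as follows: take sample points from the activation zone (core) of the rule, compare what the isolated rule infers with what the complete model infers at those points, and keep the worst rule as the global index. The rule and model objects and their infer method are hypothetical interfaces:

```python
import numpy as np

def rmi_rule(rule, model, core_points, output_range):
    """Illustrative per-rule reliability score in [0, 1] (our simplification).

    rule.infer / model.infer are hypothetical interfaces returning the output
    of the isolated rule and of the complete FRBS, respectively.
    """
    rule_out = np.array([rule.infer(x) for x in core_points])
    model_out = np.array([model.infer(x) for x in core_points])
    # Normalized average discrepancy over the activation zone of the rule:
    # 1.0 means identical behaviour, values near 0.0 mean completely different.
    discrepancy = np.mean(np.abs(rule_out - model_out)) / output_range
    return 1.0 - min(float(discrepancy), 1.0)

def rmi(rules, model, cores, output_range):
    """Global Rmi taken as the worst case over all the rules in the RB."""
    return min(rmi_rule(r, model, c, output_range) for r, c in zip(rules, cores))
```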
This section includes the experimental study of the proposed method. The experimentation is undertaken with 23 real-world datasets, with a number of variables within the interval [2, 60] and a number of examples within the interval [43, 4177]. In all the experiments, a 5-fold cross-validation model (5fcv) has been adopted, i.e., each dataset has been split randomly into 5 folds, each one containing 20% of the patterns of the dataset; thus, four folds have been used for training and one for testing. The properties of these datasets are presented in Table I: name of the dataset (NAME), short name or acronym of the dataset (ACRO), number of variables (VAR), and number of examples (CASES). You may download the 5-fold cross-validation partitions for all the datasets in the KEEL format by clicking here.
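For readers wishing to reproduce this partitioning scheme from scratch (rather than using the KEEL partitions linked above), a minimal sketch with scikit-learn could look as follows; the data matrix X and target y are placeholders:

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder data: n_examples x n_variables matrix and target vector
# (sized here like the Boston housing dataset, just as an example).
X = np.random.rand(506, 13)
y = np.random.rand(506)

# 5-fold cross-validation: each fold holds ~20% of the patterns,
# four folds are used for training and the remaining one for testing.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    print(f"fold {fold}: {len(train_idx)} training / {len(test_idx)} test patterns")
```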
These datasets have been downloaded from the following web pages:
| NAME | ACRO | VAR | CASES |
| Abalone | ABA | 8 | 4177 |
| Anacalt | ANA | 7 | 4052 |
| Baseball | BAS | 16 | 337 |
| Boston housing | BOS | 13 | 506 |
| Diabetes | DIA | 2 | 43 |
| Machine CPU | CPU | 6 | 209 |
| Electrical Maintenance | ELE | 4 | 1056 |
| Body fat | FAT | 14 | 252 |
| Forest Fires | FOR | 12 | 517 |
| Friedman | FRI | 5 | 1200 |
| Mortgage | MOR | 15 | 1049 |
| Auto Mpg 6 | MPG6 | 5 | 392 |
| Auto Mpg 8 | MPG8 | 7 | 392 |
| AutoPrice | PRI | 15 | 159 |
| Quake | QUA | 3 | 2178 |
| Stocks domain | STP | 9 | 950 |
| Strike | STR | 6 | 625 |
| Treasury | TRE | 15 | 1049 |
| Triazines | TRI | 60 | 186 |
| Weather Ankara | WAN | 9 | 1609 |
| Weather Izmir | WIZ | 9 | 1461 |
| Wisconsin Breast Cancer | WBC | 32 | 194 |
| Yacht Hydrodynamics | YH | 6 | 308 |
In this subsection, we include some representative examples of the linguistic models obtained in two of the benchmark problems used for comparison in the previous subsections: WAN (Weather in Ankara) and WBC (Wisconsin Breast Cancer). Figures 2 and 3 depict both models in order to demonstrate not only the accuracy of the method but also the simplicity and easy readability of the rules obtained. Variables in these figures are ordered following the order of the splits in the tree generated when learning the rules. Thus, each split can be seen as a way to recognize the different divisions of the data, from the most general to the most specific.
We have used colors to ease the recognition of the different cases represented by the rules (same color per variable and split). Gray text is only included to provide additional information; it is not part of the proposed rule structure (and therefore it is not needed for inference or for understanding). The same applies to the percentage of covered instances and to the Gm3m and Rmi values, which have a purely informative purpose and describe the semantic quality of each partition and rule, respectively. As previously explained, Rmi goes from 1.0 (what a single rule affirms in its main covering region is equal to what the model produces) to 0.0 (what a single rule affirms is completely different from what the model produces). In general, we can see that almost all the rules obtain an Rmi equal to 1.0, which indicates (together with the high Gm3m values) that these rules do not interfere significantly with each other, so the locality of each rule is preserved. Finally, please take into account that our initial linguistic partitions are strong (accepted in the specialized literature as being highly interpretable), and that Gm3m values near 0.8 indicate that their meanings are preserved at a high level (see an example in Figure 1 with the Gm3m values reported in Figure 2; an illustrative sketch of such a strong partition is also included after Figure 1). Again, instead of only including the linguistic terms, we provide the definition points of the membership functions as additional information. This is because the expert in our real case study (the childhood obesity problem) asked us about these numbers after analyzing the rules, in order to check the approximate division values, so we think they might be of interest to an expert in any potential problem. Please skip these numbers if you are not an expert on the given problem, and remember that the corresponding linguistic terms come from a strong linguistic partition.
Figure 1: Example of a linguistic partition with Gm3m equal to 0.81 (blue), with respect to the corresponding strong fuzzy partition (gray)
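As a complementary note, the sketch below illustrates what we mean by an equidistributed strong fuzzy partition (the standard notion: uniformly spaced triangular membership functions whose degrees sum to 1 at every point of the domain). The number of labels and the domain are arbitrary examples, not taken from any of our datasets:

```python
import numpy as np

def strong_partition(domain_min, domain_max, n_labels):
    """Equidistributed strong fuzzy partition with triangular membership
    functions: at any point of the domain, the membership degrees of all
    the labels sum exactly to 1."""
    centers = np.linspace(domain_min, domain_max, n_labels)
    step = centers[1] - centers[0]

    def membership(label, x):
        return max(0.0, 1.0 - abs(x - centers[label]) / step)

    return membership, centers

mu, centers = strong_partition(0.0, 100.0, 5)   # 5 labels over [0, 100]
x = 37.0
degrees = [mu(i, x) for i in range(len(centers))]
print(degrees, sum(degrees))   # the degrees at x sum to 1.0
```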
Figure 2: KB obtained with the proposed method on the WAN dataset. The MSETst obtained is 1.565. For a higher-quality version, see the PDF here.
Figure 2 shows the DB and RB obtained for the WAN dataset (estimation of the average temperature from measured climate factors), whose accuracy (MSETst) is 1.565. The first split (on MinTemp) distinguishes three different situations depending on the minimum temperature values (the coldest, medium and hottest situations). Taking the easiest one (R5, hottest), it states that when minimum temperatures are very high, the mean temperature should be high (centered on 68.6°F) and move up (or down) by 0.71 per degree that the maximum temperature is over (or under) 82.2°F. In the cases where the minimum temperature is medium (R3 and R4), we find two different situations depending on the dew point. When the dew point is high or above, the mean temperature should be medium (centered on 57.7°F); when the dew point is up to medium, the mean temperature should be somewhat lower, i.e. between low and medium (centered on 34.2°F). In both cases, the variability is once again explained by the maximum temperature, where, depending on the dew point, these maximum temperatures move in different ranges (MaxTemp is centered at 55.5 or 39.9, respectively, to which higher or lower values are added or subtracted). At this point, we can also see that the variability due to the maximum temperature is higher in the R5 case than in the R3 and R4 cases (when temperatures are high, in general, changes in the maximum temperature have a greater effect on the estimation of the mean value). This kind of relative information among consequent factors cannot be found (or is not easy to find) in models based on classic linguistic rules, which makes it an additional and useful piece of information not provided by previous linguistic fuzzy proposals. Finally, the cases where minimum temperatures are low (R1 and R2, the coldest cases) can be analyzed in the same manner, taking into account that both rules depend on visibility (whether or not it is a clear day) and vary with different factors (the maximum temperature for clear days, or the dew point for foggy days).
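As a small numeric illustration of how the consequent of R5 reads (using only the values quoted above and ignoring the matching degrees of the complete fuzzy inference process; the helper name below is ours):

```python
def r5_mean_temp_estimate(max_temp_f):
    """Crisp reading of rule R5 (hottest situation, WAN model): the mean
    temperature is centered on 68.6°F and moves by 0.71°F per degree that
    the maximum temperature is over (or under) 82.2°F."""
    return 68.6 + 0.71 * (max_temp_f - 82.2)

print(r5_mean_temp_estimate(82.2))  # 68.6  (maximum temperature at its center)
print(r5_mean_temp_estimate(90.0))  # ~74.1 (hotter maximum -> higher estimate)
print(r5_mean_temp_estimate(75.0))  # ~63.5 (cooler maximum -> lower estimate)
```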
Figure 3: KB obtained with the proposed method on the WBC dataset. The MSETst obtained is 640.9. For a higher-quality version, see the PDF here.
The linguistic model obtained for the WBC dataset (predicting the time, in months, at which breast cancer is likely to recur, based on the characteristics of individual cell nuclei taken from images) is shown in Figure 3. The obtained model is quite interesting, since with only 3 rules it obtains very precise results with respect to those obtained by the methods in our comparison. In this case, we leave the interpretation up to the reader, who should take into account that: the texture of the cell nucleus is measured as the variance of the gray-scale intensity (the higher the value, the more irregular the nucleus and thus the more malignant); and the Fractal Dimension is the coastline approximation of the contour (the higher the value, the closer the approximation, so contours are more regular and therefore more benign). R3 represents the cases with the highest severity, R2 the intermediate cases and R1 the cases with the least severity.
While accuracy is not the main focus of the article, the proposed algorithm has also been compared to some highly accurate state-of-the-art algorithms **, in order to help readers appreciate the achieved accuracy with respect to other methods in the literature (simply for benchmarking purposes). The representative algorithms considered in this contribution are shown in Table II, which briefly describes them and provides their corresponding literature references. Regarding the algorithmic parameters, we consider the standard parameters recommended by the authors (those included in each tool as the recommended defaults). The only exception is the total number of trees in the Random Forest-based algorithm: increasing it up to 500 systematically improved the results, and since no significant improvements were observed beyond this value, it was set to 500 for this comparison. Finally, since our MSE is divided by 2, we multiplied our results by 2 to perform this comparison (a short sketch of this conversion is included after the table below).
**They are available via recognized software tools such as:
| Algorithm Type | Cite | Reference | Description |
| Model Trees (MT) | M5PrimeLab: M5' regression tree, model tree, and tree ensemble toolbox for Matlab/Octave. | M5PrimeLab code | M5 prime regression method implementation |
| Neural Networks (NNET) | Adam: A Method for Stochastic Optimization. | Diederik P. Kingma et al. | MLP squared-loss stochastic gradient (100 hidden neurons) |
| Random Forests (RF) | Gene selection with guided regularized random forest. | Deng et al. | Regularized random forest algorithm with 500 trees |
| Support Vector Machines (SVM) | Large-Scale Linear Support Vector Regression. | Ho et al. | Dual coordinate descent for large-scale linear SVM |
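Regarding the MSE conversion mentioned above, the following sketch simply makes the relationship between the two error measures explicit (standard definitions, placeholder data):

```python
import numpy as np

def mse_standard(y, y_pred):
    """Standard mean squared error, as used by the compared methods."""
    y, y_pred = np.asarray(y), np.asarray(y_pred)
    return np.mean((y - y_pred) ** 2)

def mse_halved(y, y_pred):
    """The error measure reported by our approach: the MSE divided by 2."""
    return mse_standard(y, y_pred) / 2.0

# Multiplying the halved measure by 2 recovers the standard MSE,
# which is what we did before comparing against the other methods.
y_true, y_hat = [1.0, 2.0, 4.0], [1.5, 1.0, 3.0]
print(mse_halved(y_true, y_hat))        # 0.375
print(2.0 * mse_halved(y_true, y_hat))  # 0.75, equal to the standard MSE
print(mse_standard(y_true, y_hat))      # 0.75
```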
These algorithms and the 23 regression datasets are publicly available, so for the sake of simplicity we directly provide the statistical test results. Table III shows the rankings, according to Friedman's test, of the different methods considered in this study in terms of test error. In this case, the proposed algorithm is ranked second, behind RF, which seems to have performed quite well.
| Algorithm | Ranking |
| RF | 1.609 |
| Proposed method | 2.174 |
| MT | 3.174 |
| SVM | 3.522 |
| NNET | 4.522 |
Table IV shows the adjusted p-values (apv) obtained using Holm's test, comparing all the methods against the proposed method in terms of test error. The results show that the proposed method outperforms the methods ranked below it with low apvs (0.128 in the closest case). On the other hand, the apv with respect to RF is considerably higher, indicating that the results of these two approaches are not so far apart (an illustrative sketch of how such rankings and adjusted p-values can be computed is included after the table below).
| Algorithm | apv on Tst |
| Proposed vs NNET | 4.289E-6 |
| Proposed vs SVM | 0.023 |
| Proposed vs MT | 0.128 |
| Proposed vs RF | 0.451 |
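For readers unfamiliar with these tests, the following sketch shows one standard way (following Demšar's methodology, with the proposed method acting as the control) to obtain Friedman average ranks and Holm-adjusted p-values from a matrix of per-dataset test errors. The error values below are random placeholders, not our actual results, and we do not claim that this exactly reproduces the statistical software used for the tables:

```python
import numpy as np
from scipy import stats

# Placeholder matrix of test errors: one row per dataset (23 in our study),
# one column per method. Replace with the real results to reproduce the tables.
methods = ["RF", "Proposed", "MT", "SVM", "NNET"]
rng = np.random.default_rng(0)
errors = rng.random((23, len(methods)))
n_datasets, k = errors.shape

# Friedman average ranks: on each dataset the lowest error gets rank 1.
ranks = np.apply_along_axis(stats.rankdata, 1, errors)
avg_ranks = ranks.mean(axis=0)
print(dict(zip(methods, np.round(avg_ranks, 3))))

# Post-hoc comparison against the control method: z statistics from the
# rank differences, unadjusted p-values from the normal distribution,
# then Holm's step-down adjustment.
control = methods.index("Proposed")
se = np.sqrt(k * (k + 1) / (6.0 * n_datasets))
others = [j for j in range(k) if j != control]
pvals = np.array([2.0 * stats.norm.sf(abs(avg_ranks[control] - avg_ranks[j]) / se)
                  for j in others])
order = np.argsort(pvals)
apv, running_max = {}, 0.0
for position, idx in enumerate(order):
    adjusted = min((len(pvals) - position) * pvals[idx], 1.0)
    running_max = max(running_max, adjusted)
    apv[methods[others[idx]]] = round(float(running_max), 4)
print(apv)
```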
As previously mentioned, accuracy is not the main objective of the article; nevertheless, taking into account that the proposed approach obtains fewer than 7 rules in all the datasets (fewer than 5 on average), in our opinion these results also show a really competitive performance from the accuracy point of view, since the proposed approach competes well even with ensemble models comprising 500 or more trees.