Error Evaluation in Computer Assisted Mass Appraisal Models.
Paper presented at the ERES & AREUEA International Real Estate Conference-98 in Maastricht, The Netherlands (ref. PE/CW-437)
Yu. Kochetkov
Center for Real Estate Analysis, Moscow. CREA@aha.ru
INTRODUCTION.
A fast-growing real estate market, the need for municipalities to move to market-driven management policies, and the emergence in the Russian labour market of experts trained in the new profession of appraiser facilitated the development of a real property mass appraisal industry at the end of the 1990s, comprising both research and application institutions. A number of methodologies for the mass appraisal of land, apartments, individual detached houses, and commercial and industrial properties have now been developed. Municipal agencies are typically the major clients, and they impose accuracy and reliability of value projections as a critical requirement.
A sound approach to Computer-Assisted Mass Appraisal (CAMA) problems, including the problem of errors, is clearly needed. This paper analyses how errors arise in CAMA, in part because of the immature real estate market. The material for the analysis was obtained during model building for 8 real estate markets in Novgorod, Tver and St. Petersburg in the Russian Federation. Direct market analysis was used to analyse sales and other evidence of market prices.
ERROR CAUSES IN CAMA.
The CAMA process has several parts: market analysis and data collection, database construction, model building, and calibration of coefficients. Let us discuss the sources of error at each stage.
Some property prices in our database (sample) do not adequately reflect actual market values. Normally such prices are excluded from consideration. The point is that the database builder should check the reliability of information sources, and there should be a special instrument or filter procedure for excluding incorrect records before or during calibration. We face two kinds of incorrect records. First, records from official sources often contain conservative or falsified prices. Second, some brokers prefer to give wrong addresses and characteristics in order to protect their clients.
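One possible form of such a filter procedure is sketched below in Python. It is only an illustration: the interquartile-range rule, the factor 1.5, and the field names `price` and `area` are assumptions and are not taken from the paper.

```python
from statistics import quantiles

def filter_records(records, k=1.5):
    """Split records into (kept, rejected) by an IQR rule on price per sq. m.

    `records` is a list of dicts with the hypothetical keys 'price' and 'area'.
    """
    unit_prices = [r["price"] / r["area"] for r in records]
    q1, _, q3 = quantiles(unit_prices, n=4)  # quartiles of the unit price
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    kept, rejected = [], []
    for record, unit_price in zip(records, unit_prices):
        (kept if low <= unit_price <= high else rejected).append(record)
    return kept, rejected

# Illustrative (invented) records; the last one mimics an understated official price.
sample = [
    {"price": 20000, "area": 50},
    {"price": 22000, "area": 52},
    {"price": 25000, "area": 60},
    {"price": 21000, "area": 48},
    {"price": 24000, "area": 58},
    {"price": 3000, "area": 55},
]
kept, rejected = filter_records(sample)
print(len(kept), "kept;", len(rejected), "rejected")
```

A rule of this kind catches only gross price errors. Wrong addresses or characteristics, the second kind of incorrect record, would still need a logical check of each record.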
The so-called initial noise, $\varepsilon_0$, is of greater interest for our discussion. Normally, market prices fluctuate within a certain range depending on the broker, buyer, seller, exposure time, etc. We can interpret this as dependence of the price on uncertain factors. Thus, we are working with the property price $Y'$ instead of the property market value $Y$:
$$Y' = Y + \varepsilon_0 \tag{1}$$
Strictly speaking, the difference $\varepsilon_0 = Y' - Y$ can be neglected when we have a very large sample drawn from a well-developed market. The situation in the Russian real estate market is worse. The same property can sell for either 20,000 USD or 25,000 USD depending on the circumstances of the sale. Thus, we cannot ignore this noise.
A similar source of error is the disregard of insignificant and rare factors. Let us denote this source by $\varepsilon_f$. In practice there is no great difference between $\varepsilon_f$ and $\varepsilon_0$, so for simplicity we fold $\varepsilon_f$ into $\varepsilon_0$. As a last resort, we can simply estimate $\varepsilon_f$, or we can introduce additional factors and their values. But there is little we can do when the noise $\varepsilon_0$ is very large.

Pic.1 The exponential adjustment reflecting the decrease in price per square meter for large objects.
An exponential adjustment reflects this phenomenon. In this case, small objects will be slightly overestimated, but very large objects will be significantly underestimated. The author faced this problem during model building for Tver industrial properties. The sample contained properties with land areas of less than 10 000 sq. m, and the above adjustment was in the range 0.75 to 1. During the field review, it was found that for very large objects with a land area greater than 30 000 sq. m, the adjustment to values was less than 0.3. Of course, this did not correspond to the real market situation.
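The extrapolation problem can be illustrated numerically. The sketch below assumes a purely illustrative adjustment of the form exp(-k * area), calibrated so that it equals 0.75 at 10 000 sq. m, the upper edge of the sample. The functional form and the name `size_adjustment` are assumptions; the adjustment actually used for Tver may have differed.

```python
import math

def size_adjustment(area, k):
    """Assumed exponential size adjustment: exp(-k * area)."""
    return math.exp(-k * area)

# Calibrate k so that the adjustment is 0.75 at the edge of the sample (10 000 sq. m).
k = -math.log(0.75) / 10_000

for area in (1_000, 10_000, 30_000, 50_000):
    print(f"{area:>6} sq. m: adjustment {size_adjustment(area, k):.3f}")
```

Inside the sample range the adjustment stays between 0.75 and 1. Outside it the same curve falls to about 0.42 at 30 000 sq. m and about 0.24 at 50 000 sq. m, although no sale in the sample supports those values.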
Naturally, we can adopt more complex adjustments for size, but calibrating the coefficients of such a dependence is difficult, especially for small samples, where the dependence may arise by chance. On the whole, the complexity of the model should correspond to the sample size. An excess of complex dependencies and adjustments leads to outliers and estimation errors.
Let us consider a sufficiently large sample from which the definitely incorrect records have been excluded. It is very hard to estimate the error when the sample is small; in that case it is simpler to use Student's coefficients and the confidence-interval approach. We can thus also exclude the problem of incorrect prices in the sample from consideration.
Thus, the only remaining source of error is weaknesses of model building. Model building is a complex process consisting of a number of steps: choosing a model structure, creating linearized coefficients (if this step is used), building a response surface (or zoning), analysing time trends, and so on. Mistakes can occur at every step. Denote this error by $\varepsilon_m$.
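For the small-sample case, the confidence-interval approach can be sketched as follows. This is a minimal Python illustration, assuming approximately normal prices per square meter; the function name and the sample values are invented, and the table holds the standard two-sided 95% Student coefficients.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student coefficients t(0.975, df) for small degrees of freedom.
T_95 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
        7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def mean_confidence_interval(values):
    """95% confidence interval for the mean of a small sample (3 to 11 values)."""
    n = len(values)
    t = T_95[n - 1]                            # df = n - 1
    half_width = t * stdev(values) / sqrt(n)   # t * s / sqrt(n)
    centre = mean(values)
    return centre - half_width, centre + half_width

unit_prices = [400.0, 423.0, 417.0, 437.0, 414.0]  # illustrative prices per sq. m
low, high = mean_confidence_interval(unit_prices)
print(f"95% interval for the mean: {low:.1f} to {high:.1f}")
```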
ERROR ESTIMATION IN CAMA
Let us now state the task. Let the actual market values of properties $Y$ depend on a complete set of factors $X_{tot}$, and let there exist an absolutely objective law:
$$Y = F(X_{tot}) \tag{2}$$
In a sample, we use sale prices $Y'$ (see (1)). Denote the model we want to build by $f(X_1)$, where $X_1$ is the part of the factors $X_{tot}$ that is accessible for collection. The sample is then described by the following equation:
$$Y' = f(X_1) + \varepsilon_{db}, \tag{3}$$
where $\varepsilon_{db}$ is the absolute error of the model in describing the sample. $\varepsilon_{db}$ is directly determined by $\varepsilon_f$ and $\varepsilon_m$, and indirectly by $\varepsilon_0$ via $\varepsilon_m$. It can be assessed with the help of a set of standard statistical coefficients: $R^2$, COV, and COD. The absolute error can be characterised by the standard error of estimation
(4)
where n is the number of records [1] (for another similar definition, see the attachment).
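These coefficients are easy to compute once the model predictions are available. The Python sketch below uses definitions common in mass appraisal practice, stated here as assumptions rather than as the paper's exact formulas: the standard error as the root mean squared residual with n in the denominator (some texts subtract the number of fitted coefficients), COV as 100 times the standard error divided by the mean price, and COD as 100 times the mean absolute deviation of the ratios predicted/price from their median, divided by that median. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, median

def standard_error(prices, predictions):
    """Root mean squared residual (assumed form, denominator n)."""
    n = len(prices)
    return sqrt(sum((y - s) ** 2 for y, s in zip(prices, predictions)) / n)

def cov_percent(prices, predictions):
    """Standard error as a percentage of the mean price."""
    return 100.0 * standard_error(prices, predictions) / mean(prices)

def cod_percent(prices, predictions):
    """Mean absolute deviation of ratios from the median ratio, in percent."""
    ratios = [s / y for y, s in zip(prices, predictions)]
    med = median(ratios)
    return 100.0 * mean([abs(r - med) for r in ratios]) / med

prices = [100.0, 200.0, 300.0, 400.0]        # illustrative sale prices
predictions = [110.0, 190.0, 310.0, 380.0]   # illustrative model values
print(f"COV = {cov_percent(prices, predictions):.2f}%")
print(f"COD = {cod_percent(prices, predictions):.2f}%")
```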
The standard error of estimation is directly related to COV (see attachment). By applying formula (3) to the real properties we get the following equation:
$$Y = f(X_1) + \varepsilon_{abs}, \tag{5}$$
where $\varepsilon_{abs}$ is the absolute error of value estimation.
Then we can pose the problem in this way: estimate $\varepsilon_{abs}$ with the help of the set of standard statistical coefficients for model building (or $\varepsilon_{db}$).
Unfortunately, it seems that an exact solution does not exist for the arbitrary case. We propose the following approach: to analyse the limiting cases of the equation obtained by combining (1) and (3):
$$Y + \varepsilon_0 = f(X_1) + \varepsilon_{db} \tag{6}$$
First of all, let us state the formal assumptions.
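1) Suppose that $\varepsilon_0$ does not depend on the values of the factors $X_1$. Then
$$\sigma_{(f-Y)}^2 = \sigma_{abs}^2 = \sigma_0^2 + \sigma_{db}^2, \tag{7}$$
where the deviations $\sigma_{abs}$, $\sigma_0$, $\sigma_{db}$ refer to the corresponding errors $\varepsilon_{abs}$, $\varepsilon_0$, $\varepsilon_{db}$.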
2) Suppose that $\varepsilon_0$ is connected with the values of the factors $X_1$. In this case the error of the model is partially or fully governed by the noise $\varepsilon_0$. It means that our model includes spurious dependencies present in the initial information. Thus, $\sigma_{abs}$ approximates $\sigma_{db}$ and lies between [...] and $\sigma_{db}$.
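Both cases can be read off equation (6). The short derivation below assumes only that the errors have zero mean and finite variance; the covariance symbol $\mathrm{Cov}$ is introduced here for illustration and does not appear in the original.

$$f(X_1) - Y = \varepsilon_0 - \varepsilon_{db}$$

$$\sigma_{abs}^2 = \sigma_0^2 + \sigma_{db}^2 - 2\,\mathrm{Cov}(\varepsilon_0, \varepsilon_{db})$$

When the two errors are independent, the covariance vanishes and the sum of variances in (7) is recovered. When the model partly reproduces the noise, the covariance is positive and $\sigma_{abs}$ falls below the value given by (7).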
On the other hand, we know that mathematical methods allow us to control independent (“white”) noise. This means that although we cannot precisely describe the prices $Y'$ in the sample because of the noise, we can probably describe the actual property values $Y$ with sufficient precision. Cases 1) and 2) (see Pic.2) were simulated to address this problem.
Pic.2 Dependence of error (reflected by COV) on the initial noise level (reflected by [...]).
1) Independent initial noise; 2) Initial noise depends on factor value.
A real database of the apartment market of the city of Tver was used as the basis for the simulation, so we had the real distribution of the factors $P_i$. The values $Y$ were simulated with the help of a complex random law, for example:
$$Y = \left(1\cdot P_1^{0.987} + 2\cdot P_2^{0.911} + 3\cdot P_3^{1.024} - P_4 + 4\ln P_5\right)\cdot P_6^{1.8}\cdot P_7\cdot P_8^{0.1}\cdot 0.85^{P_9}$$
In case 1), the prices were simulated as $Y' = Y + (\varepsilon - 0.5)\cdot(\ldots)$, and in case 2) as $Y' = Y + (\varepsilon - 0.5)\cdot(\ldots)\cdot z(P_i)$, where $\varepsilon$ is a random value in the range from 0 to 1, the dispersion parameter $D$ is the initial noise level, and $z(P_i)$ is the above-mentioned dependence of the noise on factor values. $z(P_i)$ is randomly distributed around 1 with a dispersion smaller than the dispersion ([...]). Next, a linear model was built by means of multiple regression. The model can be characterised by a statistical coefficient; COV was chosen. Then we can estimate how well the model predicts both the actual property values $Y$ (the corresponding curve is $f(X_i)/Y$) and the market prices $Y'$ (the corresponding curve is $f(X_i)/Y'$). The result is shown in Pic.2. The function curve is close to $\sigma_{abs} = a_1 + 0.0024\cdot(\sigma_{db} - a_2)$ in both cases, where $a_1$ and $a_2$ are scaling coefficients depending on the kind of law adopted and the average value of $z(P_i)$.
First of all, note that in both cases the COV of the model predictions $f(P_i)$ (which refers to $\sigma_{abs}$) tends to a limit (4-6%) as the initial noise level tends to 0. This is the pure error of model building, because we are trying to describe complex dependencies by means of a linear model. In both cases, the prediction errors for the actual property values $Y$ depend only slightly on the initial noise of the sample, regardless of whether the initial noise is independent or not. As for the predictions of the prices $Y'$, in case 1) COV depends linearly on the noise. In case 2) this dependence deviates from a straight line, but the value of COV is lower than in case 1). The reason is that the model partially describes the variation of prices connected with the noise. Thus, provided that actual noise (5-10 percent and more) does exist, the error of the model's estimates of the actual property values $Y$ is lower than the error of its estimates of the simulated prices $Y'$.
This result does not agree with the formal approach, where $\sigma_{abs} > \sigma_{db}$. In fact, the formal approach gives an upper estimate of the error in both cases, and the real situation appears to be better. Unfortunately, we have information neither about the initial noise level nor about the dependence of the noise on factor values. Thus, $\sigma_{abs} = a_1 + 0.0024\cdot(\sigma_{db} - a_2)$ is a lower estimate of the error. The actual error $\sigma_{abs}$ lies between these two bounds.
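The qualitative behaviour for case 1) can be reproduced with a much simpler experiment. The Python sketch below is not the author's simulation: it uses one synthetic factor instead of the Tver database, an assumed "true" law Y = 500 * P^0.9, and an assumed noise amplitude proportional to D * Y. All names and numbers are hypothetical.

```python
import random
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def cov_percent(target, predicted):
    """100 * root mean squared error / mean of target."""
    rmse = sqrt(mean([(t - p) ** 2 for t, p in zip(target, predicted)]))
    return 100.0 * rmse / mean(target)

def simulate(noise_level, n=2000, seed=1):
    """Return (COV against actual values Y, COV against noisy prices Y')."""
    rng = random.Random(seed)
    factor = [rng.uniform(30.0, 120.0) for _ in range(n)]   # single factor P
    value = [500.0 * p ** 0.9 for p in factor]              # assumed law for Y
    # Independent noise; amplitude assumed proportional to noise_level * Y.
    price = [y + (rng.random() - 0.5) * noise_level * y for y in value]
    a, b = fit_line(factor, price)                          # linear model fitted to prices
    predicted = [a + b * p for p in factor]
    return cov_percent(value, predicted), cov_percent(price, predicted)

for d in (0.0, 0.1, 0.3):
    cov_value, cov_price = simulate(d)
    print(f"D={d:.1f}  COV vs Y: {cov_value:.2f}%  COV vs Y': {cov_price:.2f}%")
```

With zero noise the two figures coincide and reflect only the misfit of a linear model to a non-linear law. As the noise grows, the error against the prices grows with it, while the error against the underlying values changes much less.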
CAUSES OF OUTLIERS
Let us briefly discuss the problem of outliers. J.K. Eckert and co-authors [2] have pointed out a number of causes of outliers. In large samples containing accurate data, outliers can result from model weaknesses and from unusual objects. Normally, unusual objects are excluded from model assessment. Model weaknesses can have several origins: multicollinearity, instability of complex functions and adjustments, and a wrong model structure. In the author’s opinion, the main cause is instability of the response surface adjustment or weaknesses of zoning. Some instability of the response surface is to be expected in any model, and it is hard to avoid. Zoning adjustments lead to range problems, while a global response surface involves complex dependencies.
Unfortunately, many databases of Russian real estate sales do not meet the requirements of representativeness, so it is hard to locate and eliminate incorrect records. These records lead directly to outliers. In this case, we will have good statistical coefficients and fine predictions of sample prices. At present, this is the primary cause of outliers. To solve this problem, all records in small and medium-sized databases need to be checked for logical consistency.
In Russia, outliers are a more important problem than the percentage of error, the level of confidence, etc. This is related to the low reliability of data, the uncertainty of property laws, a moderate level of understanding of the real estate market, and the faulty performance of the first CAMA models. On the other hand, the problem has given rise to the development of performance evaluation and quality control procedures. At present, the most effective method is field review; another well-known method is to apply the model to a control sample (a set of records that were not included in the database or were collected later). Note that field review mostly reveals outliers, whereas a control sample allows us to estimate the level of error (of course, some of its records can be outliers as well, but it is hard to determine their sources).
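A control-sample check of this kind can be sketched in a few lines of Python. The 30% tolerance, the function name, and the figures are assumptions chosen for illustration only.

```python
def control_sample_report(prices, predictions, tolerance=0.3):
    """Mean absolute relative error on a control sample, plus indices of records
    whose relative error exceeds `tolerance` (candidate outliers)."""
    rel_errors = [(s - y) / y for y, s in zip(prices, predictions)]
    outliers = [i for i, e in enumerate(rel_errors) if abs(e) > tolerance]
    mean_abs_error = sum(abs(e) for e in rel_errors) / len(rel_errors)
    return mean_abs_error, outliers

control_prices = [20000, 25000, 18000, 40000, 30000]   # prices not used in calibration
model_values = [21000, 24000, 18500, 22000, 31000]     # model estimates for the same records
error, outliers = control_sample_report(control_prices, model_values)
print(f"mean absolute relative error: {error:.1%}; candidate outliers: {outliers}")
```

The flagged records would then go to field review, since the control sample alone cannot tell whether the record or the model is at fault.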
Conclusion.
To sum up, we would highlight the most important results again:
Attachment.
Statistical coefficients by [2]:

where $S_i$ is the predicted price of property $i$;
where [...] is the average price for the sample.
References:
[1] V.N.Kalinina, V.F.Pankin “Mathematical statistic”, Moscow.
[2] “Property Appraisal and Assessment Administration” (Gen.ed. J.K.Eckert), USA, International Association of Assessing Officers.