Modeling of Modern Business Intelligence Analytical Functions Based on Stochastic Simulation
(Volume 3, Number 1-2, Spring-Summer 2002.)
Vladimir Šimović, Vilko Žiljak, Tomislav Kosić
University of Zagreb, Croatia
ABSTRACT
A new model of analytical function has been developed for modern Business Intelligence Analysis (BIA). The Croatian model is being used for intelligence investigations of various financial events, markets, subjects or entities, and for financial BIA control methods. The financial BIA functional entropy was predicted utilizing the soft computing process (based on fuzzy logic) and various stochastic simulations. This study explains the simulation-modeling concept of the BIA function. The queuing M/M/s model with priorities was used for solving different stochastic simulations. In practice, the BIA simulation model can be used for measuring analytical capacity.
1. Introduction
Even in the initial stages of establishing a mathematical model, the analytical approach is often too categorical and inflexible to cope with the intricacy and complexity of real-world financial systems such as that in Croatia. In dealing with such complex systems, we face a great deal of uncertainty and imprecision. It is therefore necessary to exploit the human ability to make rational decisions in an uncertain and imprecise environment, and to use the tolerance of soft computing for imprecision and uncertainty, in order to achieve an acceptably low-cost model. Soft computing is oriented towards fuzzy logic analysis, artificial neural networks, and probabilistic reasoning, including genetic algorithms, chaos theory, and machine expertise. Fuzzy logic addresses imprecision and approximate reasoning.
Optimal decision-making is often inadequate in achieving the best design for a queuing system and in developing information on the behavior of a queuing system such as the analytical information queuing system of the Business Intelligence Analysis (BIA). The BIA system (with data warehousing) is an enabling technology, which provides governmental and financial organizations with end-to-end solutions for managing, organizing, and exploiting financial and other data throughout the enterprise. This technology provides tools to collect information (concerning financial business) in a single organized data repository based on a common set of financial and other business definitions. The first part of this work explains the soft computing of the entropy of the modern financial analytical function (based on fuzzy logic); the second part gives a short explanation of the results of the analysis and simulation of the modern financial analytical function based on stochastic modeling.
2. Computing the Entropy of the Modern Financial Analytical Function
2.1 The Soft Computing Based on Fuzzy Decision Logic
First let us examine several basic concepts and notations. If U denotes the universal analytical set, then U contains all the possible analytical elements in each particular analytical context or analytical application from which a set can be formed. The process by which individuals from the universal set U are determined to be either members or non-members of a set can be defined by a characteristic function. For a given set A, where $A \subseteq U$, this characteristic function assigns a value $\mu_A(x)$ to every $x \in U$ such that $\mu_A(x) = 1$ if $x \in A$ and $\mu_A(x) = 0$ if $x \notin A$.
This function can be generalized such that the values assigned to the elements of the universal set fall within a specified range and indicate the membership grade of these elements in the set in question. Consequently, larger values denote higher degrees of set membership. Such a function is called a membership function, and the set defined by it is a fuzzy set. There are two different kinds of notation. For example, if $U = \{x_1, x_2, x_3, x_4, x_5\}$ and $A = \{x_3, x_4, x_5\}$, then the membership function for the elements of the set A can be denoted in either of two ways: $A = \{x_1\backslash 0,\ x_2\backslash 0,\ x_3\backslash 1,\ x_4\backslash 1,\ x_5\backslash 1\}$, or $\mu_A(x_1) = 0$; $\mu_A(x_2) = 0$; $\mu_A(x_3) = 1$; $\mu_A(x_4) = 1$; $\mu_A(x_5) = 1$.
A membership function in fuzzy logic is represented in a similar but slightly different manner: the grades may take any value between 0 and 1. For example, if $U = \{x_1, x_2, x_3, x_4, x_5\}$ and the two fuzzy sets are $A = \{x_1\backslash 0;\ x_2\backslash 0;\ x_3\backslash 0.5;\ x_4\backslash 0.7;\ x_5\backslash 0.9\}$ and $B = \{x_1\backslash 0.9;\ x_2\backslash 0.7;\ x_3\backslash 0.5;\ x_4\backslash 0;\ x_5\backslash 0\}$, then the membership functions for the elements of the fuzzy sets A and B are denoted thus. For the fuzzy set A: $\mu_A(x_1) = 0$; $\mu_A(x_2) = 0$; $\mu_A(x_3) = 0.5$; $\mu_A(x_4) = 0.7$; $\mu_A(x_5) = 0.9$. For the fuzzy set B: $\mu_B(x_1) = 0.9$; $\mu_B(x_2) = 0.7$; $\mu_B(x_3) = 0.5$; $\mu_B(x_4) = 0$; $\mu_B(x_5) = 0$.
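As a minimal sketch of the two notations above, membership grades can be held in a plain mapping from element to grade. The names `U`, `crisp_A`, `mu_A`, `mu_B` and the helper `characteristic` are illustrative only and do not come from the source.

```python
# Universal set and the crisp set A = {x3, x4, x5} from the text.
U = ["x1", "x2", "x3", "x4", "x5"]
crisp_A = {"x3", "x4", "x5"}

def characteristic(x, subset):
    """Characteristic function of a crisp set: 1 for members, 0 otherwise."""
    return 1 if x in subset else 0

# Fuzzy sets A and B from the text, stored as element -> membership grade.
mu_A = {"x1": 0.0, "x2": 0.0, "x3": 0.5, "x4": 0.7, "x5": 0.9}
mu_B = {"x1": 0.9, "x2": 0.7, "x3": 0.5, "x4": 0.0, "x5": 0.0}

# The crisp set is the special case in which every grade is 0 or 1.
crisp_grades = {x: characteristic(x, crisp_A) for x in U}
print(crisp_grades)  # {'x1': 0, 'x2': 0, 'x3': 1, 'x4': 1, 'x5': 1}
```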
A fuzzy decision is a special type of fuzzy set. The decision in a fuzzy environment (depending on the context) can be viewed as the intersection of fuzzy constraints and fuzzy objective function(s), where the fuzzy objective function, like each constraint, is characterized by its membership function. In contrast to a non-fuzzy environment, the decision in a fuzzy environment is defined as the optimal selection of activities that simultaneously satisfy the fuzzy objective function and the fuzzy constraints. The underlying assumption is that the constraints are non-interactive, so that the logical "and" corresponds to the intersection. By analogy with crisp (non-fuzzy) environments and crisp decision logic, fuzzy environments have a slightly different decision logic (usually called "fuzzy decision logic").

A linguistic variable x is a variable whose values are words or sentences in a natural or artificial language. For example, if intelligence is interpreted as a linguistic variable, then its term set T(intelligence) is the set of its linguistic values, where each of the terms in T(intelligence) is a fuzzy subset of a universe of discourse, say $U = [x_{min}, x_{max}]$ or, for practical reasons, usually $U = [0, x_{max}] \subset \mathbb{R}$. There are two rules associated with a linguistic variable: a syntactic rule (which defines the well-formed sentences in T(X)) and a semantic rule (by which the meaning of the terms in T(X) may be determined).
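A fuzzy decision of this kind can be sketched with the min operator for intersection. That operator is the usual choice in the fuzzy-set literature, but it is an assumption here, since the text does not name one. Treating the grades of the fuzzy sets A and B from the earlier example as a constraint and an objective is likewise only for illustration, and `fuzzy_decision` and `best_alternative` are hypothetical names.

```python
def fuzzy_decision(*memberships):
    """Intersect fuzzy sets element-wise using min (assumed operator)."""
    elements = memberships[0].keys()
    return {x: min(m[x] for m in memberships) for x in elements}

def best_alternative(decision):
    """Return the element with the highest grade in the decision set."""
    return max(decision, key=decision.get)

# Illustrative grades: fuzzy set A as a constraint, fuzzy set B as an objective.
constraint = {"x1": 0.0, "x2": 0.0, "x3": 0.5, "x4": 0.7, "x5": 0.9}
objective = {"x1": 0.9, "x2": 0.7, "x3": 0.5, "x4": 0.0, "x5": 0.0}

decision = fuzzy_decision(constraint, objective)
print(decision)                    # only x3 satisfies both to a non-zero degree
print(best_alternative(decision))  # x3
```

Under these assumptions only x3 satisfies the constraint and the objective simultaneously, so it is the optimal selection.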
2.2 Fuzzification of the Classical and Modern BIA Evaluation System
The modern BIA "data contents and information source evaluation system" (the so-called "4x4x2" evaluation system) is an important concept in the BIA model. It can be viewed as a conceptual tool for reducing the entropy of the modern financial analytical function. The classic or OLD "data contents and information source evaluation system" has one simple criterion, which deals with three linguistic variables for data contents and information source evaluation purposes (low information, unknown = entropy, high information). The modern BIA or NEW "data contents and information source evaluation system" ("4x4x2" evaluation system) has a highly developed criterion, which deals with a minimum of 32 linguistic variables for data contents and information source evaluation purposes.
Table 1. Simplified Modern BIA "Data Contents and Information Source Evaluation System"
For simplicity's sake, we can say that the modern BIA or NEW "data contents and information source evaluation system" deals (in the worst case) with only five linguistic variables for data contents and information source evaluation purposes (say, very low information, low information, unknown = entropy, high information, very high information). Suppose that simplification has been made, and that modern BIA "data contents and information source evaluation system" ("4x4x2") has a criterion system, and that its term set TNEW(intelligence), as the set of its linguistic values, is:
TNEW(intelligence) = vli (very low information) + li (low information) + u (unknown/entropy) + hi (high information) + vhi (very high information).
Classical BIA "data contents and information source evaluation system" has a simple criterion system, which deals with only three linguistic variables for data contents and information source evaluation purposes, and where its term set TOLD(intelligence), as the set of its linguistic values, is:
TOLD(intelligence) = li (low information) + u (unknown/entropy) + hi (high information).
Table 2. Classical "Data Contents and Information Source Evaluation System"
Simplified, analytical information ("financial rumours", etc.) is composed of two or more elementary parts. Let us suppose that it is composed of only two elementary parts, $\mu_I(x)$ and $\mu_{II}(x)$, so that analytical information = $[\mu_I(x), \mu_{II}(x)]$. For the modern or NEW BIA fuzzy decision system, we then have 16 (of 25 possible) compound linguistic variables with a value greater than 0.5, which represent real "analytical information". The maximal analytical success ("good fuzzy decision") of the modern or NEW BIA "data contents and information source evaluation system" (in the worst case) is 64% (or 16/25), which is much greater than the maximal analytical success of the classic or OLD BIA evaluation system (in the best case 55.56%, or 5/9).

In the BIA, the source analytical information set X is true (1) or false (0), with probabilities $A$ and $\bar{A}$ (not A):

$$p(X=1) = A, \qquad p(X=0) = 1 - A = \bar{A}.$$

During the analytical transformation process, the compound analytical information set Y can have the destination probabilities $B$, $\bar{B}$, $C$ and $\bar{C}$, given by the relations:

$$p(Y=1 \mid X=1) = B, \quad p(Y=0 \mid X=1) = 1 - B = \bar{B}, \quad p(Y=1 \mid X=0) = C, \quad p(Y=0 \mid X=0) = 1 - C = \bar{C}.$$

The analytically interesting transformation process is represented (from the given and taken set of analytical information) by the equation for the analytical information contents I(X; Y), where ld denotes the base-2 logarithm:

$$I(X;Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} p(x_i, y_j)\ \mathrm{ld}\, \frac{p(x_i \mid y_j)}{p(x_i)}.$$

The destination set probabilities are:

$$p(Y=1) = AB + \bar{A}C, \qquad p(Y=0) = A\bar{B} + \bar{A}\bar{C}.$$

The compound (joint) probabilities are:

$$p(X=1, Y=1) = AB, \quad p(X=0, Y=1) = (1 - A)C, \quad p(X=1, Y=0) = A\bar{B}, \quad p(X=0, Y=0) = (1 - A)\bar{C},$$

and the conditional probabilities of the source given the destination are:

$$p(X=1 \mid Y=1) = \frac{AB}{AB + \bar{A}C}, \quad p(X=0 \mid Y=1) = \frac{(1 - A)C}{AB + \bar{A}C}, \quad p(X=1 \mid Y=0) = \frac{A\bar{B}}{A\bar{B} + \bar{A}\bar{C}}, \quad p(X=0 \mid Y=0) = \frac{(1 - A)\bar{C}}{A\bar{B} + \bar{A}\bar{C}}.$$
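The text does not state how the success ratios 16/25 and 5/9 quoted above are counted. One reading that reproduces both figures, offered purely as an assumption, gives the linguistic terms evenly spaced grades on [0, 1], combines the two elementary parts with max (fuzzy union), and counts the pairs whose compound grade exceeds 0.5. The grade values and the name `success_ratio` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def success_ratio(grades, threshold=Fraction(1, 2)):
    """Share of two-part combinations whose max grade exceeds the threshold."""
    pairs = list(product(grades, repeat=2))
    good = [pair for pair in pairs if max(pair) > threshold]
    return Fraction(len(good), len(pairs))

# Assumed grades: li, u, hi for OLD; vli, li, u, hi, vhi for NEW.
old_grades = [Fraction(0), Fraction(1, 2), Fraction(1)]
new_grades = [Fraction(i, 4) for i in range(5)]

print(success_ratio(old_grades))  # 5/9
print(success_ratio(new_grades))  # 16/25
```

Other grade assignments or combination rules may give the same counts; this sketch only shows that the quoted ratios are consistent with a simple one.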
The input for both the OLD and NEW BIA systems is the true (1) or false (0) analytical information set X, with equal probabilities $A = \bar{A}$, where $A = p(X=1) = 0.5$ and $\bar{A} = p(X=0) = 1 - A = 0.5$. The maximal analytical success of the NEW BIA system (in the worst case) is 64%, whereas the maximal success of the OLD BIA evaluation system is 55.56%. When $B = \bar{C}$ and $C = \bar{B}$, the probabilities for the OLD BIA system are:
$B = p(Y=1 \mid X=1) = 0.5556$; $\bar{C} = 1 - C = p(Y=0 \mid X=0) = 0.5556$; $C = p(Y=1 \mid X=0) = 0.4444$; $\bar{B} = 1 - B = p(Y=0 \mid X=1) = 0.4444$; and for the NEW BIA system: $B = p(Y=1 \mid X=1) = 0.64$; $\bar{C} = 1 - C = p(Y=0 \mid X=0) = 0.64$; $C = p(Y=1 \mid X=0) = 0.36$; $\bar{B} = 1 - B = p(Y=0 \mid X=1) = 0.36$.
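From the equation for I(X; Y): $I_{OLD}(X;Y) = 0.008924$ bit and $I_{NEW}(X;Y) = 0.057317$ bit, so $I_{OLD}(X;Y) < I_{NEW}(X;Y)$. The difference is 0.048393 bit per analytic cycle (about 4.84 bit over 100 analytic cycles), which is a relative gain of 542.28% over the OLD system. The analytical entropy H(Y) then follows from $I(X;Y) = H(Y) - H(Y \mid X)$, that is, $H(Y) = I(X;Y) + H(Y \mid X)$, where $H(Y \mid X)$ is the "analytical noise".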
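The two information values can be checked numerically from the joint probabilities. The sketch below assumes the exact ratio B = 5/9 for the OLD system (the text rounds it to 0.5556) and uses the identity p(x|y)/p(x) = p(x,y)/(p(x)p(y)); the function name `mutual_information` is illustrative.

```python
from math import log2

def mutual_information(a, b, c):
    """I(X;Y) in bits for p(X=1)=a, p(Y=1|X=1)=b, p(Y=1|X=0)=c."""
    joint = {
        (1, 1): a * b,
        (1, 0): a * (1 - b),
        (0, 1): (1 - a) * c,
        (0, 0): (1 - a) * (1 - c),
    }
    p_x = {1: a, 0: 1 - a}
    p_y = {1: joint[(1, 1)] + joint[(0, 1)],
           0: joint[(1, 0)] + joint[(0, 0)]}
    total = 0.0
    for (x, y), p in joint.items():
        if p > 0:  # terms with zero probability contribute nothing
            total += p * log2(p / (p_x[x] * p_y[y]))
    return total

i_old = mutual_information(0.5, 5 / 9, 4 / 9)
i_new = mutual_information(0.5, 0.64, 0.36)
relative_gain = (i_new - i_old) / i_old

print(f"I_OLD = {i_old:.6f} bit")             # I_OLD = 0.008924 bit
print(f"I_NEW = {i_new:.6f} bit")             # I_NEW = 0.057317 bit
print(f"relative gain = {relative_gain:.2%}")
```

The relative gain evaluates to roughly 542%, which suggests that the figure 542.28 is a percentage rather than a number of bits.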
Consequently, the analytically interesting transformation process is represented (from the given and taken set of analytical information) by the equation for the analytical information contents $I(X;Y) = H_T$ (the analytical transformation). Even without comparing the degree of analytical noise, it is clear that in the NEW BIA evaluation system the analytically transformed relevant information $H_T$ is 542.28% greater than in the OLD BIA system.
3. Stochastic Simulation of the Modern Financial Analytical Function
3.1 Introduction
We use here the specific M/M/s model, which assumes that all inter-arrival times are independently and identically distributed according to an exponential distribution (our input process is Poisson); that all analytical service times are independently and identically distributed according to another exponential distribution (our analytical service process is Poisson); and that the number of servers is s (any positive integer). In Croatian BIA practice and the related analytical function, s varies from a minimum of 1 to a maximum of 7. The analytical supply time has the same distribution on every server, with an expected analytical service time of $1/\mu$ ($\mu_n$ is the mean analytical service rate for the overall system, or the expected number of clients (data or information) completing analytical service per unit time), and the inter-arrival time of analytical information is exponentially distributed with mean $1/\lambda$ ($\lambda_n$ is the mean arrival rate, or the expected number of arrivals per unit time). This represents the most simplified type of Markovian analytical system, with an assumed infinite analytical capacity ($Y = \infty$) and with priorities in the queue discipline (that is, without the usual FIFO queue discipline). We are currently researching analytical BIA cases in which there is no possibility of analytical closeness of the multi-channel model of the analytical supply function, or when the utilization factor for the analytical service facility is $\rho_s < 1$, that is, $\lambda < s\mu$ (because $\rho_s = \lambda/(s\mu)$).
In the multi-channel model M/M/s we have a priority sub-system with N relative priority classes, indexed k = 1, 2, ..., N, where $W_k$ is the steady-state expected total waiting time in the analytical system (including service time, or analytical supply time) for a member of priority class k. The steady-state expected number of members of priority class k in the queuing system (including those being analytically served) is $L_k$, given by the relation $L_k = \lambda_k W_k$ for k = 1, 2, ..., N. The expected waiting time in the queue (excluding service time) for priority class k is $W_q(k)$, given by $W_q(k) = W_k - 1/\mu$. The corresponding expected queue length ("tail length") is $L_q(k)$, given by $L_q(k) = \lambda_k W_q(k)$.
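The relations above can be evaluated without simulation by using the standard textbook closed form for the steady-state waiting time of an M/M/s queue with non-pre-emptive priorities. The source does not give this formula, so its use here is an assumption about what the simulations estimate; `priority_mms` is a hypothetical name, and class 1 (index 0) is taken to have the highest priority.

```python
from math import factorial

def priority_mms(lams, mu, s):
    """Per-class (W, Wq, L, Lq) for an M/M/s queue, non-pre-emptive priorities.

    lams[0] is the arrival rate of the highest-priority class; mu is the
    service rate of a single server.
    """
    lam = sum(lams)
    if lam >= s * mu:
        raise ValueError("steady state requires lambda < s * mu")
    r = lam / mu
    a = (factorial(s) * (s * mu - lam) / r**s
         * sum(r**j / factorial(j) for j in range(s)) + s * mu)
    results = []
    b_prev = 1.0   # B_0 = 1
    cumulative = 0.0
    for lam_k in lams:
        cumulative += lam_k
        b_k = 1.0 - cumulative / (s * mu)
        wq = 1.0 / (a * b_prev * b_k)   # Wq(k) = Wk - 1/mu
        w = wq + 1.0 / mu               # Wk, including service time
        results.append((w, wq, lam_k * w, lam_k * wq))  # Lk, Lq(k) by Little's law
        b_prev = b_k
    return results

# Example: three servers, mu = 5, two classes with lambda_1 = lambda_2 = 2.
for k, (w, wq, l, lq) in enumerate(priority_mms([2, 2], 5, 3), start=1):
    print(f"class {k}: W={w:.5f} Wq={wq:.5f} L={l:.5f} Lq={lq:.5f}")
```

Analytic values of this kind are steady-state limits, so finite simulation runs can be expected to deviate from them.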
3.2 Computer-Based Simulation Modeling Process With Stochastic Simulations
We researched behaviour of developed M/M/s model types in relation to the various intensities of analytical traffic (see Table 3).
Table 3. Exploitation levels ($\rho_s = \lambda/(s\mu)$) of various analytical system (M/M/s) types

$\lambda$    s    $\mu = 4$    $\mu = 5$    $\mu = 6$
4            1    -            0.800        0.667
3            1    0.750        0.600        0.500
2            1    0.500        0.400        0.333
4            2    0.500        0.400        0.333
3            2    0.375        0.300        0.250
4            3    0.333        0.267        0.222
2            2    0.250        0.200        0.167
3            3    0.250        0.200        0.167
4            4    0.250        0.200        0.167
4            5    0.200        0.160        0.133
3            4    0.188        0.150        0.125
2            3    0.167        0.133        0.111
4            6    0.167        0.133        0.111
3            5    0.150        0.120        0.100
4            7    0.143        0.114        0.095
2            4    0.125        0.100        0.083
3            6    0.125        0.100        0.083
3            7    0.107        0.086        0.071
2            5    0.100        0.080        0.067
2            6    0.083        0.067        0.056
2            7    0.071        0.057        0.048
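The entries of Table 3 follow directly from the utilization formula. As a small sketch (the function name `utilization` and the dictionary layout are illustrative), the loop below recomputes every combination and keeps only those with a utilization below 1, which is the condition for a steady state.

```python
def utilization(lam, mu, s):
    """Exploitation level rho_s = lambda / (s * mu)."""
    return lam / (s * mu)

table = {}
for lam in (2, 3, 4):
    for s in range(1, 8):
        for mu in (4, 5, 6):
            rho = utilization(lam, mu, s)
            if rho < 1:  # lambda = 4, mu = 4, s = 1 gives rho = 1 and is skipped
                table[(lam, s, mu)] = round(rho, 3)

print(len(table))        # 62
print(table[(4, 3, 5)])  # 0.267
print(table[(4, 5, 5)])  # 0.16
```

Of the 63 combinations, 62 are stationary; the single excluded cell is the one shown as "-" in the table.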
We varied the exploitation level of the analytical system ($\rho_s = \lambda/(s\mu)$) over all combinations of the values $\lambda \in \{2, 3, 4\}$ and $\mu \in \{4, 5, 6\}$, for the models M/M/1, M/M/2, M/M/3, M/M/4, M/M/5, M/M/6 and M/M/7 (the number of analytical servers varies from 1 to 7, or $s \in \{1, 2, 3, 4, 5, 6, 7\}$). We completed 62 simulation-modelling experiments with different types of M/M/s multi-channel analytical models (all 63 combinations except $\lambda = 4$, $\mu = 4$, s = 1, for which $\rho_s = 1$). The simulation modelling results were successful (see Table 3). We also researched the possibility of rational dimensioning and organization of the analytical function of the developed analytical model, without remaining in a stationary state.

We now compare the two BIA M/M/s models (NEW and OLD) by stochastic simulation, first in a similar experimental situation and second in a minimally different experimental situation. In both situations, we selected the M/M/5 simulation model for NEW and the M/M/3 simulation model for OLD. For both simulation models, we used the same intensity of analytical traffic, concretely $\lambda_k = 4$ and $\mu = 5$ for both simulation experiments (see Table 3). In both situations the numbers of non-pre-emptive priority classes are minimally different: for the NEW simulation model we used N = 3 non-pre-emptive classes, and for the OLD simulation model N = 2. The mean of the exponentially distributed analytical service time was the same ($1/\mu_n = 0.2$) in both experimental situations.

In the first case, we used almost the same experimental situation for both simulation models (NEW and OLD). We have $\lambda_k = 4$ and $\mu = 5$, with $\lambda_k^{NEW} = \lambda_1 + \lambda_2 + \lambda_3 = 1.4 + 1.3 + 1.3 = 4$ for the NEW simulation model (N = 3), and $\lambda_k^{OLD} = \lambda_1 + \lambda_2 = 2 + 2 = 4$ for the OLD simulation model (N = 2). Results from the simulation-modelling experiments were successful.
Because of the problem of stochastic convergence, after the simulation-modelling experiments we ran nine different series of stochastic simulations, with 100000, 50000, 10000, 5000, 1000, 500, 100, 50, and 10 arrivals of analytical data (or information).
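A stochastic simulation of this kind can be sketched as a small event-driven loop. This is not the authors' simulator, which the text does not describe. It assumes merged Poisson arrivals, exponential service, s identical servers, and a non-pre-emptive rule that always serves the waiting client of the highest-priority class first; `simulate_priority_mms` and its parameters are hypothetical names.

```python
import heapq
import random
from collections import deque

def simulate_priority_mms(lams, mu, s, n_arrivals, seed=0):
    """Return the mean queue wait Wq(k) per priority class (index 0 = highest)."""
    rng = random.Random(seed)
    total_rate = sum(lams)
    queues = [deque() for _ in lams]   # arrival times of waiting clients
    busy = []                          # heap of service completion times
    waits = [[] for _ in lams]
    next_arrival = rng.expovariate(total_rate)
    generated = 0
    while generated < n_arrivals or busy:
        next_departure = busy[0] if busy else float("inf")
        if generated < n_arrivals and next_arrival <= next_departure:
            now = next_arrival
            # Class of the arriving client, chosen in proportion to its rate.
            k = rng.choices(range(len(lams)), weights=lams)[0]
            generated += 1
            if len(busy) < s:          # a server is free: no waiting
                waits[k].append(0.0)
                heapq.heappush(busy, now + rng.expovariate(mu))
            else:
                queues[k].append(now)
            next_arrival = now + rng.expovariate(total_rate)
        else:
            now = heapq.heappop(busy)  # a server finishes and becomes free
            for k, queue in enumerate(queues):
                if queue:              # serve the highest non-empty class
                    waits[k].append(now - queue.popleft())
                    heapq.heappush(busy, now + rng.expovariate(mu))
                    break
    return [sum(w) / len(w) if w else 0.0 for w in waits]

# A subset of the run lengths, for the first experimental situation.
for n in (100, 1000, 10000):
    new = simulate_priority_mms([1.4, 1.3, 1.3], 5, 5, n, seed=n)
    old = simulate_priority_mms([2, 2], 5, 3, n, seed=n)
    print(n, [round(w, 5) for w in new], [round(w, 5) for w in old])
```

Short runs give noisy estimates, which is why several run lengths are compared; the expected queue length per class can then be estimated as $\lambda_k$ times the mean wait.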
In the first case, the results show that for the first non-pre-emptive priority class the NEW model has a significantly lower expected waiting time in the queue ($W_q(1)$ is lower by 91.94%) than the OLD model, and a significantly lower expected queue length ($L_q(1)$ is lower by 87.10%).

In the second case, we used a somewhat different experimental situation for the two simulation models (NEW and OLD). We have $\lambda_k = 4$ and $\mu = 5$, with $\lambda_k^{NEW} = \lambda_1 + \lambda_2 + \lambda_3 = 2 + 1 + 1 = 4$ for the NEW simulation model (N = 3), and $\lambda_k^{OLD} = \lambda_1 + \lambda_2 = 3 + 1 = 4$ for the OLD simulation model (N = 2). Results from the simulation-modelling experiments were successful. For the first non-pre-emptive priority class the NEW model again has a significantly lower expected waiting time in the queue ($W_q(1)$ is lower by 92.39%) than the OLD model, and a significantly lower expected queue length ($L_q(1)$ is lower by 88.52%). In both cases the NEW model is clearly superior.
Figure 1. Graphic Example of Usage (when $\rho$ = 0.3 and s = 3)
From Table 3 and Figure 1, given a specific exploitation level (for example 30%, or $\rho$ = 0.3) and a specific number of analytical servers (for example s = 3), we can easily find the maximum intensity of analytical traffic (in this example it lies between 4/5 and 4/4, or $\lambda/\mu \in [4/5, 4/4]$). In the opposite direction, we can find the exploitation level (here from 20% up to 25%) from the specified maximum intensity of analytical traffic (say it is in the interval $\lambda/\mu \in [4/5, 4/4]$) and the specific number of analytical servers (say s = 4).
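Both directions of this lookup reduce to the utilization formula, since the exploitation level is the traffic intensity divided by the number of servers. The helper names below are illustrative and do not come from the source.

```python
def traffic_intensity(rho, s):
    """Traffic intensity lambda/mu implied by exploitation level rho and s servers."""
    return rho * s

def utilization_range(traffic_low, traffic_high, s):
    """Exploitation levels reached over a traffic-intensity interval with s servers."""
    return traffic_low / s, traffic_high / s

# rho = 0.3 with s = 3 corresponds to lambda/mu = 0.9, inside [4/5, 4/4].
intensity = traffic_intensity(0.3, 3)
print(4 / 5 <= intensity <= 4 / 4)  # True

# lambda/mu in [4/5, 4/4] with s = 4 gives an exploitation level of 20% to 25%.
low, high = utilization_range(4 / 5, 4 / 4, 4)
print(f"{low:.2f} to {high:.2f}")   # 0.20 to 0.25
```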
4. Conclusion
In the NEW BIA evaluation system, the analytically transformed relevant information I(X; Y) is 542.28% greater than in the OLD evaluation system (0.057317 bit against 0.008924 bit). In the worst case, the maximal analytical success of the NEW BIA system is 64%, whereas for the OLD system it is at most 55.56%. This study provides a solid base for future BIA modeling and simulation. The benefit of this new analytical model of the BIA function lies in the simple method it provides for measuring analytical capacity and the capability of analysis.