Computer viruses pose a serious threat not only to interconnected devices across the world, but also to the individuals who use them. The Conficker worm, for example, reportedly infected more than 15 million computers, including French military systems. The spread of information through a network can have either negative or positive effects, which creates a need for policies that analyze the spread and provide methods of mitigation and prevention. Devising policies that mitigate the spread is not easy, and verifying their effectiveness is a complicated process as well.
The spread of information can be described by the evolution of the states of the nodes of a graph. In the graph, the nodes represent people, computers, or parts of a forest. The edges between nodes indicate possible transmission vectors: two people or computers that interact have an edge between their nodes. Practical solutions that use this graph to prevent the spread of viruses are therefore important. This paper considers a practical method of finding and analyzing policies that are effective in mitigating the spread of infection. The policy applied in practice is subdivision of the network, which supports both prevention and intervention as means of mitigating the spread.
The model applied in the paper is the graph-theoretic model of spread previously presented by Santhanam. The directed graph is $G = (V, E)$, where $V$ is the set of entities, such as people, computers, or parts of a forest, that can potentially be infected, and $E$ is the set of edges along which infection can spread from one node to another.
For instance, if the edge $(v_i, v_j) \in E$ is present, then $v_i$ can transmit infection to $v_j$. In an undirected graph, an edge between $v_i$ and $v_j$ is represented by the pair of directed edges $(v_i, v_j)$ and $(v_j, v_i)$. The method applies to both directed and undirected graphs, but for simplicity the paper uses undirected graphs. Each node $v_i \in V$ is associated with a state $\sigma(v_i) \in \Sigma$, where $\Sigma$ is the domain of states that indicates, for example, whether the node is infected or uninfected. The set of neighbors of $v_i \in V$ is $N(v_i) = \{v_j \in V : (v_i, v_j) \in E\}$, that is, the nodes that share an edge with $v_i$.
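The undirected graph and the neighbor sets described above can be sketched in a few lines of Python. This is only an illustration: the function name `build_neighborhoods` and the four-node example network are assumptions, not part of the paper.

```python
from collections import defaultdict

def build_neighborhoods(edges):
    """Map each node to the set of its neighbors.

    The graph is undirected, so every pair (u, v) is stored in both directions.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return dict(neighbors)

# Hypothetical four-node chain a - b - c - d, used only for illustration.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
neighbors = build_neighborhoods(edges)
print(sorted(neighbors["b"]))  # ['a', 'c']
```

Storing each edge in both directions mirrors the convention that an undirected edge stands for two directed edges.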
A configuration of the graph assigns a state to every node at a discrete time step $t$. The state of a node at the next time step is a function of (a) the current states of the node and its neighbors, and (b) the node's own current and previous states. Each transition of a node $v_i$ is therefore governed by (a) together with (b). The function $f$ describes the transition of an individual node in light of the other nodes in the graph, and so represents (a). The function $g$ describes the transition that a node undergoes in light of its own history, and so represents (b). Together, $f$ and $g$ determine the state of a node after each step, whether the change is caused by the node itself or by its neighbors.
The graph $G = (V, E)$ captures how the infection spreads among the nodes. $\Sigma$ is the set of possible node states: open, infected, or protected. A node in the open state can be infected by other nodes. A node in the protected state never becomes infected, whether by other nodes or on its own.
Finally, a node in the infected state has been infected by other nodes and can pass the infection on to its neighbors. The state of a node $v_i$ is denoted $\sigma(v_i)$. Transmission in the graph is described by two processes: the irreversible $k$-threshold process and the $r$-reversible $k$-threshold process. These processes define the transitions that take place when a node changes state on its own or is infected by neighboring nodes.
In the irreversible $k$-threshold process, a node $v_i$ becomes infected at time step $t+1$ only if at least $k$ of its neighbors are infected at step $t$. Once infected, a node stays infected.
The function $g$ becomes $g(\sigma(v_i)) = \sigma(v_i)$, and the function $f$ is defined as follows, where $\sigma(v_i)$ is the state of node $v_i$ and $N(v_i)$ is its set of neighbors.

$$
f(\sigma(v_i)) =
\begin{cases}
\text{infected} & \text{if } \sigma(v_i) = \text{open and } \exists\, u_1, u_2, \ldots, u_k \in N(v_i) : \forall j \in [1, k],\ \sigma(u_j) = \text{infected} \\
\sigma(v_i) & \text{otherwise}
\end{cases}
$$
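One synchronous step of the irreversible k-threshold process might look like the following sketch. The state names as strings, the function name `step_irreversible`, and the example network are assumptions made for illustration.

```python
OPEN, INFECTED, PROTECTED = "open", "infected", "protected"

def step_irreversible(state, neighbors, k):
    """One synchronous time step of the irreversible k-threshold process.

    An open node becomes infected when at least k of its neighbors are
    infected in the current configuration; every other node keeps its state.
    """
    new_state = {}
    for v, s in state.items():
        infected_nbrs = sum(1 for u in neighbors.get(v, ()) if state[u] == INFECTED)
        if s == OPEN and infected_nbrs >= k:
            new_state[v] = INFECTED
        else:
            new_state[v] = s
    return new_state

# Hypothetical chain a - b - c - d with c protected.
neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
state = {"a": INFECTED, "b": OPEN, "c": PROTECTED, "d": OPEN}
after = step_irreversible(state, neighbors, k=1)
print(after)
```

In this example the infection reaches b but stops at the protected node c, so d stays open.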
The $r$-reversible $k$-threshold process works in the same way, except that infection is temporary. A node enters the infected state if at least $k$ of its neighbors are infected, and it returns to the open state after it has been infected for $r$ time steps. To accommodate this, the set of states is expanded to include open, protected, and the infected states $\text{infected}_1, \ldots, \text{infected}_r$, so that the number of steps for which a node has been infected can be tracked. With $\sigma(v_i)$ the state of node $v_i$ and $N(v_i)$ its set of neighbors, the infection propagation function $f$ is given as:

$$
f(\sigma(v_i)) =
\begin{cases}
\text{infected}_1 & \text{if } \sigma(v_i) = \text{open and } \exists\, u_1, u_2, \ldots, u_k \in N(v_i) : \forall j \in [1, k],\ \sigma(u_j) \in \{\text{infected}_1, \ldots, \text{infected}_r\} \\
\sigma(v_i) & \text{otherwise}
\end{cases}
$$

The function $g$ is defined with respect to $\sigma(v_i)$ as follows.

$$
g(\sigma(v_i)) =
\begin{cases}
\text{open} & \text{if } \sigma(v_i) = \text{infected}_r \\
\text{infected}_{q+1} & \text{if } \sigma(v_i) = \text{infected}_q \text{ and } q < r \\
\sigma(v_i) & \text{otherwise}
\end{cases}
$$
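The reversible process can be sketched as follows, under stated assumptions: an infected node is stored as an integer counting the steps it has been infected, it returns to open after r steps, and infected neighbors are counted in the configuration before the step. The names `g` and `step_reversible` and the two-node example are illustrative only.

```python
OPEN, PROTECTED = "open", "protected"  # an infection of age q is stored as the integer q

def is_infected(s):
    return isinstance(s, int)

def g(s, r):
    """Age an infection by one step; a node infected for r steps becomes open."""
    if is_infected(s):
        return OPEN if s == r else s + 1
    return s

def step_reversible(state, neighbors, k, r):
    """One synchronous step: apply g to each node, then the k-threshold rule."""
    new_state = {}
    for v, s in state.items():
        aged = g(s, r)
        # Assumption: neighbors are counted in the configuration before the step.
        infected_nbrs = sum(1 for u in neighbors.get(v, ()) if is_infected(state[u]))
        new_state[v] = 1 if aged == OPEN and infected_nbrs >= k else aged
    return new_state

neighbors = {"a": {"b"}, "b": {"a"}}
s1 = step_reversible({"a": 1, "b": PROTECTED}, neighbors, k=1, r=2)
s2 = step_reversible(s1, neighbors, k=1, r=2)
print(s1, s2)
```

With r = 2, node a ages from 1 to 2 in the first step and recovers to open in the second, because its only neighbor is protected and cannot reinfect it.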
Reversible process models are used to represent infections from which nodes eventually recover. An example is a computer that has been attacked by a virus and on which a virus scan is then run to detect and reverse the infection.
One way to control the spread of infection is to protect some nodes from being infected by others, which means changing open nodes to protected. A prevention policy is implemented at time step $t = 0$ and aims to prevent an outbreak from becoming an epidemic. An intervention policy, on the other hand, is applied after infection has already taken place and aims to contain an infection that is in progress. The objective of both kinds of policy is to limit the spread of infection to at most $I$ nodes in the graph. Formally, a policy $\pi$ on a graph $G = (V, E)$ is a function that maps each time step $t$ to a set of nodes $\pi(t) \subseteq V$ to protect at that step.
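A policy of this kind can be represented, as a rough sketch, by a dictionary from time steps to sets of nodes. The function name `apply_policy` and the rule that only open nodes change state are assumptions based on the description above.

```python
OPEN, INFECTED, PROTECTED = "open", "infected", "protected"

def apply_policy(state, policy, t):
    """Return a copy of `state` in which the nodes chosen for step t are protected.

    Only open nodes are changed; infected nodes stay infected.
    """
    new_state = dict(state)
    for v in policy.get(t, set()):
        if new_state[v] == OPEN:
            new_state[v] = PROTECTED
    return new_state

# Hypothetical prevention policy: protect b and a at time step 0.
policy = {0: {"a", "b"}}
state = {"a": INFECTED, "b": OPEN, "c": OPEN}
protected_state = apply_policy(state, policy, t=0)
print(protected_state)
```

A prevention policy has an entry only for step 0, while an intervention policy may have entries for later steps.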
To control the spread of infection, two questions must be answered.
Question 1: given an initial configuration of the graph in which a set of nodes is infected, and given the number of nodes that can be protected, is there an intervention policy that prevents the infection from spreading to more than I nodes?
Question 2: given a prevention policy that protects a set of nodes before the initial infection, does the policy prevent the infection from spreading to more than I nodes?
Both questions can be answered either by establishing the reachability of a desired configuration of the graph or by establishing the non-reachability of every undesired configuration from the initial configuration. Model checking can be used to verify such reachability or non-reachability properties. Given the initial configuration of the graph in which the infection is spreading, a model can be created to check the effectiveness of the policies. The encoding is as follows.
1. The states of the nodes of the graph in which the infection is spreading are mapped to the variables of the model in the model checker.
2. The transitions between configurations, from the initial configuration onward, are defined by the propagation functions f and g and are encoded as the transitions explored by the model checker.
By statement 1 above, each configuration of the original graph corresponds to a state of the model that is passed to the model checker. By statement 2, there is a one-to-one relationship between the transitions of configurations in the original graph and the transitions between states in the input model. This allows the checker to explore all possible transitions in the original graph and, in turn, to check the possible spread of infection over its nodes. The model also makes it possible to check whether an undesired configuration is reached or whether all reached configurations are desired.
Encoding Infection Spread as Kripke Structures
A Kripke structure, as defined by Clarke, is a tuple $(S, S_0, T, L)$, where $S$ is a set of states described by the valuations of a set of variables $P$, $S_0 \subseteq S$ is a set of initial states, $T \subseteq S \times S$ is a transition relation, and $L$ is a labeling function that assigns to each state the propositions that hold in it. Consider a graph $G = (V, E)$ with nodes $V = \{v_1, \ldots, v_n\}$, node states $\Sigma$, and infection propagation functions $f$ and $g$. The Kripke structure $K_G$ captures the transitions between all configurations of $G$ as follows.
1. The valuations of the propositions $P = \{\sigma(v_i) \mid v_i \in V\}$ represent the states $S$ of $K_G$, where $\sigma(v_i) \in \Sigma$ is the state of node $v_i \in V$ in the graph $G$.
2. The transition relation $T$ is defined as follows: for any two states $s, s' \in S$, $(s, s') \in T$ (denoted $s \to s'$) if $s = \langle \sigma(v_1), \ldots, \sigma(v_n) \rangle$ and $s' = \langle f(g(\sigma(v_1))), \ldots, f(g(\sigma(v_n))) \rangle$.
3. The set of initial states $S_0$ of $K_G$ corresponds to the initial configurations of the graph $G$, depending on the query posed to the model checker.
4. Each state of the Kripke structure is labeled with exactly the states of the nodes in the graph $G$ that it represents.
A path in the Kripke structure $K_G$ corresponds to an evolution of the graph $G$ from its initial configuration. The Kripke model can therefore be encoded in Promela, a modeling language, to represent the transitions of nodes as an infection spreads.
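The correspondence between configurations and Kripke states can be illustrated in Python rather than Promela. In this sketch a state is a tuple of node states, the transition is the irreversible k-threshold update, and no protection choices are made, so the structure reachable from one initial state is a single path. The names `successor` and `kripke_path` are hypothetical.

```python
OPEN, INFECTED, PROTECTED = "open", "infected", "protected"

def successor(config, nodes, neighbors, k):
    """Apply the irreversible k-threshold update to every node at once."""
    state = dict(zip(nodes, config))
    return tuple(
        INFECTED
        if state[v] == OPEN and sum(state[u] == INFECTED for u in neighbors[v]) >= k
        else state[v]
        for v in nodes
    )

def kripke_path(initial, nodes, neighbors, k):
    """Follow transitions from the initial state until a state repeats."""
    path, seen = [initial], {initial}
    while True:
        nxt = successor(path[-1], nodes, neighbors, k)
        if nxt in seen:
            return path
        seen.add(nxt)
        path.append(nxt)

nodes = ("a", "b", "c")
neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
path = kripke_path((INFECTED, OPEN, OPEN), nodes, neighbors, k=1)
for s in path:
    print(s)
```

Each tuple printed is one state of the structure, and consecutive tuples are related by the transition relation.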
Finding and Verifying Policies by Using the Linear Temporal Logic (LTL)
Because the Kripke structure $K_G$ encodes the spread of infection in a graph $G = (V, E)$, identifying and verifying policies reduces to specifying temporal properties in linear temporal logic (LTL). A set of rules determines whether an evolution path satisfies a given LTL formula. For convenience, the rules are stated in terms of the temporal operators F (eventually) and G (globally). They help answer queries Q1 and Q2 stated earlier.
Q1. Finding an Intervention Policy
In this query, the set of initially infected nodes, a subset of $V$, and the number of nodes that can be protected are specified in advance. Let totalI denote the total number of infected nodes at a given time step. The idea is to look for a policy $\pi$ under which no more than $I$ nodes become infected, and the LTL formula used is $\varphi = \mathbf{F}(\text{totalI} > I)$. The Kripke structure satisfies the formula when every path starting from a state in $S_0$ reaches a state in which $\text{totalI} > I$; in that case no intervention policy exists. If the formula does not hold, there is a path from an initial state along which the number of infected nodes never exceeds $I$, and this path yields an intervention policy.
When a node's state cannot change from infected back to open, as in the irreversible process, the number of infected nodes never decreases as the nodes change through the functions $g$ and $f$: once it exceeds $I$, it cannot fall back below $I$. The formula above therefore suffices to find an intervention policy $\pi$. In the $r$-reversible $k$-threshold process, however, a node can return from the infected state to the open state at some time $t$, because the function $g$ is not monotonic. (Monotonic here means that nodes cannot move from the infected state back to the open state.) The number of infected nodes can therefore be above or below $I$ at different time steps.
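For small graphs, the search for an intervention policy can be imitated by brute force. The sketch below is a stand-in for the counterexample search of a model checker, not the paper's method: it assumes the irreversible process, assumes all protection happens at a single intervention step, and uses hypothetical names (`find_intervention`, `max_protected`, `limit`).

```python
from itertools import combinations

OPEN, INFECTED, PROTECTED = "open", "infected", "protected"

def spread_to_fixpoint(state, neighbors, k):
    """Run the irreversible k-threshold process until no state changes."""
    while True:
        nxt = {
            v: INFECTED
            if s == OPEN and sum(state[u] == INFECTED for u in neighbors[v]) >= k
            else s
            for v, s in state.items()
        }
        if nxt == state:
            return state
        state = nxt

def find_intervention(state, neighbors, k, max_protected, limit):
    """Return a set of open nodes to protect so that at most `limit` nodes
    are ever infected, or None if no such set of the allowed size exists."""
    open_nodes = sorted(v for v, s in state.items() if s == OPEN)
    for size in range(max_protected + 1):
        for chosen in combinations(open_nodes, size):
            trial = dict(state)
            for v in chosen:
                trial[v] = PROTECTED
            final = spread_to_fixpoint(trial, neighbors, k)
            # Infection never recedes here, so the final count is the maximum.
            if sum(s == INFECTED for s in final.values()) <= limit:
                return set(chosen)
    return None

neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
state = {"a": INFECTED, "b": OPEN, "c": OPEN, "d": OPEN}
print(find_intervention(state, neighbors, k=1, max_protected=1, limit=1))  # {'b'}
```

Checking only the final configuration is valid because the irreversible process is monotonic; for the reversible process every intermediate configuration would have to be checked as well.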
Q2. Verifying a Preventive Policy
A policy $\pi$ can be verified with the LTL formula $\varphi = \mathbf{G}(\text{totalI} \le I)$. The formula is satisfied only if $\text{totalI} \le I$ holds in every state along every path. If the formula is satisfied, the policy $\pi$ stated earlier contains the infection: no more than $I$ nodes are infected at any time step $t$. Conversely, if the formula does not hold, there is a path from the initial state $s_0$ of the Kripke structure along which more than $I$ nodes become infected.
When the transmission process in the graph is not monotonic, policymakers addressing questions Q1 and Q2 must also verify that the policy keeps the infection stably contained. They can do so with the LTL formula given above. If the formula $\varphi$ is satisfied, then under the policy $\pi$ no more than $I$ nodes are infected at any time step $t$. If $\varphi$ is not satisfied, a path that violates it exists, and policymakers need to adjust the policy or choose an alternative.
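Verifying a prevention policy amounts to checking an invariant along the evolution of the graph. The following sketch assumes the irreversible process and a configuration in which the policy's protected nodes are already marked; it returns the trace up to the first violation, in the spirit of a model checker's counterexample. The names `verify_invariant` and `limit` are hypothetical.

```python
OPEN, INFECTED, PROTECTED = "open", "infected", "protected"

def step(state, neighbors, k):
    """One synchronous step of the irreversible k-threshold process."""
    return {
        v: INFECTED
        if s == OPEN and sum(state[u] == INFECTED for u in neighbors[v]) >= k
        else s
        for v, s in state.items()
    }

def verify_invariant(state, neighbors, k, limit):
    """Check that the infected count stays <= limit in every configuration.

    Returns (holds, trace); when the invariant fails, the trace ends at the
    first configuration that violates it.
    """
    trace = [state]
    while True:
        if sum(s == INFECTED for s in state.values()) > limit:
            return False, trace
        nxt = step(state, neighbors, k)
        if nxt == state:
            return True, trace
        state = nxt
        trace.append(state)

neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
start = {"a": INFECTED, "b": OPEN, "c": PROTECTED, "d": OPEN}
print(verify_invariant(start, neighbors, k=1, limit=2)[0])  # True
print(verify_invariant(start, neighbors, k=1, limit=1)[0])  # False
```

The loop terminates because the irreversible process reaches a fixed point; a reversible process would need cycle detection instead.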
Region-Based Propagation Analysis
The model checking approach discussed above helps policymakers identify and verify policies that contain infections and prevent them from spreading. The approach, however, is not feasible in every situation, in particular when policymakers deal with a large population. First, it is nearly impossible to display thousands of nodes usefully in a single graph. Second, with more than 10,000 variables, it is difficult to tell whether the model checking algorithms will suffice. Third, the model does not give policymakers a way to distinguish between the infected and uninfected nodes. For these reasons, the policymaker has the option of subdividing the network into small regions, which allows attention to be focused on only a portion of the network.
Choosing the regions is the major problem in subdividing a network for an intervention policy. Although policymakers can make the divisions as they wish, automated region generation is the better tool, since it simplifies the process and takes the spread of infection into account. For query Q1, if policymakers want to implement an intervention policy, the regions help identify the nodes that need to be quarantined. The effectiveness of a quarantine depends on how well the subdivision isolates the infected nodes. A region is defined around the entities that are infected and need to be quarantined in order to prevent further infections. Consequently, subdividing the network into regions helps policymakers identify the entities that need to be quarantined.
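The text does not specify how regions are generated automatically. One simple possibility, shown here purely as an assumption, is to take every node within a fixed number of hops of an infected node, using a breadth-first search from all infected nodes at once. The function name `region_around_infected` and the `radius` parameter are hypothetical.

```python
from collections import deque

def region_around_infected(neighbors, infected, radius):
    """Return all nodes within `radius` hops of any infected node.

    Multi-source breadth-first search: every infected node starts at distance 0.
    """
    dist = {v: 0 for v in infected}
    queue = deque(infected)
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue  # do not expand beyond the region boundary
        for u in neighbors[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return set(dist)

# Hypothetical chain a - b - c - d - e with a infected.
neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
region = region_around_infected(neighbors, {"a"}, radius=2)
print(sorted(region))  # ['a', 'b', 'c']
```

A region found this way is a much smaller graph than the full network, so the model checker only has to explore the configurations of the nodes it contains.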
Results from Identifying Intervention Policies
A Java preprocessor takes the network, the initial configuration, and the policy to be verified as inputs. It outputs the corresponding Kripke structure encoded in Promela, the input language of the SPIN model checker. The model is designed so that the model checker explores only the states that satisfy $\text{totalI} \le I$. When $\text{totalI} \le I$ cannot be satisfied, the model checker correctly reports that no intervention policy exists. The networks used in the experiment had 40 nodes and 80, 70, 60, and 10 edges. Ten nodes in each network were selected at random to be in the infected state, and the number of nodes protected from the infection was 20. These settings, 10 infected and 20 protected nodes, were tested with query Q1.
The results of the experiment (Table 1) indicate that it is possible to find intervention policies by using the model checking process. Traversing the paths of the Kripke structure takes longer as the network grows. A larger number of edges leads to more paths along which the infection can spread. The more interconnected the network is, the higher the chance of infection, which puts other nodes at risk as well. In the experiment, when the number of edges is increased from 10 to 20, the number of traversed states decreases to 20, and only then does it start increasing. Therefore, the number of nodes should not be increased beyond 20. Additionally, the more time is spent traversing individual states, the greater the chance of infection.
Summary and Conclusion
The paper presented a practical method of identifying and verifying policies for mitigating the spread of infection. It also showed how to encode the spread of infection in a network as a Kripke structure, in which each change in the configuration of the network corresponds to a transition of the Kripke structure. This encoding simplifies the demanding work of identifying and verifying intervention and prevention policies. It makes it possible to use model checking to determine the number of infected nodes and to establish the steps of transmission. Building on the model checker's ability to show why a property is not satisfied, the desired policies were derived.
An LTL model checker was used to identify an intervention policy, if one exists, that contains an infection, and to verify that a prevention policy allows no more than I nodes in the network to be infected. Subdividing the network was considered the easier way of managing the spread of infection: it aims to find the regions of infection and thus helps policymakers contain it. With subdivision, the model checking technique becomes applicable to identifying and verifying policies in networks of tens of thousands of nodes. In addition, the method allows policymakers to apply the appropriate policies to each region, which improves the applicability of the approach.