
Ultimately, we would like to model a within-perimeter attack and detection scenario on a large cyber network. We will be using recurrent (LSTM) neural networks to solve for both the attacker and defender strategies. As we know, computational "gotchas" can arise in many parts of this model. Therefore, we are going to build the model piece by piece. The plan of attack is as follows:

**1.** In a two node network, solve for optimal defender strategy for a fixed attacker strategy. (Done)

**2.** In a two node network, solve for optimal attacker strategy for a fixed defender strategy. (Done)

**3.** Using fictitious play, solve for equilibrium in a two node network. (Done)

**4.** In a multi-node network, solve for optimal defender strategy for a fixed attacker strategy

**5.** In a multi-node network, solve for optimal attacker strategy for a fixed defender strategy

**6.** In a multi-node network, solve for equilibrium using fictitious play or some other learning algorithm

**7.** Scale, scale, scale

In this model, there are only two nodes. Call them A and B. Node A sends benign messages to B, where inter-message times are distributed exponentially with rate $\lambda_b$. The scenario starts at time $T=0$ without an attacker present in the network. An attacker enters the network at a time determined by an exponential random variable with rate $\lambda_e$. Once the attacker enters the network, it sends malicious messages at rate $\lambda_m$.
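
To make the arrival process concrete, here is a minimal simulation sketch of the message stream. The function name and the `(time, is_malicious)` event representation are mine, not from the model code:

```
import random

def simulate_messages(lam_b, lam_e, lam_m, horizon, seed=0):
    """Simulate one run of the two-node message process up to `horizon`.

    Returns a time-sorted list of (time, is_malicious) events. Benign
    messages from A to B have Exp(lam_b) inter-arrival times; the attacker
    enters at an Exp(lam_e) time and afterwards sends malicious messages
    with Exp(lam_m) inter-arrival times.
    """
    rng = random.Random(seed)
    events = []
    t = rng.expovariate(lam_b)           # benign stream from node A
    while t < horizon:
        events.append((t, False))
        t += rng.expovariate(lam_b)
    t = rng.expovariate(lam_e)           # attacker's entry time
    t += rng.expovariate(lam_m)          # first malicious message
    while t < horizon:
        events.append((t, True))
        t += rng.expovariate(lam_m)
    events.sort()
    return events
```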

The defender continually accumulates a reward of $r$ per unit time the network is running and incurs a cost of $c_m$ every time the attacker sends a message. All costs and rewards are discounted at a rate of $\delta$.
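
For concreteness, these payoffs can be written out. With $\tau$ denoting the defender's alarm (shutdown) time and $t_i$ the arrival times of the malicious messages, the defender's discounted payoff is

$$
U_D(\tau) \;=\; \int_0^{\tau} r\, e^{-\delta t}\, dt \;-\; \sum_{i \,:\, t_i \le \tau} c_m\, e^{-\delta t_i}
\;=\; \frac{r}{\delta}\bigl(1 - e^{-\delta \tau}\bigr) \;-\; c_m \sum_{i \,:\, t_i \le \tau} e^{-\delta t_i},
$$

and the defender maximizes the expectation of this quantity over stopping rules $\tau$ that depend only on the observed message stream.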

Every time a message goes from A to B, the defender chooses to either keep the network running or sound an alarm and shut down the network. Effectively, the defender wants to keep the network running as long as possible (so that it accumulates $r$) but shut it down as soon as the attacker enters.
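
The per-message decision loop can be sketched as follows. This is a simplified stand-in for the actual model code (the helper name and the `(time, is_malicious)` event format are mine), and it truncates the game at the last observed message rather than letting the network run forever:

```
import math

def run_episode(policy, events, r, c_m, delta):
    """Play out the defender's stopping problem on one event stream.

    `events` is a time-sorted list of (time, is_malicious) pairs, and
    `policy(dt)` returns True to sound the alarm after observing the
    inter-arrival time dt. Returns the discounted payoff accrued up to
    the alarm (or up to the last message if no alarm is sounded).
    """
    payoff, prev_t = 0.0, 0.0
    for t, malicious in events:
        # reward flow r accrued from prev_t to t, discounted at rate delta
        payoff += (r / delta) * (math.exp(-delta * prev_t) - math.exp(-delta * t))
        if malicious:
            payoff -= c_m * math.exp(-delta * t)  # cost of a malicious message
        if policy(t - prev_t):
            return payoff                          # alarm: network shut down
        prev_t = t
    return payoff
```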

First, let's look at the performance as we sweep through various parameters for the defender.

In [1]:

```
import plot_tools as pt
%matplotlib inline
r = pt.Results()
fig, ax = r.plot_convergence()
```

Before looking at the results, note that in this experiment I varied $\lambda_m$ and $c_m$ while holding the product $\lambda_m c_m$ constant. Intuitively, this sweeps across attacker strategies, from one that sends relatively harmless messages very fast to one that sends very harmful messages very slowly.
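
The two parameter pairs that appear in the experiment cells below illustrate the sweep: $(\lambda_m, c_m) = (1.25, 1.2)$ and $(0.3125, 4.8)$ share the same product $\lambda_m c_m = 1.5$ while their message rates differ by a factor of four:

```
# (lam_m, c_m) pairs taken from the experiment cells in this notebook
pairs = [(1.25, 1.2), (0.3125, 4.8)]

# lam_m * c_m -- the expected cost inflicted per unit time while the
# attacker is active -- is the same for every pair in the sweep
products = [lm * cm for lm, cm in pairs]
```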

The figure shows that the algorithm learns rather quickly (converging at about epoch 75) for each parameter setting. It is easy to verify that the top three lines are indeed at the optimum for their parameter settings. Although this is less clear for the remaining settings, we can check some other plots to confirm our intuition.

First, let's look at how the false alarm rate varies with changes in the parameters.

In [2]:

```
fig, ax = r.plot_false_alarms()
```

This is exactly as expected. When the malicious messages are very frequent but essentially costless, the defender wants to be confident that the attacker is present in the network before sounding an alarm. On the other hand, when malicious messages are rare but relatively more costly, the defender is willing to sound an alarm with less certainty regarding the presence of an attacker.

For yet another sanity check, let's see how well the defender would do if it ignored the message stream entirely and simply shut down the network at a fixed time.
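
As a sketch of what this baseline computes: with a fixed alarm time $T$, the reward term is deterministic, and the expected cost term can be integrated over the attacker's entry time $E \sim \mathrm{Exp}(\lambda_e)$, assuming malicious messages arrive at rate $\lambda_m$ from $E$ onward (so the expected discounted cost given $E = s < T$ is $c_m \lambda_m \int_s^T e^{-\delta t}\,dt$). This gives

$$
U(T) \;=\; \frac{r}{\delta}\bigl(1 - e^{-\delta T}\bigr) \;-\; \frac{c_m \lambda_m}{\delta}\left[ \frac{\lambda_e}{\lambda_e + \delta}\Bigl(1 - e^{-(\lambda_e + \delta)T}\Bigr) \;-\; e^{-\delta T}\Bigl(1 - e^{-\lambda_e T}\Bigr) \right].
$$

If `expected_reward_at_time` below is a Monte Carlo estimator (its second argument looks like a sample count), its output should track this curve.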

In [6]:

```
import two_node_proto as proto
import numpy as np

parameters = proto.parameters
parameters["lm"] = 1.25
parameters["cm"] = 1.2
defender = proto.Defender(parameters)
eus = []
for e in np.arange(.25, 20, .5):
    eus.append(defender.expected_reward_at_time(e, 10000))
```

In [24]:

```
from matplotlib import pyplot as plt
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(np.arange(.25,20,.5), eus, label="Expected Utility at Fixed Alarm Time")
ax.set_xlabel("Stopping Time")
ax.set_ylabel("Expected Utility")
ax.plot([0,20],[r.results[3].Rewards.values[-1]]*2, label="Expected Utility Under LSTM Policy")
ax.legend(loc=4);
```

In [25]:

```
import two_node_proto as proto
import numpy as np

parameters = proto.parameters
parameters["lm"] = .3125
parameters["cm"] = 4.8
defender = proto.Defender(parameters)
eus2 = []
for e in np.arange(.25, 20, .5):
    eus2.append(defender.expected_reward_at_time(e, 10000))
```

In [28]:

```
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(np.arange(.25,20,.5), eus2, label="Expected Utility at Fixed Alarm Time")
ax.set_xlabel("Stopping Time")
ax.set_ylabel("Expected Utility")
ax.plot([0,20],[r.results[5].Rewards.values[-1]]*2, label="Expected Utility Under LSTM Policy")
ax.legend(loc=4);
```

In this experiment, we will train an attacker against a fixed defender strategy. To do so, we first have to specify the defender's fixed strategy. In this example, we fix the defender's strategy to be the best response to an attacker that sends a malicious message at every opportunity, at a rate of $\lambda_m=5$. The defender incurs a cost of $c_m=.5$ per malicious message.

Under this fixed attacker strategy, the attacker earns an average reward of $4$. That is, on average the attacker sends 4 messages before it is detected. The goal is to see if we can now train an attacker to achieve a reward higher than this. Presumably, if the attacker were to stagger its messages instead of sending them at every opportunity, it would be able to "hide" in the normal network activity better. Let's see if that is the case.

In [27]:

```
import pandas as pd
df = pd.read_csv("./results/a1/attacker1.csv", index_col=0)
df.columns = ["Expected Reward", "Time Between Messages"]
ax1, ax2 = list(df.plot(subplots=True, figsize=(14,6)))
ax2.set_xlabel("Epoch")
ax1.set_ylabel("Expected Reward")
ax2.set_ylabel("Average Time Between Attacker Messages")
```

Out[27]:

In [29]:

```
df = pd.read_csv("./results/a2/attacker2.csv", index_col=0)
df.columns = ["Expected Reward", "Time Between Messages"]
ax1, ax2 = list(df.plot(subplots=True, figsize=(14,6)))
ax2.set_xlabel("Epoch")
ax1.set_ylabel("Expected Reward")
ax2.set_ylabel("Average Time Between Attacker Messages")
```

Out[29]:


In [5]:

```
from results import *
res = ReadResults("./fpboth/results/exp2/")
```

In [38]:

```
%matplotlib inline
ax = (res.falsealarms / 600.).plot()
ax.legend(ncol=4, loc=0)
ax.set_title("False Alarm Rate for various FP iterations")
ax.set_xlabel("Training Epoch (/20)")
ax.set_ylabel("False Alarm Rate");
```

In [34]:

```
from matplotlib import pyplot as plt
fig, ax = plt.subplots()
ax.plot(res.falsealarms.tail(1).mean().values.flatten()/600.)
ax.set_xlabel("Final Fictitious Play Iteration")
ax.set_ylabel("False Alarm Rate");
```

In [37]:

```
%matplotlib inline
ax = res.dus.plot()
ax.legend(ncol=4, loc=0)
ax.set_title("Defender Expected Utility for various FP iterations")
ax.set_xlabel("Training Epoch (/20)")
ax.set_ylabel("Expected Utility");
```

We can also look at the attacker's behavior.

In [63]:

```
ax = res.atimebetween.plot()
ax.legend(ncol=4, loc=0)
ax.set_title("Average Time Between Messages for various FP iterations")
ax.set_xlabel("Training Epoch (/20)")
ax.set_ylabel("Time Between Messages");
```

In [83]:

```
fig, ax = plt.subplots()
ax.plot(res.atimebetween.tail(1).values.flatten())
ax.set_xlabel("Final Fictitious Play Iteration")
ax.set_ylabel("Time Between Messages");
```

In [97]:

```
ax = res.aus.plot()
ax.legend(ncol=5, loc=4)
ax.set_title("Attacker's Expected Utility for various FP iterations")
ax.set_xlabel("Training Epoch (/20)")
ax.set_ylabel("Expected Utility");
fig = plt.gcf()
fig.set_size_inches(8.5, 6.5)
```