# ANNEX 1: Troubleshooting
## How to know the version of your current OSM installation
Run the following command to get the versions of the OSM client and the OSM server (NBI):
```bash
osm version
Server version: 17.0.0.post12+g194ced9 2020-04-17
Client version: 17.0.0+geffca72
```
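For scripting, the server version number can be extracted from the `osm version` output. A minimal sketch, assuming the `Server version: <version> <date>` output format shown above:

```bash
# Print only the server version number from 'osm version' output.
# ASSUMES the "Server version: <version> <date>" format shown above.
get_server_version() {
  awk '/^Server version:/ {print $3}'
}
```

Usage: `osm version | get_server_version`.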
In some circumstances, it can be useful to check the `osm-devops` package installed in your system, since `osm-devops` is the package used to drive installations:
```bash
dpkg -l osm-devops
||/ Name Version Architecture Description
+++-======================-=================-=====================-=====================================
ii osm-devops 17.0.0-1 all
```
You can also check the `python3-osmclient` package to know the current version of the OSM client:
```bash
dpkg -l python3-osmclient
||/ Name Version Architecture Description
+++-======================-=================-=====================-=====================================
ii python3-osmclient 17.0.0-1 all
```
## Logs
### Checking the logs of OSM in Kubernetes
You can check the logs of any container with the following commands:
```bash
kubectl -n osm logs deployment/nbi --all-containers=true
kubectl -n osm logs deployment/lcm --all-containers=true
kubectl -n osm logs deployment/ro --all-containers=true
kubectl -n osm logs deployment/ngui --all-containers=true
kubectl -n osm logs deployment/mon --all-containers=true
kubectl -n osm logs deployment/grafana --all-containers=true
kubectl -n osm logs statefulset/mongodb-k8s --all-containers=true
kubectl -n osm logs statefulset/kafka-controller --all-containers=true
kubectl -n osm logs statefulset/prometheus --all-containers=true
```
For live debugging, the following commands are useful to save the log output to a file while also showing it on the screen:
```bash
kubectl -n osm logs -f deployment/nbi --all-containers=true 2>&1 | tee nbi-log.txt
kubectl -n osm logs -f deployment/lcm --all-containers=true 2>&1 | tee lcm-log.txt
kubectl -n osm logs -f deployment/ro --all-containers=true 2>&1 | tee ro-log.txt
kubectl -n osm logs -f deployment/ngui --all-containers=true 2>&1 | tee ngui-log.txt
kubectl -n osm logs -f deployment/mon --all-containers=true 2>&1 | tee mon-log.txt
kubectl -n osm logs -f deployment/grafana --all-containers=true 2>&1 | tee grafana-log.txt
kubectl -n osm logs -f statefulset/mongodb-k8s --all-containers=true 2>&1 | tee mongo-log.txt
kubectl -n osm logs -f statefulset/kafka-controller --all-containers=true 2>&1 | tee kafka-log.txt
kubectl -n osm logs -f statefulset/prometheus --all-containers=true 2>&1 | tee prometheus-log.txt
```
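The commands above can be wrapped in a small loop to capture the logs of every component in one go, e.g. when attaching logs to a bug report. A sketch, assuming the same component names as above; the `KUBECTL` override is a hypothetical convenience for dry runs:

```bash
# Dump the logs of all OSM components into a directory, one file per component.
# The component lists are the deployments/statefulsets shown above.
KUBECTL=${KUBECTL:-kubectl}
DEPLOYMENTS="nbi lcm ro ngui mon grafana"
STATEFULSETS="mongodb-k8s kafka-controller prometheus"

dump_osm_logs() {
  local outdir=${1:-osm-logs}
  mkdir -p "$outdir"
  for d in $DEPLOYMENTS; do
    $KUBECTL -n osm logs "deployment/$d" --all-containers=true > "$outdir/$d.log" 2>&1
  done
  for s in $STATEFULSETS; do
    $KUBECTL -n osm logs "statefulset/$s" --all-containers=true > "$outdir/$s.log" 2>&1
  done
}
```

Usage: `dump_osm_logs /tmp/osm-logs && tar czf osm-logs.tgz /tmp/osm-logs`.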
### Changing the log level
You can change the log level of any container by updating the container with the appropriate log-level environment variable (e.g. `OSMLCM_GLOBAL_LOGLEVEL` for the LCM).
Log levels are:
- ERROR
- WARNING
- INFO
- DEBUG
For instance, to set the log level to INFO for the LCM in a deployment of OSM over K8s:
```bash
LOGLEVEL="INFO"
kubectl patch configmap osm-lcm-configmap -n osm --type='merge' -p '{"data":{"OSMLCM_GLOBAL_LOGLEVEL":"'${LOGLEVEL}'"}}'
kubectl get configmap osm-lcm-configmap -n osm -o yaml
kubectl -n osm rollout restart deployment lcm
```
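The same patch can be applied to several modules in a loop. A hedged sketch: the `osm-<module>-configmap` and `OSM<MODULE>_GLOBAL_LOGLEVEL` naming convention is assumed from the LCM example above, so verify the actual names with `kubectl -n osm get configmaps` before using it:

```bash
# Set the log level for several OSM modules at once.
# ASSUMPTION: each module has a configmap 'osm-<module>-configmap' with an
# 'OSM<MODULE>_GLOBAL_LOGLEVEL' key, following the LCM convention above.
KUBECTL=${KUBECTL:-kubectl}

set_osm_loglevel() {
  local loglevel=$1; shift
  local module var
  for module in "$@"; do
    var="OSM$(echo "$module" | tr '[:lower:]' '[:upper:]')_GLOBAL_LOGLEVEL"
    $KUBECTL patch configmap "osm-${module}-configmap" -n osm --type=merge \
      -p "{\"data\":{\"${var}\":\"${loglevel}\"}}"
    $KUBECTL -n osm rollout restart deployment "$module"
  done
}
```

Usage: `set_osm_loglevel DEBUG lcm mon ro`.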
### Debugging Kafka
To connect to Kafka bus and print the received messages:
```bash
kubectl -n osm exec -it kafka-controller-0 -- kafka-console-consumer.sh --bootstrap-server localhost:9092 --whitelist '.*' --formatter kafka.tools.DefaultMessageFormatter --property print.timestamp=true --property print.key=true --property print.value=true
```
### Debugging MongoDB
To connect to MongoDB and run commands:
```bash
kubectl -n osm exec -it pod/mongodb-k8s-0 -- mongosh
```
```mql
use osm;
db.getCollectionNames()
db.k8sclusters.find().pretty()
db.k8sclusters.deleteOne({"_id":"21323ef6-23ec-4f33-8171-dcc863aa9832"})
db.okas.find().pretty()
db.okas.find({}, { _id: 1, name: 1}).pretty()
db.okas.find({}, { _id: 1, name: 1, _admin: {usageState: 1}}).pretty()
db.okas.find({ "_admin.usageState": "IN_USE" }, { _id: 1, name: 1, "_admin.usageState": 1 }).pretty()
db.okas.updateOne(
  { name: "oka_name" },                       // Filter: the document to update
  { $set: { field_to_update: "new_value" } }  // Update: the field and its new value
)
```
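For scripting, the same queries can be run non-interactively with `mongosh --eval`. A sketch, assuming the `mongodb-k8s-0` pod name used above; `KUBECTL` is a hypothetical override for dry runs:

```bash
# Run a one-shot query against OSM's MongoDB without an interactive shell.
# Assumes the mongodb-k8s-0 pod name used above.
KUBECTL=${KUBECTL:-kubectl}

mongo_eval() {
  # $1: JavaScript expression, evaluated against the 'osm' database
  $KUBECTL -n osm exec pod/mongodb-k8s-0 -- \
    mongosh --quiet --eval "db = db.getSiblingDB('osm'); $1"
}
```

Usage: `mongo_eval 'db.getCollectionNames()'` or `mongo_eval 'db.k8sclusters.find()'`.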
## Troubleshooting installation
### Recommended installation to facilitate troubleshooting
It is highly recommended to save a log of your installation:
```bash
./install_osm.sh 2>&1 | tee osm_install_log.txt
```
### Recommended checks after installation
#### Checking whether all processes/services are running in K8s
```bash
kubectl -n osm get all
```
All deployments and statefulsets should show all their replicas ready (e.g. `1/1` in the READY column).
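The READY column can also be checked programmatically to spot components that are not fully up. A small sketch that parses the `kubectl get` output and prints any resource whose ready count differs from the desired count:

```bash
# Print any resource whose READY column is not "n/n" (i.e. not fully ready).
# Feed it the output of: kubectl -n osm get deployments,statefulsets
not_ready() {
  awk '$2 ~ /^[0-9]+\/[0-9]+$/ { split($2, r, "/"); if (r[1] != r[2]) print $1, $2 }'
}
```

Usage: `kubectl -n osm get deployments,statefulsets | not_ready` (no output means everything is ready).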
## How to troubleshoot issues in the new Service Assurance architecture
Since OSM Release FOURTEEN, the Service Assurance architecture is based on Apache Airflow and Prometheus. The Airflow DAGs, in addition to periodically collecting metrics from VIMs and storing them into Prometheus, implement auto-scaling and auto-healing closed-loop operations which are triggered by Prometheus alerts. These alerts are managed by AlertManager and forwarded to Webhook Translator, which re-formats them to adapt to Airflow expected webhook endpoints. So the alert workflow is this: `DAGs collect metrics => Prometheus => AlertManager => Webhook Translator => Alarm driven DAG`
In case of any kind of error related to monitoring, the first thing to check should be the metrics stored in Prometheus, whose graphical interface can be accessed via its web URL. Some useful metrics to review are the following:
- `ns_topology`: metric generated by a DAG with the current topology (VNFs and NSs) of instantiated VDUs in OSM.
- `vm_status`: status (1: ok, 0: error) of the VMs in the VIMs registered in OSM.
- `vm_status_extended`: metric enriched from the two previous ones, so it includes data about VNF and NS the VM belongs to as part of the metric labels.
- `osm_*`: resource consumption metrics. Only instantiated VNFs that include monitoring parameters have this kind of metric in Prometheus.
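These metrics can also be queried from the command line through Prometheus' standard HTTP API, which helps when the graphical interface is not reachable. A sketch; the Prometheus base URL passed in is an assumption to be replaced with the one in your deployment:

```bash
# Build an instant-query URL for the Prometheus HTTP API (/api/v1/query).
# The base URL is an assumption: point it at your Prometheus endpoint.
prom_query_url() {
  local base=$1 expr=$2
  printf '%s/api/v1/query?query=%s\n' "$base" "$expr"
}
```

Usage: `curl -s "$(prom_query_url http://<prometheus-ip>:9090 vm_status)"`. For expressions with special characters, prefer `curl -sG <base>/api/v1/query --data-urlencode 'query=<expr>'` so the query gets URL-encoded.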
In case you need to debug closed-loop operations, you will also need to check the Prometheus alerts page. There you can see the alerting rules and their status: inactive, pending or active. When an alert is fired (its status changes from pending to active) or is marked as resolved (from active to inactive), the appropriate DAG is run on Airflow. There are three types of alerting rules:
- `vdu_down`: this alert is fired when a VDU remains in a not OK state for several minutes and triggers `alert_vdu` DAG. Its labels include information about NS, VNF, VIM, etc.
- `scalein_*`: these rules manage scale-in operations based on the resource consumption metrics and the number of VDU instances. They trigger `scalein_vdu` DAG.
- `scaleout_*`: these rules manage scale-out operations based on the resource consumption metrics and the number of VDU instances. They trigger `scaleout_vdu` DAG.
Finally, it is also useful for debugging to view the logs of the DAG executions. To do this, visit the Airflow web UI, which is accessible on the node port exposed by the `airflow-webserver` service in OSM's cluster (not a fixed port):
```bash
kubectl -n osm get svc airflow-webserver
NAME                TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
airflow-webserver   NodePort   10.100.57.168   <none>        8080:19371/TCP   12d
```
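For scripting, the NodePort can be extracted directly. A sketch parsing the `PORT(S)` column of the output shown above; alternatively, `kubectl -n osm get svc airflow-webserver -o jsonpath='{.spec.ports[0].nodePort}'` returns it without any parsing:

```bash
# Extract the NodePort of the airflow-webserver service from the PORT(S)
# column of 'kubectl get svc' output (e.g. "8080:19371/TCP" -> "19371").
airflow_nodeport() {
  awk '$1 == "airflow-webserver" { split($5, p, /[:\/]/); print p[2] }'
}
```

Usage: `kubectl -n osm get svc airflow-webserver | airflow_nodeport`.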
When you open the URL on that port (`19371` in the example above) in a browser, you will be prompted for the username and password (`admin`/`admin` by default). After that you will see the dashboard with the list of DAGs:
- `alert_vdu`: it is executed when a VDU down alarm is fired or resolved.
- `scalein_vdu`, `scaleout_vdu`: executed when auto-scaling conditions in a VNF are met.
- `ns_topology`: this DAG is executed periodically for updating the topology metric in Prometheus of the instantiated NS.
- `vim_status_*`: there is one such DAG for each VIM in OSM. It checks VIM's reachability every few minutes.
- `vm_status_vim_*`: these DAGs (one per VIM) get VM status from VIM and store them in Prometheus.
- `vm_metrics_vim_*`: these DAGs (one per VIM) store in Prometheus resource consumption metrics from VIM.
The logs of the executions can be accessed by clicking on the corresponding DAG in the dashboard and then selecting the required date and time in the grid. Each DAG has a set of tasks, and each task has its own logs.
## Checking workflows in new OSM declarative framework
Since Release SIXTEEN, operations involve launching an Argo Workflows workflow, which ends with a commit being created in a Git repository.
Be aware that workflows are automatically cleaned up after some time, so it is recommended to check them while the operation is running or shortly afterwards.
### How to expose ArgoWorkflows UI
```bash
# Get the kubeconfig and copy to your local machine
# Then, from your local machine
export KUBECONFIG=~/kubeconfig-osm.yaml
kubectl -n argo port-forward deployment/argo-server 2746:2746
```
Access the Argo UI from a web browser on the forwarded port (`2746` in the example above). Then click on the workflow, then on the step, and then on "Logs".
### How to check a workflow with kubectl
```bash
export KUBECONFIG=~/kubeconfig-osm.yaml
kubectl -n osm-workflows get workflows
kubectl -n osm-workflows get workflows/${WORKFLOW_NAME}
kubectl -n osm-workflows get workflows/${WORKFLOW_NAME} -o json
kubectl -n osm-workflows get workflows/${WORKFLOW_NAME} -o jsonpath='{.status.conditions}' | jq -r '.[] | select(.type=="Completed").status'
watch kubectl -n osm-workflows get workflows
```
### How to check a workflow with argo CLI
```bash
export KUBECONFIG=~/kubeconfig-osm.yaml
argo list -n osm-workflows
argo get -n osm-workflows @latest
argo watch -n osm-workflows @latest
argo logs -n osm-workflows @latest
```
## Checking progress of operations in new OSM declarative framework
### How to check progress of resources in Flux
```bash
export KUBECONFIG=~/kubeconfig-osm.yaml
watch 'echo; kubectl get managed; echo; kubectl get kustomizations -A; echo; kubectl get helmreleases -A'
```
## Common issues with VIMs
### Is the VIM URL reachable and operational?
When there are problems accessing the VIM URL, an error message similar to the following is shown after attempting to instantiate a network service:
```text
Error: "VIM Exception vimconnConnectionException ConnectFailure: Unable to establish connection to "
```
In order to debug potential issues with the connection, in the case of an OpenStack VIM, you can install the OpenStack client on the OSM machine and run some basic tests, e.g.:
```bash
# Install the OpenStack client
sudo apt-get install python3-openstackclient
# Load your OpenStack credentials. For instance, if your credentials are saved in a file named 'myVIM-openrc.sh', you can load them with:
source myVIM-openrc.sh
# Test if the VIM API is operational with a simple command. For instance:
openstack image list
```
If the OpenStack client works, then make sure that you can reach the VIM from the RO container:
```bash
# If running OSM on top of docker swarm, go to the container in docker swarm
docker exec -it osm_ro.1.xxxxx bash
# If running OSM on top of K8s, go to the RO deployment in kubernetes
kubectl -n osm exec -it deployment/ro -- bash
curl
```
_In some cases, the errors come from the fact that the VIM was added to OSM using names in the URL that are not Fully Qualified Domain Names (FQDN)._
When adding a VIM to OSM, you must always use an FQDN or an IP address. Non-FQDN names might be interpreted by Kubernetes as a container name to be resolved, which is not the case. In addition, all the VIM endpoints should also be FQDNs or IP addresses, thus guaranteeing that all subsequent API calls can reach the appropriate endpoint.
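This requirement can be sanity-checked with a small script. The function below, a sketch for illustration only, flags URLs whose host part is a bare short name (no dots), which Kubernetes could try to resolve as an internal service name; note it does not handle IPv6 literals:

```bash
# Warn if a VIM URL uses a bare host name instead of an FQDN or IP address.
# Bare names (no dots) may be resolved by Kubernetes as internal service names.
check_vim_url() {
  local url=$1
  local host=${url#*://}   # strip the scheme
  host=${host%%/*}         # strip the path
  host=${host%%:*}         # strip the port (IPv6 literals not handled)
  case $host in
    *.*) echo "ok: $host looks like an FQDN or IP address" ;;
    *)   echo "warning: $host is not an FQDN; use an FQDN or IP address" ;;
  esac
}
```

Usage: `check_vim_url http://myvim:5000/v3`.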
### Issues when trying to access VM from OSM
**Is the VIM management network reachable from OSM (e.g. via ssh, port 22)?**
The simplest check consists of deploying a VM attached to the management network and trying to access it (e.g. via SSH) from the OSM host.
For instance, in the case of an OpenStack VIM you could try something like this:
```bash
openstack server create --image ubuntu --flavor m1.small --network mgmtnet test
```
If this does not work, typically it is due to one of these issues:
- Security group policy in your VIM is blocking your traffic (contact your admin to fix it)
- IP address space in the management network is not routable from outside (or in the reverse direction, for the ACKs).
## How to report an issue
**If you have bugs or issues to be reported, please use [Bugzilla](https://osm.etsi.org/bugzilla)**
**If you have questions or feedback, feel free to contact us through:**
- **the mailing list [OSM_TECH@list.etsi.org](mailto:OSM_TECH@list.etsi.org)**
- **the [Slack work space](https://join.slack.com/t/opensourcemano/shared_invite/enQtMzQ3MzYzNTQ0NDIyLWVkNTE4ZjZjNWI0ZTQyN2VhOTI1MjViMzU1NWYwMWM3ODI4NTQyY2VlODA2ZjczMWIyYTFkZWNiZmFkM2M2ZDk)**
**Please be patient. Answers may take a few days.**
------
Please provide some context to your questions. As an example, find below some guidelines:
- In case of an installation issue:
- The full command used to run the installer and the full output of the installer (or at least enough context) might help on finding the solution.
- It is highly recommended to run the installer command capturing standard output and standard error, so that you can send them for analysis if needed. E.g.:
```bash
./install_osm.sh 2>&1 | tee osm_install.log
```
- In case of operational issues, the following information might help:
- Version of OSM that you are using
- Logs of the system (see the sections above on how to get them).
- Details on the actions you made to get that error so that we could reproduce it.
- IP network details in order to help troubleshooting potential network issues. For instance:
- Client IP address (browser, command line client, etc.) from where you are trying to access OSM
- IP address of the machine where OSM is running
- IP addresses of the containers
- NAT rules in the machine where OSM is running
Common sense applies here, so you don't need to send everything, but just enough information to diagnose the issue and find a proper solution.
## (OLD) Common issues with VCA/Juju
### Juju status shows pending objects after deleting a NS
In extraordinary situations, the output of `juju status` could show pending units that should have been removed when deleting a NS. In those situations, you can clean up VCA by following the procedure below:
```bash
juju status -m
juju remove-application -m
# You will likely have to run the next command several times, since the next
# queued hook will probably also fail. Once the last hook is marked as
# resolved, the charm will continue its removal.
juju resolved -m --no-retry
```
The following page also shows [how to remove different Juju objects](https://docs.jujucharms.com/2.1/en/charms-destroy)
### Dump Juju Logs
To dump the Juju debug logs, run one of these commands:
```bash
juju debug-log --replay --no-tail > juju-debug.log
juju debug-log --replay --no-tail -m
juju debug-log --replay --no-tail -m --include
```
### Manual recovery of Juju
If Juju gets into a corrupt state and you cannot run `juju status` or contact the Juju controller, you might need to manually remove the controller and register it again, making OSM aware of the new controller.
```bash
# Stop and delete all juju containers, then unregister the controller
lxc list
lxc stop juju-* #replace "*" by the right values
lxc delete juju-* #replace "*" by the right values
juju unregister -y osm
# Create the controller again
sg lxd -c "juju bootstrap --bootstrap-series=xenial localhost osm"
# Get controller IP and update it in relevant OSM env files
controller_ip=$(juju show-controller osm|grep api-endpoints|awk -F\' '{print $2}'|awk -F\: '{print $1}')
sudo sed -i 's/^OSMMON_VCA_HOST.*$/OSMMON_VCA_HOST='$controller_ip'/' /etc/osm/docker/mon.env
sudo sed -i 's/^OSMLCM_VCA_HOST.*$/OSMLCM_VCA_HOST='$controller_ip'/' /etc/osm/docker/lcm.env
#Get juju password and feed it to OSM env files
function parse_juju_password {
password_file="${HOME}/.local/share/juju/accounts.yaml"
local controller_name=$1
local s='[[:space:]]*' w='[a-zA-Z0-9_-]*' fs=$(echo @|tr @ '\034')
sed -ne "s|^\($s\):|\1|" \
-e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
-e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" $password_file |
awk -F$fs -v controller=$controller_name '{
indent = length($1)/2;
vname[indent] = $2;
for (i in vname) {if (i > indent) {delete vname[i]}}
if (length($3) > 0) {
vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
if (match(vn,controller) && match($2,"password")) {
printf("%s",$3);
}
}
}'
}
juju_password=$(parse_juju_password osm)
sudo sed -i 's/^OSMMON_VCA_SECRET.*$/OSMMON_VCA_SECRET='$juju_password'/' /etc/osm/docker/mon.env
sudo sed -i 's/^OSMLCM_VCA_SECRET.*$/OSMLCM_VCA_SECRET='$juju_password'/' /etc/osm/docker/lcm.env
```