SELinux Trouble Shooting Tool (setroubleshoot)

Read this white paper on setroubleshoot: setroubleshoot_whitepaper.pdf

Introduction

SELinux is a powerful security technology that brings Mandatory Access Control (MAC) to a mainstream operating system. However, it has often frustrated users, leading to a perception that it is difficult to use. Here at Red Hat we initiated a project called the SELinux Usability Project to make SELinux friendlier. The initial outcome of this work is a tool called "setroubleshoot".

One of the great strengths of SELinux and other MAC architectures is that applications do not have to be modified to be protected by SELinux. This allows us to write policy for a great many services without going through the process of modifying code and getting upstream acceptance. It also allows flexibility in that different vendors or different users can have different security profiles for an application without having to modify the application.

While this is a great benefit to developers, it is not necessarily a benefit to usability. Since applications do not understand what SELinux is doing, they cannot report that SELinux is preventing them from doing something. For example, if you are running the Apache web server and SELinux denies access to a file, Apache reports "permission denied". Users of Unix and other operating systems have learned over the years that "permission denied" means there is a problem with either the file's ownership or its permissions (DAC). But when they look at the file, they see that apache owns it and can read it. This leaves them scratching their heads. They go back to the log file, and all it says is "permission denied".

Problem Statement

Our experience with SELinux has shown there are two fundamental areas of difficulty.

Policy Authoring
SELinux Denials. These are reported in the logging system as AVC messages containing the keyword "denied". In the rest of this article we refer to them as AVC Denials. AVC stands for Access Vector Cache; this is an internal name, but a handy way to look for denials in the log files.
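Every denial carries the `denied` keyword, so finding AVC Denials in a log is straightforward. Interpreting them is the hard part. The sketch below is only illustrative. It assumes the `type=AVC` record format written to `/var/log/audit/audit.log` and uses a made-up sample record. `parse_avc_denial` and `selinux_type` are hypothetical helpers, not part of setroubleshoot. The sketch shows the raw material an analysis has to work with: the denied permission, the process, the source and target SELinux contexts, and the object class.

```python
import re

# Illustrative record in the audit.log format; the values are made up.
SAMPLE = ('type=AVC msg=audit(1159451214.466:5018): avc:  denied  { read } '
          'for  pid=2814 comm="httpd" name="index.html" dev=dm-0 ino=2295425 '
          'scontext=user_u:system_r:httpd_t:s0 '
          'tcontext=user_u:object_r:user_home_t:s0 tclass=file')

_DENIED = re.compile(r'avc:\s+denied\s+\{([^}]*)\}')
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_avc_denial(line):
    """Split an AVC denial record into fields; return None for anything else."""
    if not line.startswith('type=AVC'):
        return None
    match = _DENIED.search(line)
    if match is None:
        return None  # an AVC record, but not a denial
    fields = {'permissions': match.group(1).split()}
    # key=value pairs that follow the "{ ... }" permission list
    for key, value in _FIELD.findall(line[match.end():]):
        fields[key] = value.strip('"')
    return fields


def selinux_type(context):
    """Return the type component of a user:role:type[:level] context."""
    return context.split(':')[2]


denial = parse_avc_denial(SAMPLE)
print(denial['comm'], denial['permissions'],
      selinux_type(denial['scontext']), '->', selinux_type(denial['tcontext']),
      denial['tclass'])
# httpd ['read'] httpd_t -> user_home_t file
```

The source type (`httpd_t`), target type (`user_home_t`) and object class (`file`) are exactly the kind of cryptic detail that a friendly interpretation has to translate.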

We believe that, of these two items, dealing with AVC Denials is the more important issue to address, because AVC Denials are often what frustrate people the most. Policy authoring tends to be limited to a much smaller group of people (developers, security administrators). There are existing efforts to produce policy authoring tools, so there seems little benefit in duplicating that effort. Considerable policy has already been written, and the amount of new policy being authored is limited. However, the existing policy, and the behavior of applications under that policy, have not been fully debugged. This leads to AVC Denials, which have typically been difficult to diagnose and, worse, are hidden. This has often led to the advice "Turn SELinux off", which would not be necessary if the policy and applications were fully debugged. The hidden nature of AVC Denials makes some type of active notification imperative to help fix policy problems. We view this as the preeminent concern in advancing acceptance of the SELinux technology.

Users, system administrators, and developers often run afoul of AVC Denials. When the SELinux policy is fully debugged and properly configured, the only AVC Denials that should occur are those triggered by actual security violations. However, SELinux is still an emerging technology whose policy is still under development, and people are still learning how to configure it. As a result, the vast majority of AVC Denials are unintentional rather than actual security violations.

The unintended AVC Denial problem is further complicated by the fact that it is often hidden. Frequently, something that is expected to work simply does not. At best, the error reported by the faulting software indicates a permission failure; at worst, the software fails silently. The conventional UNIX Discretionary Access Control (DAC) may be properly configured to allow access. Users are trained to look for DAC permission problems, but not for SELinux Mandatory Access Control (MAC) problems, so they are often baffled by the failure. Even if they know to look for AVC Denials, the process is not easy. The AVC denial messages are cryptic, and figuring out how to fix the problem often requires expertise far beyond that of most current users, system administrators, and developers. We need to make the problem visible and offer friendly guidance on how to interpret and fix it.

Goals

To address the problems outlined above, we established the following goals for troubleshooting SELinux problems in a friendly manner:

- Alert the user in real time that an AVC Denial has occurred
- Automatically analyze the AVC Denial to provide a friendly interpretation
- Provide a friendly summary
- Provide a friendly verbose description
- Suggest possible fixes
  - Would configuring SELinux differently help?
  - Is this a known problem?
    - Is a fix available?
    - Has this been reported as a bug?
    - Would you like to view the bug report?
    - Would you like to install an update?
    - Would you like to file a bug report?
- Provide a simple "Fix it now" button
- Filter the alerts in a friendly manner
  - Alert me again only after the policy is updated
  - Alert me again only after the problem software has been updated
  - Alert me again only when a fix is available
  - Alert me again only after some time interval has passed
  - Leave me alone, don't alert me about this issue ever.
- Provide different alerting mechanisms
  - Alert via a desktop notification
  - Alert via email
- Support different management models
  - Manage my own node
  - Manage a collection of remote nodes
- Provide different "trust" models
  - I'm paranoid, only root is notified
  - I'm somewhat paranoid: all operations are restricted to the local node, with no communication to look up known issues, find fixes, etc.
  - I just want things to work, help me as much as you can.
  - The "Fix It" button is disabled or restricted to users with proper authentication
- Allow browsing of previous alerts
  - Browse by category
  - Search
  - Reset filtering on previously viewed alerts.
- Make analysis plugin authoring as simple as possible

Architecture

The setroubleshoot tool is fundamentally divided into two somewhat independent components.

Framework
Analysis plugins

Analysis Plugins

AVC Denials need to be analyzed or, put another way, interpreted. A plugin architecture is well suited to this task because:

Monolithic analysis is difficult to maintain
The set of AVC denials and the SELinux policy configuration are fluid
Each known policy bug should have a plugin that recognizes it
The SELinux policy is now modular, so the analysis should also be modular
Policy authors and application developers can supply their own analysis plugins
The community can contribute to and maintain the analysis plugins
The analysis component can be updated independently

Framework

The framework provides an environment to host the set of analysis plugins. The framework is responsible for receiving an AVC Denial and passing it to the set of plugins, giving each plugin an opportunity to analyze the AVC Denial. If a plugin recognizes the AVC Denial, it informs the framework in a process known as registration.
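Here is a minimal sketch of this hosting arrangement. It assumes a plugin is simply an object with an `analyze` method that returns a report or `None`. The class names, the report keys, and the `register` callback are hypothetical. This is not the actual setroubleshoot plugin interface, which, as noted under Current Implementation Status, is still changing.

```python
from typing import Callable, List, Optional


class Plugin:
    """Hypothetical base class for a stateless analysis plugin."""
    name = 'base'

    def analyze(self, avc: dict) -> Optional[dict]:
        """Return a report (summary, suggested fix, ...) or None if unrecognized."""
        raise NotImplementedError


class HttpdHomeDirPlugin(Plugin):
    """Illustrative plugin: Apache denied access to a file labeled as home content."""
    name = 'httpd_home_dir'

    def analyze(self, avc: dict) -> Optional[dict]:
        if (avc.get('scontext_type') == 'httpd_t'
                and avc.get('tcontext_type') == 'user_home_t'):
            return {'summary': 'The web server was denied access to a file '
                               'labeled as user home directory content.',
                    'fix': 'Check whether the file should be served at all; '
                           'if so, give it a label the web server may read.'}
        return None


class Framework:
    """Hosts the plugins and offers each AVC Denial to every one of them."""

    def __init__(self, plugins: List[Plugin],
                 register: Callable[[str, dict, dict], None]):
        self.plugins = plugins
        self.register = register  # hypothetical hook standing in for registration

    def handle(self, avc: dict) -> int:
        """Pass the AVC to each plugin; return how many recognized it."""
        recognized = 0
        for plugin in self.plugins:
            report = plugin.analyze(avc)
            if report is not None:
                self.register(plugin.name, avc, report)
                recognized += 1
        return recognized
```

Because each plugin only inspects the AVC and returns a report, new plugins can be added to the list without touching the framework.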

Registration & Signatures

Once an AVC Denial is recognized and "enters the system", we need a way to refer to this specific issue. That is not necessarily the AVC itself, but rather what the analysis plugin identifies as an "issue". We need to be able to refer to this issue in various locations and at various times, ideally with an assurance that it is unique. The issue might first be discovered on a local node, but might then be propagated to a wider audience, e.g. a system administrator managing a collection of nodes, or a central repository of known issues such as a bug database (e.g. bugzilla).

Trying to assign a single unique identifier (e.g. a GUID) to the issue is fraught with problems. Recall that one goal is for this system to work without depending on network access, and above all without depending on access to networks external to the organization. This restriction is necessary to accommodate security concerns. There is also a timing issue: who identifies the issue first, and therefore who gets to assign the GUID that every other party must then use. To support local-only operation and avoid negotiating over who assigns a GUID to the problem, we introduce the concept of a "signature". A signature is a combination of data describing the problem that identifies it uniquely and is portable across all systems.

When a plugin recognizes a problem it would like to report, it generates a signature composed of various pieces of information available to the plugin. The signature should contain just enough information to describe the problem uniquely. If the signature is too specific, we could have "problem inflation", where essentially the same problem enters the system multiple times.

Signatures are encoded as XML documents making them easy to parse, portable across systems, and amenable to processing by a wealth of XML tools and technologies.
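To make the "just enough information" rule concrete, here is a sketch that builds an XML signature with Python's standard `xml.etree.ElementTree` module. The set of fields in `SIGNATURE_FIELDS` and the function name `make_signature` are assumptions for illustration, not the real setroubleshoot signature schema. The point is that volatile details such as the pid or inode are left out. Repeated occurrences of the same problem therefore yield the same signature and do not cause problem inflation.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice of fields: enough to identify the problem, while
# deliberately omitting per-occurrence details (pid, inode, timestamp).
SIGNATURE_FIELDS = ('plugin', 'scontext_type', 'tcontext_type', 'tclass', 'access')


def make_signature(plugin_name, avc):
    """Serialize the identifying fields of an analyzed AVC as an XML string."""
    values = dict(avc, plugin=plugin_name)
    root = ET.Element('signature')
    for field in SIGNATURE_FIELDS:
        value = values.get(field, '')
        if isinstance(value, (list, tuple)):
            value = ' '.join(sorted(value))  # order-independent permission list
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding='unicode')


first = {'scontext_type': 'httpd_t', 'tcontext_type': 'user_home_t',
         'tclass': 'file', 'access': ['read'], 'pid': '2814'}
again = dict(first, pid='9999', ino='42')  # same problem, different occurrence
print(make_signature('httpd_home_dir', first) == make_signature('httpd_home_dir', again))
# True
```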

When a plugin recognizes a problem, it does not know whether this problem has been seen previously. Plugins are completely stateless. The only role a plugin plays is to recognize a problem and, if it does, to report the problem along with meta information such as the friendly descriptions, suggested fixes, etc.

The framework is then responsible for answering the question "Is this a known problem?" Recall also that this question can be asked in the context of different environments: Is it known on my node? Is it known on the set of nodes I manage? Is it known globally to a central bug database? The question is answered by looking up the signature in the environment in question.
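Because a signature is portable, the same lookup can be repeated against each environment. The sketch below assumes each environment is represented by a hypothetical `SignatureStore` keyed by the signature text. It queries the stores in order from narrowest to widest. Both `SignatureStore` and `is_known` are illustrative names, not setroubleshoot APIs.

```python
from typing import Iterable, Optional, Tuple


class SignatureStore:
    """Hypothetical stand-in for one environment's database of known issues."""

    def __init__(self, scope: str, known: dict):
        self.scope = scope    # e.g. 'node', 'managed nodes', 'global'
        self.known = known    # signature text -> solution

    def lookup(self, signature: str) -> Optional[dict]:
        return self.known.get(signature)


def is_known(signature: str,
             stores: Iterable[SignatureStore]) -> Optional[Tuple[str, dict]]:
    """Ask each environment in turn; return (scope, solution) for the first hit."""
    for store in stores:
        solution = store.lookup(signature)
        if solution is not None:
            return store.scope, solution
    return None
```

Under this arrangement, the "somewhat paranoid" trust model from the goals amounts to passing only the local store, so no query ever leaves the node.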

Once the plugin recognizes a problem it reports it via its signature to a server it has connected to. The server is responsible for managing the set of signatures (problem reports) and then deciding how to dispose of the problem report.

Use of a server allows for the following important features:

The server can be local only
The connection to the server can use a UNIX domain socket for enhanced security
The server could be central to a managed set of nodes
The same basic server implementation can be used for a globally central repository to answer the question "Is this problem known to anybody, or are there open bug reports?"
The server can be connected to by "alert listeners" responsible for notifying users
Alert listening can be decentralized (e.g. a pool of nodes managed by a system administrator)
The server cooperates with alert listeners in filtering alerts
The server can receive notifications of new fixes, which then trigger alerts to interested listeners
The server provides both local and remote browsing of known issues
The server can manage plugin updates
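One reason a UNIX domain socket suits a local server is that, on Linux, the server can ask the kernel who is at the other end of a connection instead of trusting what the client claims. The sketch below demonstrates this with the `SO_PEERCRED` socket option on a connected socket pair. It is a Linux-only illustration of the general technique, and `peer_credentials` is a hypothetical helper. It does not describe how setroubleshootd authenticates its clients.

```python
import os
import socket
import struct


def peer_credentials(conn: socket.socket):
    """Return (pid, uid, gid) of the process at the other end of a UNIX socket.

    Linux-specific: relies on the SO_PEERCRED socket option, which reports the
    credentials the kernel recorded when the connection was made.
    """
    creds = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                            struct.calcsize('3i'))
    return struct.unpack('3i', creds)


# Demonstration with a connected pair; both ends belong to this process.
server_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(server_end)
server_end.close()
client_end.close()
print(pid == os.getpid(), uid == os.geteuid())
```

A real server would call `peer_credentials` on each accepted connection and use the uid, rather than a client-supplied user name, to select per-user settings.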

When a plugin reports a problem to the framework, the process is known as "problem registration". The server then looks up the signature. If the signature is not present in the server's database, it is added to the database along with the "solution" (e.g. the descriptive text, suggested fixes, etc.) reported by the plugin. Metadata is also created for the signature. It tracks which alert listeners are interested in the signature and, possibly, external information such as available updates.

Alert Listener Notification

Upon receiving the signature, the server transmits an alert notification to all its connected listeners. Note that the server does not perform any alert filtering at this stage; more on this later.
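The registration and broadcast steps described above can be sketched as follows. `SignatureServer`, its record layout, and the `report_count` metadata field are hypothetical. The sketch only mirrors the described behavior. The first time a signature is seen, the server stores the plugin's solution. Every time a signature arrives, the server notifies every listener without filtering.

```python
from typing import Callable, Dict, List

Listener = Callable[[str, dict], None]


class SignatureServer:
    """Hypothetical sketch of problem registration and alert broadcast."""

    def __init__(self):
        self.database: Dict[str, dict] = {}   # signature -> record
        self.listeners: List[Listener] = []

    def add_listener(self, callback: Listener) -> None:
        self.listeners.append(callback)

    def register(self, signature: str, solution: dict) -> bool:
        """Store the report if the signature is new; return True if it was."""
        record = self.database.get(signature)
        is_new = record is None
        if is_new:
            record = {'solution': solution,
                      'metadata': {'report_count': 0, 'user_filters': {}}}
            self.database[signature] = record
        record['metadata']['report_count'] += 1
        # No per-listener filtering here: listeners ask about filtering later.
        for notify in self.listeners:
            notify(signature, record['solution'])
        return is_new
```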

GUI Desktop Notification (sealert)

The GUI desktop notification is /usr/bin/sealert. When sealert starts, it contacts the local fault server (setroubleshootd), logs on, and registers itself as an alert listener. During logon the user's identity is passed to the server; this identity is used to associate per-user properties (i.e. filtering). sealert is initially invisible to the user, but maintains a persistent connection to the server. When the server has an alert to propagate, it calls each listener, passing the problem signature and "solution" information as an "alert"; sealert then queues the alert.

Queuing is an important part of the GUI alert management. Queuing is used to handle alerts which fire in rapid succession and to adapt to user filtering preferences, etc. When a new alert arrives the GUI may be in one of several states:

Not visible
Visible, displaying a previous, different alert
Visible, displaying the exact same alert (the same alert is likely to be generated repeatedly, possibly in rapid succession, as the offending software triggers the same fault again and again)

The user must be allowed to view and interact with each alert at their own leisure. The GUI cannot replace a previous alert just because a new alert arrived. While the user is interacting with the currently displayed alert, she may elect to filter it via the GUI. The decision to filter an alert may affect the display of pending alerts in the queue. This is one of the reasons the server does not filter alerts for a particular alert listener when broadcasting a newly arrived alert. There will always be a window of time between when a user first interacts with a particular alert via the GUI and the arrival and dispatch of subsequent alerts. Filtering may also be modified externally, e.g. via an alert browsing applet. Therefore the GUI postpones the decision whether to display a particular alert until the moment it is preparing to display it, because the filtering options may have changed while the alert waited in the display queue.

Filtering data is kept on the server, attached to the problem signature and associated with the user. The GUI calls the server, passing it the problem signature, and asks it to evaluate the current filtering for the <signature,user> pair. The server responds with the current disposition of the alert. If the alert is to be filtered, the GUI removes it from the queue and proceeds to the next alert in the queue, if any; otherwise the alert is displayed.
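Here is a sketch of the deferred filtering decision, under stated assumptions. The filter kinds are modeled loosely on the filtering goals listed earlier. `Filter`, `evaluate` and `AlertQueue` are hypothetical names, and the server call is represented by a plain function. The key property is that `AlertQueue` consults the filter only when it is about to display an alert. A filter set while an earlier alert was on screen therefore still applies to alerts already waiting in the queue.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Filter:
    """Hypothetical per-<signature,user> filter setting."""
    kind: str                             # 'always_ignore', 'until_policy_update', 'until_time'
    policy_version: Optional[str] = None  # policy version current when the filter was set
    until: Optional[float] = None         # epoch seconds for 'until_time'


def evaluate(filters, signature, user, policy_version, now=None):
    """Server side: return 'display' or 'ignore' for a <signature,user> pair."""
    f = filters.get((signature, user))
    if f is None:
        return 'display'
    if f.kind == 'always_ignore':
        return 'ignore'
    if f.kind == 'until_policy_update':
        return 'ignore' if policy_version == f.policy_version else 'display'
    if f.kind == 'until_time':
        now = time.time() if now is None else now
        return 'ignore' if now < f.until else 'display'
    return 'display'


class AlertQueue:
    """GUI side: the filtering decision is deferred until display time."""

    def __init__(self, ask_server):
        self.pending = deque()
        self.ask_server = ask_server  # callable(signature) -> 'display' | 'ignore'

    def push(self, signature):
        self.pending.append(signature)

    def next_to_display(self):
        """Drop alerts that are now filtered; return the next one to show, or None."""
        while self.pending:
            signature = self.pending.popleft()
            if self.ask_server(signature) == 'display':
                return signature
        return None
```

For example, suppose the same signature is queued twice and the user filters it while viewing the first copy. The second copy is then dropped when it reaches the front of the queue.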

If the alert is to be displayed the following sequence of steps occurs.

If the GUI is currently hidden, then:
- A status icon is displayed. This is minimally invasive: the user is NOT interrupted with an annoying pop-up.
- The appearance of the status icon in the notification area of the panel is subtle and easy to miss, so a notification balloon briefly appears pointing to the status icon. Use of the notification balloon is a configurable option.
- To view the new alert, the user must click on the status icon to bring up the GUI. At that point the status icon is removed because the alert has been attended to.
If the GUI is currently displaying an alert, then:
- Because there is a pending unviewed alert, the status icon is redisplayed (you have more alerts to view). The notification balloon is omitted, because the user is clearly already interacting with the GUI and further notification is unwarranted.
- The GUI has a status area at the bottom where it indicates whether there are more pending alerts to view, along with a "next" button. If there are pending alerts the next button is made active; otherwise it is inactive.
- The user can decide to dismiss the GUI when she is finished viewing the current alert, or she may elect to advance to the next alert via the "next" button.
- As long as there are pending unviewed alerts, the status icon remains visible (this is also how one brings up the GUI).
- Balloon notifications are inhibited until every pending alert has been viewed. In other words, you only get a balloon notification if there is something new you have not yet been made aware of.
- The queue of pending alerts is modified any time there is a change in filtering. The GUI might indicate that you have pending alerts to view, but as soon as you modify your filtering, the pending alert status and the next button may be updated.
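The display rules in the list above can be condensed into a small function. This is a hypothetical encoding for clarity, not sealert's code. `pending` counts unviewed alerts that have passed filtering, and `unannounced` records whether any of them is new since the last balloon. `balloons_enabled` stands for the configurable balloon option.

```python
from dataclasses import dataclass


@dataclass
class Indicators:
    status_icon: bool   # icon in the panel's notification area
    balloon: bool       # transient balloon pointing at the icon
    next_button: bool   # "next" button in the GUI's status area


def indicators(gui_visible, pending, unannounced, balloons_enabled=True):
    """Hypothetical encoding of the display rules listed above.

    gui_visible: True while the alert window is showing an alert
    pending: number of queued alerts that passed filtering but are unviewed
    unannounced: True if some pending alert has not been announced yet
    """
    return Indicators(
        status_icon=pending > 0,
        balloon=(balloons_enabled and not gui_visible
                 and pending > 0 and unannounced),
        next_button=gui_visible and pending > 0,
    )
```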

Email Notifications

Currently email notifications are handled directly by the server, since there is little point in having an independent email alert listener. The list of email notification subscribers is maintained in the setroubleshoot configuration file, along with SMTP configuration parameters.
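Here is a sketch of server-side email notification using Python's standard `configparser`, `email.message` and `smtplib` modules. The `[email]` section, its key names, and the function names are assumptions for illustration. The actual setroubleshoot configuration file may use different names. Building the messages is kept separate from sending them, so the formatting can be checked without an SMTP server.

```python
import configparser
import smtplib
from email.message import EmailMessage

# Hypothetical configuration layout; the real setroubleshoot file may differ.
SAMPLE_CONFIG = """
[email]
smtp_host = localhost
smtp_port = 25
from_address = setroubleshoot@example.com
recipients = root@example.com, secadmin@example.com
"""


def _email_section(config_text):
    cfg = configparser.ConfigParser()
    cfg.read_string(config_text)
    return cfg['email']


def build_alert_emails(config_text, summary, details):
    """Create one message per subscriber listed in the configuration."""
    section = _email_section(config_text)
    messages = []
    for rcpt in (r.strip() for r in section['recipients'].split(',')):
        if not rcpt:
            continue
        msg = EmailMessage()
        msg['From'] = section['from_address']
        msg['To'] = rcpt
        msg['Subject'] = '[SELinux AVC denial] ' + summary
        msg.set_content(details)
        messages.append(msg)
    return messages


def send_alert_emails(config_text, messages):
    """Deliver the messages through the configured SMTP server."""
    section = _email_section(config_text)
    with smtplib.SMTP(section['smtp_host'], section.getint('smtp_port')) as smtp:
        for msg in messages:
            smtp.send_message(msg)
```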

Report Generation

In addition to passing alert notifications to listeners, it is possible to direct the same alert information to a file or stdout. This is useful for generating reports and preserving history, and works especially well with the log file scanning tool (/usr/sbin/setroubleshoot).
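Writing alerts to a stream instead of a listener can be as simple as the sketch below. The alert dictionary keys (`summary`, `signature`, `fixes`) and the output layout are hypothetical. The function accepts any text stream, so the same code serves both stdout and a report file.

```python
import io
import sys
from typing import Iterable, Optional, TextIO


def write_report(alerts: Iterable[dict], out: Optional[TextIO] = None) -> int:
    """Write one plain-text entry per alert; return the number written."""
    if out is None:
        out = sys.stdout
    count = 0
    for alert in alerts:
        out.write('Summary:   %s\n' % alert['summary'])
        out.write('Signature: %s\n' % alert['signature'])
        for fix in alert.get('fixes', []):
            out.write('Fix:       %s\n' % fix)
        out.write('\n')  # blank line between entries
        count += 1
    return count


# Usage: collect a report in memory (or pass an open file instead).
buf = io.StringIO()
written = write_report([{'summary': 'httpd denied read', 'signature': '<sig/>',
                         'fixes': ['Relabel the file']}], buf)
```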

Log File Scanning

In addition to listening to audit events in real time, the framework can scan log files, extract information from them, and feed that information through the same analysis process. The /usr/sbin/setroubleshoot tool can be invoked with a log file to perform this "static" scanning and analysis.

System Components

Gathering AVC Denial notifications, analyzing them, and then presenting them to an interested user involves a variety of cooperating components. Each of these components has its own timing, buffering, and permission issues, which unfortunately but necessarily make the overall system complicated. There are three major independent components:

The audit subsystem
The setroubleshoot server, a.k.a. setroubleshootd
The GUI alert application

audit

The audit system reports AVC and ancillary information. In the past, audit only wrote this information to the audit log (and sometimes to the system message log). Audit has no knowledge of higher level constructs; think of it only as a logging mechanism. Audit may emit a variety of independent messages, which then have to be synthesized into a single message about a single event, such as an AVC denial. Existing tools (e.g. audit2allow, ausearch) scan log files and synthesize the audit messages; however, these tools are neither friendly nor real time.

When the audit subsystem emits a message, it is directed through the audit dispatcher. The audit dispatcher iterates over a set of audit dispatch listeners. Audit messages are generated in real time by the kernel, and the audit system has limited buffering, so each audit listener must receive and dequeue the message very quickly. setroubleshoot installs an audit dispatch listener that queues each message. During idle time it synthesizes the independent audit messages into single messages, each relevant to a single AVC denial. It then iterates over the set of setroubleshoot analysis plugins and passes each synthesized message to every plugin.
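The "queue quickly, synthesize later" pattern can be sketched as follows. The sketch assumes the standard audit record format, in which all records belonging to one event share the same `msg=audit(<timestamp>:<serial>)` identifier. `AuditListener` is a hypothetical class. Its `on_record` method does nothing but enqueue. Its `drain` method, meant to run when idle, groups records into events and keeps only those containing an AVC record.

```python
import re
from collections import OrderedDict, deque

# Records of one audit event share this identifier: msg=audit(<time>:<serial>)
_EVENT_ID = re.compile(r'msg=audit\((\d+\.\d+):(\d+)\)')


class AuditListener:
    """Hypothetical sketch: enqueue fast, assemble events later."""

    def __init__(self):
        self.raw = deque()

    def on_record(self, line: str) -> None:
        # Called from the dispatcher: no parsing here, just dequeue promptly.
        self.raw.append(line)

    def drain(self):
        """Run when idle: group queued records into events keyed by (time, serial).

        Simplification: a real implementation must also handle events whose
        records straddle two drains.
        """
        events = OrderedDict()
        while self.raw:
            line = self.raw.popleft()
            match = _EVENT_ID.search(line)
            if match is None:
                continue
            key = (match.group(1), int(match.group(2)))
            events.setdefault(key, []).append(line)
        # Only events that include an AVC record are handed to the plugins.
        return [records for records in events.values()
                if any(r.startswith('type=AVC') for r in records)]
```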

setroubleshootd

This is the "server" or central hub of the setroubleshoot system. It receives connections both from analysis plugins that wish to register the results of their analysis and from alert listeners that wish to be notified of alerts. It also provides an alert browsing service to review previous alerts and to edit per-user filtering data (also on a client/server basis). Finally, it provides a mechanism to contact a central repository, so that more global information can be retrieved or transmitted with the server acting as a proxy for its clients.

sealert

sealert is the GUI desktop alert listener. It connects to the setroubleshootd server and passively listens for alert notifications from the server. Upon receipt of an alert from the server, it queues the alert. When it is ready to display the alert, it calls the server to evaluate the current filtering on the alert for the user. If the alert is not filtered, it then signals that an alert is ready for viewing (via a status icon and possibly balloon notifications). In addition, sealert manages a queue of alerts so that alerts are presented in an orderly manner without disrupting the user's workflow.

Current Implementation Status

setroubleshoot is a Red Hat Emerging Technology product. It is still under active development and is best considered alpha quality. It is scheduled to first appear in Fedora Core 6, but it will still be immature there. The goal is to reach a robust state for inclusion in the forthcoming RHEL5 release. Not all the features mentioned above are currently implemented. There is a modest set of analysis plugins in the current release; the goal is to have an analysis plugin for every SELinux policy boolean (currently over 100). The interface to the analysis plugins is not stable yet: experience with adding new plugins and fleshing out the GUI requirements is teaching us new lessons about the plugin interface as we go.

FAQ

What language is setroubleshoot written in?

Python.

Do I need to know python to author an analysis plugin?

Not really. Although the plugins are also written in Python, we have tried to make them as simple as possible. Reviewing existing plugins should give you strong hints on how to write a new one.

How do I configure the setroubleshoot tool?

There is a configuration file located in /etc/setroubleshoot. Most of the options are documented.

Something is going wrong, how do I track what the tool is doing?

setroubleshoot maintains its own log file under /var/log/setroubleshoot. Logging options can be set in the configuration file.

Where is the local database of problem reports?

The database is currently an XML file and can be found in /var/lib/setroubleshoot/database.xml

Known Issues

Screen Shots

This is what you see on your desktop when an alert fires: setroubleshoot_notification.jpg

This is a browsing tool that allows you to view all the alerts and set options for each, such as whether you want to filter the alert: setroubleshoot_browser.jpg

This is what you see in your email client: setroubleshoot_email.jpg

How can I help?

Any testing will be a huge help and is greatly appreciated.
We need folks to help author analysis plugins.
We need review of the plugin messages for clarity, simplicity and accuracy.
We need help completing the framework functionality.

John Dennis <jdennis@…>, Dan Walsh <dwalsh@…>, Karl MacMillan <kmacmill@…>