Incident Response Plan

Incident Response Overview

An effective incident response plan minimizes damage, reduces recovery time and costs, and ensures lessons learned improve future security posture.

Incident Response Phases

Preparation: Establish and train incident response team
Detection & Analysis: Identify and validate incidents
Containment: Limit damage and prevent spread
Eradication: Remove threat from environment
Recovery: Restore normal operations
Post-Incident: Document lessons learned

Incident Response Team Structure

Core Team Roles

Role	Responsibilities	Contact
Incident Commander	Overall incident coordination, external communication	Primary: CTO; Backup: VP Engineering
Security Lead	Technical investigation, forensics	Primary: Security Manager; Backup: Sr. Security Engineer
Operations Lead	System remediation, recovery	Primary: DevOps Manager; Backup: Sr. SRE
Communications Lead	Internal/external communications	Primary: PR Manager; Backup: Marketing Director
Legal Advisor	Legal guidance, compliance	Primary: General Counsel; Backup: External Counsel

Escalation Matrix

Severity 1 (Critical): Immediate - All hands on deck
Severity 2 (High): Within 30 minutes - Core team
Severity 3 (Medium): Within 2 hours - Security team
Severity 4 (Low): Next business day - On-call engineer
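
The escalation matrix can be encoded so that paging tools compute a response deadline instead of relying on memory. The following is a minimal sketch, not a definitive implementation: the names EscalationRule, ESCALATION_MATRIX and response_deadline are hypothetical, and "next business day" is assumed to mean 09:00 on the next Monday-to-Friday day, ignoring holidays.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class EscalationRule:
    responders: str
    respond_within: Optional[timedelta]  # None means "next business day"


# Mirrors the escalation matrix, keyed by severity number.
ESCALATION_MATRIX = {
    1: EscalationRule("All hands", timedelta(0)),
    2: EscalationRule("Core team", timedelta(minutes=30)),
    3: EscalationRule("Security team", timedelta(hours=2)),
    4: EscalationRule("On-call engineer", None),
}


def response_deadline(severity: int, detected_at: datetime) -> datetime:
    """Return the latest time by which the responders should be engaged."""
    rule = ESCALATION_MATRIX[severity]
    if rule.respond_within is not None:
        return detected_at + rule.respond_within
    # Assumption: business days are Monday-Friday and start at 09:00.
    next_day = detected_at + timedelta(days=1)
    while next_day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        next_day += timedelta(days=1)
    return next_day.replace(hour=9, minute=0, second=0, microsecond=0)
```

For example, a Severity 4 incident detected on a Friday afternoon gets a deadline of 09:00 the following Monday under these assumptions.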

Detection and Analysis

Incident Classification

Type	Indicators	Initial Response
Malware	AV alerts, unusual processes, C2 traffic	Isolate system, capture memory
Data Breach	Large data transfers, DB queries, access anomalies	Block egress, audit access
Account Compromise	Failed logins, privilege escalation, unusual access	Disable account, reset credentials
DDoS Attack	Traffic spike, service degradation	Enable DDoS mitigation, scale resources
Ransomware	Encryption activity, ransom notes	Disconnect systems, activate DR
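
The classification table lends itself to a simple lookup that a triage tool could use to show the first actions for a given incident type. This is an illustrative sketch only; INITIAL_RESPONSE and initial_response are hypothetical names, and the entries simply restate the table.

```python
from typing import Dict, List

# Initial response steps per incident type, copied from the classification table.
INITIAL_RESPONSE: Dict[str, List[str]] = {
    "malware": ["Isolate system", "Capture memory"],
    "data breach": ["Block egress", "Audit access"],
    "account compromise": ["Disable account", "Reset credentials"],
    "ddos attack": ["Enable DDoS mitigation", "Scale resources"],
    "ransomware": ["Disconnect systems", "Activate DR"],
}


def initial_response(incident_type: str) -> List[str]:
    """Look up the first response steps; reject types the table does not cover."""
    key = incident_type.strip().lower()
    if key not in INITIAL_RESPONSE:
        raise ValueError(f"Unclassified incident type: {incident_type!r}")
    return list(INITIAL_RESPONSE[key])
```

Raising on an unknown type, rather than returning an empty list, makes it obvious when an incident needs manual classification.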

Incident Severity Determination

# Severity calculation matrix
def calculate_severity(incident):
    severity_score = 0
    
    # Data classification (0-3 points)
    if incident.data_classification == 'restricted':
        severity_score += 3
    elif incident.data_classification == 'confidential':
        severity_score += 2
    elif incident.data_classification == 'internal':
        severity_score += 1
    
    # Scope (0-3 points)
    if incident.systems_affected > 100:
        severity_score += 3
    elif incident.systems_affected > 10:
        severity_score += 2
    elif incident.systems_affected > 1:
        severity_score += 1
    
    # Business impact (0-4 points)
    if incident.revenue_impact or incident.compliance_violation:
        severity_score += 4
    elif incident.productivity_impact > 50:
        severity_score += 3
    elif incident.productivity_impact > 10:
        severity_score += 2
    elif incident.productivity_impact > 0:
        severity_score += 1
    
    # Determine severity level
    if severity_score >= 8:
        return 'CRITICAL'
    elif severity_score >= 6:
        return 'HIGH'
    elif severity_score >= 3:
        return 'MEDIUM'
    else:
        return 'LOW'
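
To make a severity decision easier to explain during triage, the same point scheme can be broken down per dimension. The sketch below is a hedged, self-contained restatement of the rules above: Incident, score_breakdown and severity_label are hypothetical names, and productivity_impact is assumed to be a percentage, which the original does not state.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Incident:
    data_classification: str
    systems_affected: int
    revenue_impact: bool = False
    compliance_violation: bool = False
    productivity_impact: float = 0.0  # assumed: percent of staff affected


def score_breakdown(incident: Incident) -> Dict[str, int]:
    """Return the points contributed by each dimension of the matrix."""
    data = {"restricted": 3, "confidential": 2, "internal": 1}.get(
        incident.data_classification, 0)
    # One point for each threshold exceeded: >1, >10, >100 systems.
    scope = sum(incident.systems_affected > limit for limit in (1, 10, 100))
    if incident.revenue_impact or incident.compliance_violation:
        impact = 4
    else:
        # One point for each threshold exceeded: >0, >10, >50.
        impact = sum(incident.productivity_impact > limit for limit in (0, 10, 50))
    return {"data": data, "scope": scope, "impact": impact}


def severity_label(total: int) -> str:
    """Map a total score (0-10) to the severity levels used above."""
    if total >= 8:
        return "CRITICAL"
    if total >= 6:
        return "HIGH"
    if total >= 3:
        return "MEDIUM"
    return "LOW"
```

As a worked example, confidential data (2) on 25 systems (2) with a compliance violation (4) totals 8, which is CRITICAL. Internal data (1) on 3 systems (1) with a productivity impact of 20 (2) totals 4, which is MEDIUM.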

Containment Strategies

Immediate Containment

#!/bin/bash
# Emergency containment script

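# Function to isolate compromised system
# Evidence is captured first: every step runs over SSH, which stops working
# once the DROP policies take effect.
isolate_system() {
    local host="$1"
    echo "Isolating system: $host"

    # Preserve evidence (copy these files off the host before isolating it).
    # Note: /dev/mem is restricted on most modern kernels (CONFIG_STRICT_DEVMEM),
    # so a dedicated tool such as LiME or AVML is usually needed for a full image.
    ssh "$host" "sudo dd if=/dev/mem of=/tmp/memory.dump bs=1M"
    ssh "$host" "sudo netstat -anp > /tmp/network_connections.txt"
    ssh "$host" "sudo ps aux > /tmp/processes.txt"

    # Network isolation, sent as one command: allow loopback, then drop
    # everything else. The SSH session is cut as soon as the policies apply.
    ssh -o ServerAliveInterval=5 -o ServerAliveCountMax=2 "$host" \
        "sudo iptables -A INPUT -i lo -j ACCEPT && \
         sudo iptables -A OUTPUT -o lo -j ACCEPT && \
         sudo iptables -P INPUT DROP && \
         sudo iptables -P OUTPUT DROP"

    # Disable remote access at the cloud layer by moving the instance into a
    # quarantine security group with no rules. --no-source-dest-check does not
    # do this. ISOLATION_SG_ID is an assumed variable, and get_instance_id is a
    # site-specific helper that is not defined in this script.
    aws ec2 modify-instance-attribute --instance-id "$(get_instance_id "$host")" \
        --groups "$ISOLATION_SG_ID"
}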

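# Function to block malicious IPs
# WAF_IPSET_NAME, NACL_ID and NACL_RULE_NUMBER are assumed to be set by the caller
block_malicious_ip() {
    local ip="$1" lock_token addresses
    echo "Blocking IP: $ip"

    # Update WAF rules. update-ip-set replaces the whole list and needs the
    # current lock token, so read the existing set first and append to it.
    lock_token=$(aws wafv2 get-ip-set --scope REGIONAL --name "$WAF_IPSET_NAME" \
        --id "$WAF_IPSET_ID" --query 'LockToken' --output text)
    addresses=$(aws wafv2 get-ip-set --scope REGIONAL --name "$WAF_IPSET_NAME" \
        --id "$WAF_IPSET_ID" --query 'IPSet.Addresses' --output text)
    aws wafv2 update-ip-set --scope REGIONAL --name "$WAF_IPSET_NAME" \
        --id "$WAF_IPSET_ID" --lock-token "$lock_token" \
        --addresses $addresses "$ip/32"

    # Security groups hold allow rules only, so revoking ingress cannot block
    # an address; add an explicit deny entry to the network ACL instead.
    aws ec2 create-network-acl-entry --network-acl-id "$NACL_ID" --ingress \
        --rule-number "$NACL_RULE_NUMBER" --protocol -1 \
        --cidr-block "$ip/32" --rule-action deny
}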

Long-term Containment

Implement additional monitoring on affected systems
Deploy honeypots to detect lateral movement
Increase logging verbosity
Implement temporary access restrictions
Enable enhanced authentication requirements

Eradication Procedures

Malware Removal

Identify all infected systems through IoC scanning
Remove malware files and registry entries
Patch vulnerabilities exploited by malware
Reset all potentially compromised credentials
Verify complete removal through multiple scans
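
One common form of IoC scanning is comparing file hashes against a list of known-bad SHA-256 digests. The sketch below illustrates that idea only; scan_for_iocs and sha256_of are hypothetical helpers, and a real deployment would normally rely on EDR or a dedicated scanner that also covers registry, memory and network indicators.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_for_iocs(root: Path, bad_hashes: Iterable[str]) -> List[Path]:
    """Return files under root whose SHA-256 matches a known-bad digest."""
    known_bad = {h.lower() for h in bad_hashes}
    matches = []
    for path in sorted(root.rglob("*")):
        # Skip directories and symlinks; symlink targets are hashed in place.
        if path.is_symlink() or not path.is_file():
            continue
        try:
            if sha256_of(path) in known_bad:
                matches.append(path)
        except OSError:
            # Unreadable files are skipped here; a real scan should report them.
            continue
    return matches
```

Hash matching only finds exact copies of known files, so it complements rather than replaces the repeated scans called for in the last step.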

Account Compromise Response

# Account compromise remediation
# Sketch: AuditLogger and the helper methods called on self (disable_account,
# get_active_sessions, and so on) stand for site-specific integrations with the
# identity provider and endpoint tooling; they are not defined here.
class AccountRemediation:
    def __init__(self):
        self.audit_log = AuditLogger()

    def remediate_compromised_account(self, username):
        """Complete remediation for a compromised account."""
        # Disable account immediately
        self.disable_account(username)

        # Terminate all active sessions
        for session in self.get_active_sessions(username):
            self.terminate_session(session)

        # Reset credentials
        temp_password = self.generate_secure_password()
        self.reset_password(username, temp_password)

        # Revoke all tokens and keys
        self.revoke_api_keys(username)
        self.revoke_oauth_tokens(username)

        # Audit recent activity
        activities = self.audit_account_activity(username, days=30)

        # Check for persistence mechanisms and keep the results for review
        # (assumes each check returns its findings)
        persistence = {
            'scheduled_tasks': self.check_scheduled_tasks(username),
            'startup_items': self.check_startup_items(username),
            'ssh_keys': self.check_ssh_keys(username),
        }

        # Log remediation actions
        self.audit_log.log_remediation(username, activities)

        return {
            'status': 'remediated',
            'actions_taken': self.get_remediation_summary(username),
            'requires_review': activities,
            'persistence_findings': persistence,
        }
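
The remediation class calls generate_secure_password without defining it. One possible implementation, assuming a 20-character default and an illustrative symbol set (neither comes from this plan), uses Python's secrets module, which is intended for security-sensitive randomness:

```python
import secrets
import string

SYMBOLS = "!@#$%^&*-_"  # assumed symbol set; adjust to the local password policy


def generate_secure_password(length: int = 20) -> str:
    """Generate a random password containing all four character classes."""
    if length < 12:
        raise ValueError("length must be at least 12")
    alphabet = string.ascii_letters + string.digits + SYMBOLS
    while True:
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        # Retry until every class is present, so the result passes typical
        # complexity rules without biasing individual positions.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in SYMBOLS for c in candidate)):
            return candidate
```

The random module is deliberately avoided here because its generator is predictable and not suitable for credentials.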

Recovery Procedures

System Recovery Checklist

□ Verify threat completely removed
□ Restore from clean backups if needed
□ Apply all security patches
□ Harden system configuration
□ Test functionality thoroughly
□ Monitor closely for 48 hours
□ Gradually restore normal access

Service Restoration Priority

Tier 1: Authentication services, core infrastructure
Tier 2: Customer-facing applications
Tier 3: Internal tools and services
Tier 4: Development and test environments
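
If services are tagged with a category, the restoration order can be derived from the tiers rather than decided ad hoc during an incident. This is a small sketch under that assumption; the category keys and the restoration_order function are hypothetical.

```python
from typing import Dict, List

# Category -> tier, following the restoration priority list.
RESTORATION_TIERS: Dict[str, int] = {
    "authentication": 1,
    "core-infrastructure": 1,
    "customer-facing": 2,
    "internal-tools": 3,
    "dev-test": 4,
}


def restoration_order(services: Dict[str, str]) -> List[str]:
    """Sort service names by tier, then alphabetically within a tier.

    `services` maps a service name to one of the categories above.
    """
    return sorted(services, key=lambda name: (RESTORATION_TIERS[services[name]], name))
```

A service with an unknown category raises KeyError, which forces it to be classified before it is scheduled for restoration.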

Communication Templates

Internal Communication

Subject: [SEVERITY] Security Incident - [INCIDENT ID]

Team,

We have detected a security incident requiring immediate attention.

Incident Type: [TYPE]
Severity: [CRITICAL/HIGH/MEDIUM/LOW]
Affected Systems: [SYSTEMS]
Current Status: [INVESTIGATING/CONTAINED/RESOLVED]

Immediate Actions Required:
- [ACTION 1]
- [ACTION 2]

Do not discuss this incident outside of authorized channels.

Incident Commander: [NAME]
Bridge Line: [PHONE]
Slack Channel: #incident-[ID]
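
Filling the internal template programmatically helps avoid sending a notice with a placeholder left unfilled. The sketch below uses Python's string.Template and covers only part of the template; the field names and render_internal_notice are hypothetical.

```python
from string import Template

# Abbreviated version of the internal notice; field names are illustrative.
INTERNAL_NOTICE = Template(
    "Subject: [$severity] Security Incident - $incident_id\n"
    "\n"
    "Incident Type: $incident_type\n"
    "Severity: $severity\n"
    "Affected Systems: $systems\n"
    "Current Status: $status\n"
    "\n"
    "Incident Commander: $commander\n"
    "Slack Channel: #incident-$incident_id\n"
)


def render_internal_notice(**fields: str) -> str:
    """Render the notice; substitute() raises KeyError if a field is missing."""
    return INTERNAL_NOTICE.substitute(fields)
```

Using substitute rather than safe_substitute is a deliberate choice: a missing field fails loudly instead of leaving a literal $placeholder in the message.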

Customer Communication

Subject: Important Security Update

Dear Customer,

We recently detected [general description] affecting [scope].

What Happened:
[Brief, factual description without technical details]

What Information Was Involved:
[Specific data types if any]

What We Are Doing:
[List of actions taken]

What You Should Do:
[Specific customer actions if needed]

For More Information:
[Contact information]

We take security seriously and apologize for any inconvenience.

Sincerely,
[Company Leadership]

Evidence Collection

Forensic Data Collection

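#!/bin/bash
# Forensic evidence collection script
# Usage: run as root with the case ID as the only argument

set -u

# Refuse to run without a case ID, otherwise evidence lands in /forensics/ itself
CASE_ID="${1:?Usage: $0 CASE_ID}"
EVIDENCE_DIR="/forensics/$CASE_ID"

# Memory, firewall rules and per-process socket data all need root
if [ "$(id -u)" -ne 0 ]; then
    echo "This script must be run as root" >&2
    exit 1
fi

mkdir -p "$EVIDENCE_DIR"

# Collect system information (timestamp in UTC for a consistent timeline)
echo "Collecting system information..."
date -u > "$EVIDENCE_DIR/collection_time.txt"
uname -a > "$EVIDENCE_DIR/system_info.txt"
uptime > "$EVIDENCE_DIR/uptime.txt"

# Collect memory dump
# Note: /dev/mem is restricted on most modern kernels (CONFIG_STRICT_DEVMEM),
# so this may fail or capture very little; a dedicated tool such as LiME or
# AVML is usually needed for a full memory image.
echo "Dumping memory..."
dd if=/dev/mem of="$EVIDENCE_DIR/memory.dump" bs=1M

# Collect network information
echo "Collecting network data..."
netstat -antp > "$EVIDENCE_DIR/network_connections.txt"
iptables -L -n -v > "$EVIDENCE_DIR/firewall_rules.txt"
ss -tulpn > "$EVIDENCE_DIR/listening_ports.txt"
arp -a > "$EVIDENCE_DIR/arp_cache.txt"

# Collect process information
echo "Collecting process data..."
ps auxww > "$EVIDENCE_DIR/processes.txt"
lsof -n > "$EVIDENCE_DIR/open_files.txt"
pstree -p > "$EVIDENCE_DIR/process_tree.txt"

# Collect user information
echo "Collecting user data..."
w > "$EVIDENCE_DIR/logged_in_users.txt"
last -n 50 > "$EVIDENCE_DIR/login_history.txt"
cat /etc/passwd > "$EVIDENCE_DIR/users.txt"

# Create hash of evidence (the manifest itself is excluded, since it is still
# being written while find runs)
echo "Creating evidence hash..."
find "$EVIDENCE_DIR" -type f ! -name evidence_hashes.txt -exec sha256sum {} + \
    > "$EVIDENCE_DIR/evidence_hashes.txt"

echo "Evidence collection complete: $EVIDENCE_DIR"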

Chain of Custody

Document who collected evidence and when
Use write-once media when possible
Calculate cryptographic hashes
Maintain access logs for evidence
Store in secure, tamper-evident location
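
The first, third and fourth points can be combined into an append-only custody log that records who handled an item, when, and its hash at that moment. The following is a minimal sketch, assuming a JSON Lines log file; custody_entry and append_custody_log are hypothetical names, and tamper evidence still depends on where the log is stored.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def custody_entry(evidence: Path, handler: str, action: str) -> dict:
    """Describe one custody event, including the file's current SHA-256."""
    digest = hashlib.sha256()
    with evidence.open("rb") as fh:
        # Read in 1 MiB chunks so large images do not need to fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "file": str(evidence),
        "sha256": digest.hexdigest(),
        "handler": handler,
        "action": action,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }


def append_custody_log(log_path: Path, entry: dict) -> None:
    """Append the event as one JSON line; existing lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
```

Comparing the sha256 value across successive entries for the same file shows whether the evidence changed between handlers.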

Post-Incident Activities

Lessons Learned Meeting

Conduct within 5 business days of incident closure:

What went well?
What could be improved?
Were procedures followed?
Were there any tool/process gaps?
What preventive measures are needed?

Post-Incident Report Template

Executive Summary
Incident Timeline
Root Cause Analysis
Impact Assessment
Response Effectiveness
Recommendations
Action Items

Incident Response Metrics

Key Performance Indicators

Metric	Target	Measurement
Mean Time to Detect (MTTD)	< 1 hour	Time from compromise to detection
Mean Time to Respond (MTTR)	< 2 hours	Time from detection to containment
Mean Time to Recovery	< 4 hours	Time from containment to recovery
Incident Recurrence Rate	< 5%	Repeat incidents within 90 days
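
The three time-based metrics can be computed from four timestamps per incident and checked against the targets in the table. This is a sketch under the assumption that those timestamps are recorded for every incident; IncidentTimeline, compute_kpis and targets_met are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List


@dataclass
class IncidentTimeline:
    compromised_at: datetime
    detected_at: datetime
    contained_at: datetime
    recovered_at: datetime


# Targets from the KPI table; a metric meets its target when strictly below it.
TARGETS = {
    "mttd": timedelta(hours=1),
    "mttr": timedelta(hours=2),
    "mean_time_to_recovery": timedelta(hours=4),
}


def _mean(deltas: List[timedelta]) -> timedelta:
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def compute_kpis(timelines: List[IncidentTimeline]) -> Dict[str, timedelta]:
    """Average each phase duration across incidents."""
    return {
        "mttd": _mean([t.detected_at - t.compromised_at for t in timelines]),
        "mttr": _mean([t.contained_at - t.detected_at for t in timelines]),
        "mean_time_to_recovery": _mean(
            [t.recovered_at - t.contained_at for t in timelines]),
    }


def targets_met(kpis: Dict[str, timedelta]) -> Dict[str, bool]:
    return {name: kpis[name] < TARGETS[name] for name in TARGETS}
```

Note that a mean of exactly one hour to detect does not satisfy a target of "< 1 hour", so the comparison is strict.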

Testing and Maintenance

Tabletop Exercises

Quarterly scenario-based discussions
Annual full-scale simulation
Monthly tool and contact verification
Post-exercise improvement implementation

Plan Maintenance

Review and update quarterly
Update after each incident
Annual comprehensive review
Track and implement improvements

Quick Reference

Emergency Contacts

Security Hotline: +1-800-SECURE1
Executive Escalation: +1-800-EXEC911
Legal Team: [email protected]
PR Team: [email protected]

Critical Commands

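# Block IP immediately (insert at the top so an earlier ACCEPT rule cannot match first)
iptables -I INPUT 1 -s [IP] -j DROP

# Disable user account (lock the password and expire the account, so key-based logins are refused too)
usermod -L -e 1 [username]

# Kill all user processes
pkill -KILL -u [username]

# Capture network traffic
tcpdump -i any -w /tmp/capture.pcap

# Check for rootkits
rkhunter --check
chkrootkit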
