Overview

Citadel is a Chrome extension for Gmail that detects abusive emails and helps users take action while preserving privacy. The design is grounded in prior literature and a mixed-methods study (interviews + survey) examining user preferences for moderation approaches and privacy concerns regarding human vs. automated systems.

Platform: Chrome extension for Gmail, permission-based install; no storage of personal emails on our servers.

Control: Threshold-per-sender, block lists, and “accurate / inaccurate” feedback loops.

Support: Notification window + “Contact Trusted People” for quick outreach after detection.

Key Findings from Prior & Recent Studies

Preferences for moderation approaches and privacy concerns informed Citadel’s design.

Interview Study Preferences

40% preferred automated moderation, 53% preferred a combined automated + human approach, and 7% preferred human-only moderation. A chi-square test indicated significant differences in privacy concern between human-only and automated conditions.

Survey Preferences

51% preferred combined moderation, 41% automated, and 8% human-only—mirroring interview trends with strong privacy concerns tied to human moderation.

Platform Usage

Gmail was the primary email platform for participants (≈93% in interviews; ≈89% in survey), motivating a Chrome extension delivery.

Feature Identification

Show in Notification Window

Abusive emails surface in an accessible notification pane with severity indicators, keeping the inbox uncluttered.
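The severity indicator can be sketched as a simple bucketing of the detector's score. The cutoffs and tier names below are illustrative assumptions, not Citadel's actual values:

```python
def severity(score: float) -> str:
    """Map a toxicity score in [0, 1] to a display tier for the
    notification pane. Cutoffs here are illustrative only."""
    if score >= 0.9:
        return "high"
    if score >= 0.6:
        return "medium"
    return "low"
```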

Contact Trusted People

One-click outreach to pre-selected contacts (friends/relatives) when support is needed after detection.

Threshold per Sender

Users tune tolerance levels per sender; feedback (“Accurate?” / “Inaccurate?”) adaptively adjusts thresholds over time.
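One way to realize this feedback loop is a per-sender threshold that the "Accurate?" / "Inaccurate?" responses nudge up or down. A minimal sketch, assuming a default threshold of 0.5 and a fixed step size (both hypothetical parameters):

```python
class SenderThresholds:
    """Per-sender tolerance with feedback-driven adjustment.
    Default threshold and step size are illustrative assumptions."""

    def __init__(self, default: float = 0.5, step: float = 0.05):
        self.default = default
        self.step = step
        self.thresholds: dict[str, float] = {}

    def get(self, sender: str) -> float:
        return self.thresholds.get(sender, self.default)

    def is_abusive(self, sender: str, score: float) -> bool:
        # Flag only when the model score crosses this sender's threshold.
        return score >= self.get(sender)

    def feedback(self, sender: str, accurate: bool) -> None:
        # "Inaccurate" raises the sender's threshold (more tolerant of
        # their language); "Accurate" nudges it down. Clamped to [0, 1].
        t = self.get(sender)
        t = t - self.step if accurate else t + self.step
        self.thresholds[sender] = min(1.0, max(0.0, t))
```

This captures the design insight behind the second iteration: a flagged message from a friend marked "Inaccurate" makes future messages from that friend less likely to be flagged, without changing tolerance for strangers.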

Block Contact

Instantly block a sender; future emails are moved to trash and kept out of the inbox. Manage block lists in the settings view.
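The routing rule for blocked senders is straightforward; a sketch, with the address normalization as an assumption:

```python
class BlockList:
    """Manages blocked senders; mail from a blocked sender is routed
    to trash instead of the inbox."""

    def __init__(self):
        self._blocked: set[str] = set()

    def block(self, sender: str) -> None:
        # Case-insensitive match on the sender address (an assumption).
        self._blocked.add(sender.lower())

    def unblock(self, sender: str) -> None:
        self._blocked.discard(sender.lower())

    def route(self, sender: str) -> str:
        return "trash" if sender.lower() in self._blocked else "inbox"
```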

Design Iterations

Citadel went through two iterations. The first design validated demand for notifications and trusted contacts. The second introduced sender-specific thresholds and blocking, based on insights that some users tolerate language from friends but not strangers.

Figure: first design, system flow and notification window.
Figure: second design, sender thresholds and notification window.

Implementation

Backend

Engine

A deep neural network (CNN + LSTM + dense layers) detects abusive content. A secure database stores user profiles and trusted-contact metadata, never personal emails.

Data

Training

Due to privacy constraints around email storage, training used the Kaggle Jigsaw toxic comment dataset (159,571 items; labels: toxic, severe toxic, obscene, threat, insult, identity hate).

Model

Architecture

Inception-inspired convolutional branch + LSTM for long-range dependencies; character- and word-level inputs to handle context and misspellings. Test accuracy: ~98.4% on a 9,571-item split.
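The value of the dual-granularity input can be illustrated with a toy tokenizer. The preprocessing below is a sketch, not the model's actual pipeline: word tokens lose misspelled words to the out-of-vocabulary bucket, while character n-grams preserve most of the signal.

```python
import re

def word_tokens(text: str) -> list[str]:
    # Word-level tokens: a misspelling like "stup1d" becomes an
    # out-of-vocabulary word and loses its signal.
    return re.findall(r"[a-z']+", text.lower())

def char_ngrams(text: str, n: int = 3) -> list[str]:
    # Character n-grams: "stupid" and "stup1d" still share most
    # trigrams, so the character branch can recognize the misspelling.
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]
```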

Frontend

Chrome Extension

The extension runs inside Gmail (with user permission) and shows notifications with severity indicators, along with "Ask for help!", "Block Contact", and sender-threshold feedback controls.

Privacy & User Control