AutoResponderProcess — Output Files Data Dictionary

See repo source for current behavior (ReportGenerator._CSV_COLUMNS / _CSV_COLUMNS_IP4, output_document.py).

This document provides a comprehensive reference for every output file produced by the AutoResponderProcess pipeline. Each file is described with its purpose, generation conditions, format, and a detailed explanation of every column, field, or structural element it contains.

All output files are written to a timestamped run directory:

processing_reports/run_{YYYY-MM-DD_HH-MM-SS}/

Table of Contents

run.log
stage_execution.log
processing_report.log
processing_report.json
processing_report_master.csv / processing_report_ip4.csv
category_summary_report.csv
category_summary_report.json
classifier_output/classifier_output.json
classifier_output/classifier_output.csv
output_document_inactive_people.csv
output_document_inactive_people.json
Marketing suppression deliverable ({BusinessUnit}_NoLongerThere_{date}.csv)
output_document_alternate_contacts.csv
output_document_alternate_contacts.json
output_document_inactive_new_org.csv
output_document_inactive_new_org.json
output_document_undeliverables.csv
output_document_undeliverables.json
output_document_inactive_no_cupola_match.csv
output_document_inactive_no_cupola_match.json
cupola_audit_log.csv
cupola_audit_log.json
cupola_audit_log_rollback_plan.csv
output_document_multipub_audit.csv
output_document_multipub_audit.json
output_document_email_update_requests.csv
output_document_email_update_requests.json
action_log.log
batch_report.html
batch_report.pptx
output_document_human_review.csv / .json
impact_report.txt / .json

1. `run.log`

Purpose

The primary runtime log file for the entire pipeline execution. Captures every log message emitted by any Python logger during the run at the DEBUG level and above. This is the most granular diagnostic artifact and is the first place to look when troubleshooting unexpected behavior, errors, or performance issues.

Generation Conditions

Always generated. Created at the start of every run via setup_logging() in logging_config.py.

Format

Plain text. Each line follows the enhanced logging format:

{timestamp} - [{correlation_id}] - {logger_name} - {level} - {message}

Field Descriptions

Field	Description
timestamp	The date and time the log entry was recorded, formatted as `YYYY-MM-DD HH:MM:SS` in US Eastern Time (America/New_York). All timestamps throughout the application are normalized to Eastern Time for consistency.
correlation_id	An 8-character UUID prefix that uniquely identifies a logical unit of work (typically one email being processed). This allows you to trace all log messages related to a single email across multiple modules and subsystems. Displays `N/A` when no correlation context is active (e.g., during initialization).
logger_name	The fully qualified Python module name that emitted the log entry (e.g., `auto_responder.connectors.cupola_connector`). This tells you exactly which component of the system generated the message.
level	The severity level of the log entry. In the file handler, all levels from `DEBUG` upward are captured. Possible values: `DEBUG` (detailed diagnostic information), `INFO` (general operational messages), `WARNING` (unexpected but recoverable situations, including slow-operation alerts for functions exceeding 1000ms), `ERROR` (failures that prevented an operation from completing), `CRITICAL` (severe failures that may halt the entire run).
message	The free-form log message content. May include structured data such as email addresses, contact IDs, system names, operation results, timing information, and error tracebacks. For operations decorated with `@log_performance`, a `[duration=X.XXms]` suffix is appended when the function completes.
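Because every well-formed line follows the same five-field layout, run.log can be filtered by correlation ID or scanned for slow operations with a short script. This is a minimal sketch, assuming single-line entries exactly in the format above (traceback continuation lines are simply skipped); `parse_run_log`, `trace`, and `slow_operations` are hypothetical helpers, not part of the codebase.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Pattern for: {timestamp} - [{correlation_id}] - {logger_name} - {level} - {message}
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    r" - \[(?P<correlation_id>[^\]]+)\]"
    r" - (?P<logger_name>\S+)"
    r" - (?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)"
    r" - (?P<message>.*)$"
)
# Suffix appended by @log_performance, e.g. "[duration=1234.56ms]"
DURATION_RE = re.compile(r"\[duration=(?P<ms>\d+(?:\.\d+)?)ms\]")


@dataclass
class LogEntry:
    timestamp: str
    correlation_id: str
    logger_name: str
    level: str
    message: str

    @property
    def duration_ms(self) -> Optional[float]:
        """Duration from the @log_performance suffix, or None if absent."""
        match = DURATION_RE.search(self.message)
        return float(match.group("ms")) if match else None


def parse_run_log(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield one LogEntry per well-formed line; continuation lines are skipped."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            yield LogEntry(**match.groupdict())


def trace(entries: Iterable[LogEntry], correlation_id: str) -> list[LogEntry]:
    """All entries for one logical unit of work (typically one email)."""
    return [e for e in entries if e.correlation_id == correlation_id]


def slow_operations(entries: Iterable[LogEntry], threshold_ms: float = 1000.0) -> list[LogEntry]:
    """Entries whose @log_performance duration exceeds the threshold."""
    return [e for e in entries if (e.duration_ms or 0.0) > threshold_ms]
```

For example, passing an open run.log file handle to `parse_run_log` and the result to `slow_operations` lists every timed operation over the same 1000ms threshold used for slow-operation warnings.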

Notes

The file handler is set to DEBUG level, which is more verbose than the console handler (set to INFO). This means the file will contain detailed diagnostic information not shown in the terminal.
Noisy third-party libraries (httpx, urllib3, msal) are suppressed to WARNING level to keep the log focused on application-level events.
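The behavior described above (DEBUG to file, INFO to console, Eastern Time timestamps, a correlation ID on every line, quiet third-party loggers) can be approximated with the standard `logging` module. This is a hypothetical sketch, not the contents of `logging_config.py`; the `correlation_id` context variable, the filter and formatter classes, and `setup_logging_sketch` are assumptions.

```python
import logging
import sys
import uuid
from contextvars import ContextVar
from datetime import datetime, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

try:
    EASTERN = ZoneInfo("America/New_York")
except ZoneInfoNotFoundError:
    # Sketch-only fallback for hosts without a tz database (install `tzdata` instead).
    EASTERN = timezone.utc

# Hypothetical context variable holding the active correlation ID.
correlation_id = ContextVar("correlation_id", default="N/A")


class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation ID to each record so the formatter can print it."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


class EasternFormatter(logging.Formatter):
    """Render asctime in US Eastern Time regardless of the host's local zone."""
    def formatTime(self, record, datefmt=None):
        dt = datetime.fromtimestamp(record.created, tz=EASTERN)
        return dt.strftime(datefmt or "%Y-%m-%d %H:%M:%S")


def setup_logging_sketch(log_path: str) -> logging.Logger:
    fmt = EasternFormatter(
        "%(asctime)s - [%(correlation_id)s] - %(name)s - %(levelname)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)     # run.log captures everything
    console_handler = logging.StreamHandler(sys.stderr)
    console_handler.setLevel(logging.INFO)   # terminal stays quieter
    for handler in (file_handler, console_handler):
        handler.setFormatter(fmt)
        handler.addFilter(CorrelationIdFilter())

    root = logging.getLogger()
    for existing in list(root.handlers):
        root.removeHandler(existing)
    root.setLevel(logging.DEBUG)
    root.addHandler(file_handler)
    root.addHandler(console_handler)

    # Suppress noisy third-party libraries to WARNING.
    for name in ("httpx", "urllib3", "msal"):
        logging.getLogger(name).setLevel(logging.WARNING)
    return root


def new_correlation_id() -> str:
    """8-character UUID prefix, as described for the correlation_id field."""
    return uuid.uuid4().hex[:8]
```

Attaching the filter to the handlers rather than to individual loggers means records propagated up from every module's logger carry a correlation ID by the time they are formatted.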

2. `stage_execution.log`

Purpose

A structured, stage-by-stage execution log that tracks the pipeline's progression through its major processing phases. Unlike run.log, which captures every message, this file is organized into discrete stage sections with JSON-encoded data blocks, making it well suited to programmatic post-run analysis and pipeline health monitoring.

Generation Conditions

Always generated. Created at run start by the StageLogger class. The final summary section is written when the pipeline completes (normal or early exit).

Format

Plain text with embedded JSON blocks. The file is divided into:

A header section
One section per pipeline stage
A final summary section

Structure

====================================================================================================
AUTORESPONDER PROCESS - STAGE EXECUTION LOG
Run Started: {ISO 8601 timestamp}
====================================================================================================

Per-Stage Section

Each stage that executes during the pipeline gets its own section:

====================================================================================================
STAGE: {stage_name}
Timestamp: {ISO 8601 timestamp}
----------------------------------------------------------------------------------------------------
STAGE_DATA (JSON):
{JSON object with stage metadata, timing, and data}

SUMMARY:
  Duration: {X.XX}ms ({X.XX}s)
  Status: {completed|failed|skipped}
  Emails Processed: {count}      (if applicable)
  Error: {error message}          (if applicable)
  Details: {JSON details}         (if applicable)
====================================================================================================

STAGE_DATA JSON Fields

Field	Type	Description
stage_name	string	The internal name of the pipeline stage (e.g., `STEP_1_EXTRACT_EMAILS`, `STEP_2_CONTACT_LOOKUP`, `STEP_3_CLASSIFY`, `STEP_4_DETERMINE`, `STEP_5_EXECUTE_ACTIONS`, `STEP_6_GENERATE_REPORTS`). Identifies which phase of the processing pipeline this section documents.
start_time	string (ISO 8601)	The exact timestamp when this stage began execution, in Eastern Time. Used together with `end_time` to compute the stage's wall-clock duration.
end_time	string (ISO 8601)	The exact timestamp when this stage completed execution.
duration_ms	float	The elapsed wall-clock time for the stage in milliseconds. Computed as the difference between `end_time` and `start_time`. Useful for identifying performance bottlenecks — for example, a slow LLM classification stage or a slow database lookup.
status	string	The outcome of the stage. `completed` means the stage finished without fatal errors. `failed` means the stage encountered an unrecoverable error. `skipped` means the stage was intentionally bypassed (e.g., no emails to process).
metadata	object	Additional key-value pairs provided when the stage was started. Content varies by stage and may include configuration parameters, input counts, or other contextual information.
data	object	Arbitrary structured data logged during stage execution via `log_stage_data()`. Each key represents a named data point; the value can be any JSON-serializable structure. Examples include email counts, lookup results, classification summaries, or action execution details.
errors	array	List of error objects recorded during the stage. Each error object contains: `error` (the error message string), `error_type` (the Python exception class name, e.g., `ConnectionError`, `ValueError`), `context` (additional key-value context about the error), and `timestamp` (when the error occurred).
warnings	array	List of warning objects recorded during the stage. Each warning object contains: `warning` (the warning message string), `context` (additional key-value context), and `timestamp` (when the warning was recorded). Warnings indicate non-fatal issues that may merit attention but did not prevent stage completion.
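For programmatic post-run analysis, the STAGE_DATA blocks can be pulled out of the file and decoded. A minimal sketch, assuming the layout shown above (a `STAGE_DATA (JSON):` line, the JSON object, then a blank line before `SUMMARY:`); `parse_stage_data` and `stage_health` are hypothetical names.

```python
import json
import re

# Capture everything between the STAGE_DATA marker and the blank line before SUMMARY:.
STAGE_DATA_RE = re.compile(r"STAGE_DATA \(JSON\):\n(?P<json>.*?)\n\s*\nSUMMARY:", re.DOTALL)


def parse_stage_data(text: str) -> list[dict]:
    """Return the decoded STAGE_DATA object for every stage section, in file order."""
    return [json.loads(m.group("json")) for m in STAGE_DATA_RE.finditer(text)]


def stage_health(stages: list[dict]) -> dict:
    """Summarize failed stages, the slowest stage, and error/warning counts."""
    slowest = max(stages, key=lambda s: s.get("duration_ms", 0.0), default=None)
    return {
        "failed": [s["stage_name"] for s in stages if s.get("status") == "failed"],
        "slowest": slowest["stage_name"] if slowest else None,
        "error_count": sum(len(s.get("errors", [])) for s in stages),
        "warning_count": sum(len(s.get("warnings", [])) for s in stages),
    }
```

The regex relies on pretty-printed JSON never containing a blank line, which holds for `json.dumps` output with indentation.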

Final Summary Section

FINAL SUMMARY

Contains a SUMMARY_DATA (JSON) block and a HUMAN-READABLE SUMMARY.

Field	Type	Description
run_start_time	string (ISO 8601)	The timestamp when the entire pipeline run began.
run_end_time	string (ISO 8601)	The timestamp when the pipeline run completed.
total_duration_ms	float	Total wall-clock time for the entire run in milliseconds.
total_duration_seconds	float	Total wall-clock time in seconds (convenience field).
stages_completed	integer	The number of stages that were executed during the run.
statistics	object	Aggregated statistics across all stages. Contains: `total_emails_extracted` (number of emails pulled from the inbox), `unique_emails` (number of deduplicated emails), `emails_processed` (number of emails that went through the full pipeline), `determinations` (a dictionary mapping each determination type to its count), `errors` (array of all errors across all stages), `warnings` (array of all warnings across all stages).
stage_summaries	array	A compact array summarizing each stage. Each entry contains `stage_name`, `status`, and `duration_ms`. This provides a quick-glance view of which stages ran and how long each took.
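The SUMMARY_DATA fields above are enough for a quick automated health check after each run. A hypothetical sketch that uses only the documented keys; how the object is read from the file is left out, and the 60-second slow-stage threshold and the "not all unique emails processed" rule are illustrative assumptions, not pipeline behavior.

```python
def summary_problems(summary: dict, slow_stage_ms: float = 60_000.0) -> list[str]:
    """Return human-readable problems found in a SUMMARY_DATA object (empty list = healthy)."""
    problems = []
    for stage in summary.get("stage_summaries", []):
        if stage["status"] == "failed":
            problems.append(f"{stage['stage_name']} failed")
        elif stage["duration_ms"] > slow_stage_ms:
            problems.append(f"{stage['stage_name']} took {stage['duration_ms'] / 1000:.1f}s")
    stats = summary.get("statistics", {})
    if stats.get("errors"):
        problems.append(f"{len(stats['errors'])} error(s) recorded")
    # Unique emails that did not make it through the full pipeline may merit a look.
    unique = stats.get("unique_emails", 0)
    processed = stats.get("emails_processed", 0)
    if processed < unique:
        problems.append(f"only {processed} of {unique} unique emails processed")
    return problems
```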

3. `processing_report.log`

Purpose

The primary human-readable processing report. Provides a comprehensive, formatted text summary of every email that was processed, including the contact lookup results, LLM classification, determination, Multipub validation, standard actions, executed actions, and final outcome. This is the main report for operational review of a batch run.

Generation Conditions

Generated when at least one email is processed. Not generated if the pipeline finds zero emails in Step 1 (early exit).

Format

Plain text, structured with fixed-width formatting and separator lines. The report has four major sections: Header, Per-Email Details, Summary, and Output Document Lists.

Sections

Header

Generated: Timestamp when the report was generated (format: YYYY-MM-DD HH:MM:SS {timezone})
Mode: The run mode — DRY-RUN (all connections mocked), READ-ONLY (live reads, write operations MOCKED), or LIVE
Run window: Start time through end time with total duration in seconds
Emails: Total number of emails processed in this run

Per-Email Block (repeated for each email)

Each email gets a detailed block with these subsections:

Email Identification:

Email ID: The unique message identifier from the email system (typically the database Id column from the Hodor dmorders_thompson table)
From: The sender's display name and email address in format Name <email@domain.com>
Subject: The full email subject line
Received: The date and time the email was received
Inbox: Which inbox/account the email was fetched from (e.g., energy@thompson.com, grants@thompson.com)
Body: A preview of the email body text, truncated to 500 characters with total character count shown if truncated. Newlines are collapsed to spaces for readability.
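The body preview rule can be sketched as below. The 500-character limit and newline collapsing come from the description above; the wording of the truncation suffix, and whether the reported count is of the raw or flattened body, are assumptions that may differ from the real report. `body_preview` is a hypothetical name.

```python
import re


def body_preview(body: str, limit: int = 500) -> str:
    """Collapse newlines to spaces and truncate to `limit` characters, noting the full length."""
    flat = re.sub(r"\s*[\r\n]+\s*", " ", body).strip()
    if len(flat) <= limit:
        return flat
    # Hypothetical suffix; the count here is of the raw body (an assumption).
    return f"{flat[:limit]}... ({len(body)} chars total)"
```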

Contact Lookup:

Contact: Whether the contact was found and in how many systems, with the list of systems. Systems that were mocked in the current run are annotated with (mock). The count is broken down into live vs. mock counts.
Sources (all queried connectors): A comma-separated list of all backend connectors that were queried during contact lookup (e.g., cupola, hodor, multipub, salesforce), regardless of whether they returned results.
Sources used (by field): A semicolon-separated list showing which backend system provided each specific data field. Format: field_name:source_system (e.g., person_name:cupola; org_name:hodor; cupola_org_id:cupola).

LLM Classification (if classification was performed):

Initial Category: The category assigned by the first-pass classification LLM agent before QA review. Only shown if QA correction was applied.
Final Category: The category after QA agent review. Annotated with (QA corrected) if the QA agent changed the initial classification.
LLM Category: Shown when no QA correction was applied — the single category from classification.
QA Explanation: The QA agent's reasoning for why it changed (or confirmed) the classification.
Person Status: The employment/organizational status of the person as determined by the LLM (e.g., left_company, retired, deceased, active, on_leave).
Email Status: The status of the email address itself as determined by the LLM (e.g., valid, invalid, bounced, changed).

Determination:

Determination: The final determination type in uppercase. Possible values: INACTIVE (person has left/retired/deceased — mark inactive across systems), ACTIVE (person is still active — ensure records are current), REPLACEMENT (a replacement contact was identified — mark original inactive and add replacement), TITLE_UPDATE (person's title has changed), EMAIL_UPDATE (person's email address has changed), UNKNOWN (not relevant, spam, or unclassifiable — no action needed).
Confidence: The confidence score for the determination, displayed as a percentage (e.g., 95%).
Source Email: The email address extracted from the auto-response body that was used as the basis for contact lookup (may differ from the sender email when the bounced email references a different address).
New Email: A new email address for the person, extracted from the auto-response (relevant for EMAIL_UPDATE and some REPLACEMENT scenarios).
Replacement: The name and email of the replacement contact in format Name <email>. If multiple replacements were identified, each is numbered (Replacement 1, Replacement 2, etc.).
Repl. Title: The job title of the replacement contact, if provided.
Personal Email: A retired/personal email address for the person (e.g., when someone leaves a company and provides their personal email).
Long-term Leave: Displayed as Yes when the person is identified as being on extended leave rather than having permanently departed.
Reasoning: The LLM's reasoning/notes explaining why this determination was made.

Multipub Subscription Validation (if validation was performed):

Subscriber: The Multipub subscriber number and how it was matched (e.g., by email, by name). Shows Not found in Multipub if no subscriber record was located.
Active Subs: Whether active subscriptions were found, with the count of active orders.
Expired (12mo): Whether recently expired subscriptions (within 12 months) were found, with order count.
Single-Issue: Whether recent single-issue purchases were found, with order count.
Subscriptions: No relevant subscription activity — shown when none of the above subscription types were found.
Review Flag: The reason the record was flagged for manual review (e.g., active subscriptions found for an inactive person).
DEFERRED: Indicates that the inactive marking was HALTED because the person has active subscriptions in Multipub. These records require manual review before proceeding.

Standard Actions: A numbered list describing what actions WOULD be performed in a live run for this determination type, regardless of the current run mode. This serves as documentation of the expected workflow. Actions reference specific systems (Cupola, Hodor, Multipub, Salesforce) and note which are mocked in the current run.

Actions Executed: A list of every action that was actually executed (or mocked) during this run. Each action shows:

[ OK] or [FAIL] status indicator
The system name (e.g., cupola, hodor, salesforce)
The operation performed (e.g., mark_inactive, add_contact, update_email)
Detail text explaining the specific action taken

The section header varies by mode: (ALL MOCKED - dry-run), (writes MOCKED - read-only mode), or no annotation in live mode.

Outcome:

Outcome: The final processing status. Possible values: SUCCESS (all actions completed successfully), FAILED (one or more actions failed), SKIPPED NO CONTACT (contact not found in any system — no actions to take), SKIPPED UNKNOWN (determination was unknown — no actions needed), ERROR (an unexpected error occurred during processing), DEFERRED MULTIPUB (processing halted due to active Multipub subscriptions requiring manual review), PENDING (processing not yet complete — should not appear in final reports).
Reason: Explanation for why processing was skipped, if applicable.
Error: The error message if the status is ERROR or FAILED.
Duration: Processing time for this individual email in milliseconds.

Summary Section

Aggregated counts across all emails in the batch:

Total Emails: Total number of emails processed
Successful: Count of emails with success status
Failed: Count with failed status
Skipped (no contact): Count with skipped_no_contact status
Skipped (unknown det): Count with skipped_unknown status
Errors: Count with error status
LLM Category Breakdown: Count of emails per LLM classification category
Determination Breakdown: Count of emails per determination type
QA Corrections: Number of emails where the QA agent changed the initial classification
Multipub Validation: Counts for validated, active subscriptions found, recently expired, recent single-issue, and deferred (halted)
Action Totals: Total actions executed, succeeded, and failed
Output Document Lists: Record counts for Inactive People, Alternate Contacts, and Inactive at New Org lists
Total Run Duration: Wall-clock time for the entire batch run in seconds
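The status counts in the Summary Section can be reproduced from per-email outcomes with a `Counter`. A minimal sketch, assuming the lowercase status values named above (`success`, `failed`, `skipped_no_contact`, `skipped_unknown`, `error`); the `deferred_multipub` and `pending` values are inferred from the Outcome labels, and the `ProcessingStatus` enum and `summarize` function are hypothetical.

```python
from collections import Counter
from enum import Enum


class ProcessingStatus(str, Enum):
    """Hypothetical enum; the display label is the value upper-cased with spaces for underscores."""
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED_NO_CONTACT = "skipped_no_contact"
    SKIPPED_UNKNOWN = "skipped_unknown"
    ERROR = "error"
    DEFERRED_MULTIPUB = "deferred_multipub"
    PENDING = "pending"

    @property
    def label(self) -> str:
        return self.value.replace("_", " ").upper()


def summarize(statuses: list[ProcessingStatus]) -> dict[str, int]:
    counts = Counter(statuses)
    if counts[ProcessingStatus.PENDING]:
        # PENDING should never survive into a final report.
        raise ValueError(f"{counts[ProcessingStatus.PENDING]} email(s) still pending")
    return {
        "Total Emails": len(statuses),
        "Successful": counts[ProcessingStatus.SUCCESS],
        "Failed": counts[ProcessingStatus.FAILED],
        "Skipped (no contact)": counts[ProcessingStatus.SKIPPED_NO_CONTACT],
        "Skipped (unknown det)": counts[ProcessingStatus.SKIPPED_UNKNOWN],
        "Errors": counts[ProcessingStatus.ERROR],
        "Deferred (halted)": counts[ProcessingStatus.DEFERRED_MULTIPUB],
    }
```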

Output Document Lists (appended if data exists)

Detailed listings for three output document types. See the individual output document file descriptions below for field details.

Output replay (`regenerated/`)

The output replay utility (auto-responder-replay-output / scripts/replay_output.py) re-runs the shared batch pipeline (pipeline/batch_processor.py) for emails extracted from an existing processing_reports/run_* folder. Regenerated files are written only under run_*/regenerated/; originals in the run root are never overwritten.

After replay, regenerated/replay_verification.json summarizes per-file comparison (match, diff, error, or skipped) against the source artifact. Volatile LLM fields (confidence, QA explanation, timestamps) are ignored by default.

v1 scope: output_document_*, processing_report_*, category_summary_report, classifier output, cupola audit, impact report, and batch report (full-run). Notification CSVs (Hodor import, Tarun undetermined, Multipub follow-up) are deferred to v1.1.
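The comparison rule (drop volatile LLM fields, then classify each file) can be illustrated for JSON artifacts as below. This is not the replay utility's code: `VOLATILE_KEYS` is an assumption built from the fields named above plus the timestamp keys in `processing_report.json`, the meaning given to `skipped` (no regenerated copy) is a guess, and the real tool may normalize differently.

```python
import json
from pathlib import Path
from typing import Any

# Assumed volatile keys: LLM confidence, QA explanation, and timestamps.
VOLATILE_KEYS = {"confidence", "qa_explanation", "generated_at", "run_start"}


def strip_volatile(value: Any) -> Any:
    """Recursively drop volatile keys so only stable content is compared."""
    if isinstance(value, dict):
        return {k: strip_volatile(v) for k, v in value.items() if k not in VOLATILE_KEYS}
    if isinstance(value, list):
        return [strip_volatile(v) for v in value]
    return value


def compare_json(original: Path, regenerated: Path) -> str:
    """Return 'match', 'diff', 'error', or 'skipped' (assumed: no regenerated copy)."""
    if not regenerated.exists():
        return "skipped"
    try:
        a = json.loads(original.read_text(encoding="utf-8"))
        b = json.loads(regenerated.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return "error"
    return "match" if strip_volatile(a) == strip_volatile(b) else "diff"
```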

4. `processing_report.json`

Purpose

A JSON companion to the human-readable processing report (.log). Contains the same data in a machine-parseable format suitable for programmatic consumption, integration with dashboards, or post-run analysis scripts.

Generation Conditions

Generated whenever processing_report.log is generated (when at least one email is processed).

Format

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields

Field	Type	Description
generated_at	string (ISO 8601)	The timestamp when this JSON file was generated.
run_start	string (ISO 8601)	The timestamp when the pipeline run began.
total_emails	integer	The total number of emails processed in this run.
records	array of objects	An array containing one object per processed email. Each object is a full serialization of the `EmailProcessingRecord` dataclass (see Record Fields below).
output_documents	object	Present only when the output document collector has data. Contains three keys: `inactive_people`, `alternate_contacts`, and `inactive_new_org`, each with `purpose` (string), `record_count` (integer), and `records` (array of objects). Undeliverables are not embedded here; when present, they are written only to `output_document_undeliverables.csv` and `output_document_undeliverables.json`, each documented in its own section below.
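A minimal consumer of the top-level structure, using only the fields documented in this table; `load_report` and `report_overview` are hypothetical names, and the run directory is whatever `processing_reports/run_*` folder you are inspecting.

```python
import json
from collections import Counter
from pathlib import Path


def load_report(run_dir: Path) -> dict:
    with open(run_dir / "processing_report.json", encoding="utf-8") as fh:
        return json.load(fh)


def report_overview(report: dict) -> dict:
    """Determination counts plus output-document record counts (the key is optional)."""
    # An empty determination string means no determination was made.
    determinations = Counter(r.get("determination") or "(none)" for r in report["records"])
    documents = {
        name: doc["record_count"]
        for name, doc in report.get("output_documents", {}).items()
    }
    return {
        "total_emails": report["total_emails"],
        "determinations": dict(determinations),
        "output_documents": documents,
    }
```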

Record Fields (each object in `records` array)

Email Identification Fields

Field	Type	Description
sender_email	string	The email address of the auto-response sender. This is the raw `SenderEmail` from the email record in the database.
sender_name	string or null	The display name of the sender, if available from the email headers. May be null if the email only contained an address without a display name.
subject	string	The full subject line of the auto-response email.
received_date	string	The date and time the email was received, as recorded in the source database. Format may vary based on the source system.
inbox_source	string	The inbox/account from which the email was fetched. Corresponds to the `AccountName` in the email database (e.g., `energy@thompson.com`, `grants@thompson.com`, `resources@associationexecs.com`). This determines which business line the email belongs to.
message_id	string	The unique identifier for the email record, typically the database primary key `Id` from the Hodor `dmorders_thompson` SQL table. Used as the primary key for tracking this email throughout the pipeline.
original_sender_email	string or null	The original sender email before any normalization or cleanup. Present when the pipeline modifies the sender email during processing (e.g., stripping display names, handling forwarded emails). Null if no modification was needed.
body	string	The full body text of the auto-response email. Contains the raw text content that was analyzed by the LLM classifier to determine the person's status, extract replacement contacts, new email addresses, etc.

Contact Lookup Fields

Field	Type	Description
lookup_email	string or null	The email address that was actually used for contact lookup across backend systems. This may differ from `sender_email` when the auto-response body references a different email address (the `source_email`). If null, no lookup was performed.
contact_found	boolean	`true` if the contact was found in at least one backend system (Cupola, Hodor, Multipub, or Salesforce). `false` if the email address was not found in any system. This is the primary indicator of whether downstream actions can be taken.
contact_systems	array of strings	List of backend systems where the contact was found. Possible values in the array: `cupola` (contact management system), `hodor` (Thompson's dmorders database), `multipub` (subscription/publication management), `salesforce` (CRM). An empty array means the contact was not found anywhere.
mock_contact_systems	array of strings	Subset of `contact_systems` that were operating in mock/simulated mode during this run. In dry-run mode, all systems are mocked. In read-only mode, write operations are mocked but reads are live. In live mode, this array is empty. Useful for distinguishing real vs. simulated lookup results.
cupola_org_id	string or null	The CUPOLA `organization_id` for the preferred org-person link. This is the organization identifier in the Cupola contact management system. Null if the contact was not found in Cupola or Cupola was not queried.
cupola_org_person_id	string or null	The CUPOLA `org_person_id` — the unique identifier for the link between a person and an organization in Cupola. This is the record that gets marked active/inactive when processing status changes. Null if not found in Cupola.
cupola_person_id	string or null	The CUPOLA `person_id` — the unique identifier for the person entity in Cupola, independent of their organizational affiliation. A person may have multiple org_person links but only one person_id. Null if not found in Cupola.
hodor_pros_num	string or null	The HODOR `ProsNum` (prospect number) — the unique contact identifier in Thompson's Hodor/dmorders database system. This is used to update contact status in Hodor (e.g., marking as "No Longer with Firm"). Null if not found in Hodor.
org_name	string or null	The organization/company name associated with the contact. May be sourced from Cupola, Hodor, or other backend systems (see `org_name_source`). Null if no organization name was found.
person_name	string or null	The full name of the contact person. May be sourced from Cupola, Hodor, or other backend systems (see `person_name_source`). Null if no person name was found.
lookup_sources_available	string	Comma-separated list of all backend connector names that were available and queried during the contact lookup phase, regardless of whether they returned results. Represents the scope of the search. Example: `cupola, hodor, multipub, salesforce`.
person_name_source	string or null	The specific backend system that provided the `person_name` value (e.g., `cupola`, `hodor`). Null if no person name was found. Useful for provenance tracking when multiple systems have conflicting data.
org_name_source	string or null	The specific backend system that provided the `org_name` value. Null if no org name was found.
sources_used_fields	string	A semicolon-separated summary of which backend system provided each specific data field. Format: `field_name:source_system; field_name:source_system`. Example: `person_name:cupola; org_name:hodor; cupola_org_id:cupola; hodor_pros_num:hodor`. This provides full provenance for every piece of contact data.
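The two provenance strings, `sources_used_fields` and `lookup_sources_available`, are easy to turn back into structured data. A minimal sketch based on the formats documented above; the helper names, and the idea of flagging sources that were not in the queried list, are illustrative assumptions.

```python
def parse_sources_used(value: str) -> dict[str, str]:
    """Parse 'field:source; field:source' into {field: source}."""
    result = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        field, sep, source = part.partition(":")
        if not sep:
            raise ValueError(f"malformed entry: {part!r}")
        result[field.strip()] = source.strip()
    return result


def parse_sources_available(value: str) -> list[str]:
    """Parse the comma-separated lookup_sources_available list."""
    return [s.strip() for s in value.split(",") if s.strip()]


def unexpected_sources(provenance: dict[str, str], available: list[str]) -> set[str]:
    """Systems credited with a field but absent from the queried connectors."""
    return set(provenance.values()) - set(available)
```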

Determination Fields

Field	Type	Description
determination	string	The final determination type assigned to this email after LLM classification, QA review, and contact lookup. Possible values: `inactive` (person has left the company, retired, or is deceased — mark contact inactive across all systems), `active` (person is confirmed active — ensure records are current), `replacement` (a replacement contact was identified — mark original inactive and add the replacement), `title_update` (person's job title has changed — update title across systems), `email_update` (person's email address has changed — update email across systems), `unknown` (email is not relevant, is spam, or cannot be classified — no action taken). Empty string if determination has not been made.
confidence	float	A confidence score between 0.0 and 1.0 representing how confident the system is in the determination. Higher values indicate greater certainty. A confidence of 0.0 typically indicates no determination was made. Displayed as a percentage in human-readable reports (e.g., 0.95 → `95%`).
llm_category	string or null	The final LLM classification category after QA review. This is the category used to drive the determination logic. Possible values: `undeliverable` (bounce-back or invalid email address), `left company` (person departed the organization), `retired` (person retired), `deceased` (person is deceased), `out of office` (temporary absence — auto-reply), `changed email` (person's email address has changed). Null if classification was not performed.
initial_llm_category	string or null	The category assigned by the first-pass classification LLM agent, before QA review. When QA does not change the category, this matches `llm_category`. When QA corrects the classification, this preserves the original (incorrect) category for audit purposes. Null if classification was not performed.
person_status	string or null	The employment/organizational status of the person as extracted by the LLM from the email body. Examples: `left_company`, `retired`, `deceased`, `active`, `on_leave`. This is a more granular status than the `llm_category` and is used as input to the determination logic. Null if not extracted.
email_status	string or null	The status of the email address itself as determined by the LLM. Examples: `valid`, `invalid`, `bounced`, `changed`. Used to distinguish between "person is gone" vs. "email address is bad." Null if not extracted.
qa_correction_applied	boolean	`true` if the QA agent reviewed the initial classification and changed the category. `false` if the QA agent confirmed the original classification or if QA review was not performed. When true, `initial_llm_category` and `llm_category` will differ.
qa_explanation	string or null	The QA agent's textual explanation for why it changed or confirmed the initial classification. Provides transparency into the QA review decision. Null if QA review was not performed.
replacement_info	array of objects	List of replacement contacts identified from the auto-response email. Each object contains: `replacement_name` (string or null — the name of the replacement person), `replacement_email` (string or null — the email address of the replacement person), `replacement_title` (string or null — the job title of the replacement person). An empty array means no replacement was identified. Multiple entries indicate multiple replacements were mentioned.
sender_new_email	string or null	A new email address for the sender, extracted from the auto-response body. Relevant for `email_update` determinations where the person's email has changed. Also used in `replacement` scenarios where the departing person provides their new personal/forwarding email. Null if no new email was identified.
retired_personal_email	string or null	A personal/private email address provided by someone who has retired or left their organization. Distinct from `sender_new_email` in that this is typically a non-work email (e.g., Gmail, Yahoo) shared for personal contact purposes rather than as an official forwarding address. Null if none was provided.
is_long_term_leave	boolean	`true` if the LLM determined the person is on an extended/long-term leave of absence (e.g., maternity leave, sabbatical, medical leave) rather than having permanently departed the organization. This affects the determination — long-term leave contacts are flagged for review rather than immediately marked inactive. `false` for all other cases.
source_email	string or null	The email address extracted from the auto-response body that was used as the basis for contact lookup. This may differ from `sender_email` — for example, when a mail server's bounce message references the intended recipient's address, which is the address we actually need to look up. Null if no alternate source email was extracted.
notes	string or null	Free-form reasoning or notes from the LLM explaining the basis for its classification and any additional context it identified in the email body. Null if no reasoning was provided.
standard_actions_description	string or null	A human-readable, multi-line description of the standard actions that WOULD be performed for this determination type in a live run, based on the contact systems found and the determination type. This is generated from the determination reference documentation and serves as an expected-behavior checklist regardless of the current run mode. Null if no determination was made.
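Several constraints stated in this table can be checked mechanically: the allowed `determination` values, a `confidence` between 0.0 and 1.0, and the rule that `initial_llm_category` and `llm_category` differ exactly when `qa_correction_applied` is `true`. A hypothetical validator (names are illustrative), which also shows the percentage display used in the text reports:

```python
DETERMINATIONS = {"inactive", "active", "replacement", "title_update", "email_update", "unknown", ""}


def format_confidence(confidence: float) -> str:
    """0.95 -> '95%', matching the human-readable report display."""
    return f"{confidence:.0%}"


def validate_determination(record: dict) -> list[str]:
    """Return violations of the documented field constraints (empty list = consistent)."""
    issues = []
    if record.get("determination", "") not in DETERMINATIONS:
        issues.append(f"unexpected determination {record['determination']!r}")
    confidence = record.get("confidence", 0.0)
    if not 0.0 <= confidence <= 1.0:
        issues.append(f"confidence {confidence} outside 0.0-1.0")
    initial, final = record.get("initial_llm_category"), record.get("llm_category")
    corrected = record.get("qa_correction_applied", False)
    if corrected and initial == final:
        issues.append("qa_correction_applied is true but categories match")
    if not corrected and initial is not None and final is not None and initial != final:
        issues.append("categories differ but qa_correction_applied is false")
    return issues
```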

Multipub Validation Fields

Field	Type	Description
multipub_validation_performed	boolean	`true` if Multipub subscription validation was executed for this email. Validation is performed for `INACTIVE` and `REPLACEMENT` determinations to check whether the person has active subscriptions before marking them inactive. `false` if validation was skipped (e.g., for `ACTIVE` or `UNKNOWN` determinations).
multipub_subsnum	string or null	The Multipub subscriber number (`SubsNum`) for this contact, if found. This is the unique identifier for a subscriber in the Multipub subscription management system. Null if the contact was not found in Multipub.
multipub_match_method	string or null	The method by which the contact was matched to a Multipub subscriber record. Possible values include matching by email address, by name, or by other criteria. Null if no match was found.
multipub_has_active_subscription	boolean	`true` if the contact has at least one currently active subscription in Multipub. When true for an `INACTIVE` determination, the inactive marking is HALTED (deferred) because the person still has live subscription activity that needs to be addressed by the sales team.
multipub_has_recently_expired	boolean	`true` if the contact has subscriptions that expired within the last 12 months. These are flagged for the sales team's awareness but do not halt inactive processing.
multipub_has_recent_single_issue	boolean	`true` if the contact has recent single-issue (one-time) purchases. These are flagged for the sales team's awareness but do not halt inactive processing.
multipub_active_order_count	integer	The number of currently active subscription orders. Zero if no active subscriptions exist.
multipub_expired_order_count	integer	The number of subscriptions that expired within the last 12 months.
multipub_single_issue_order_count	integer	The number of recent single-issue purchase orders.
multipub_flagged_for_review	boolean	`true` if this record was flagged for manual review due to subscription-related concerns (active subscriptions, recently expired, or single-issue orders found for an inactive person).
multipub_review_reason	string or null	The specific reason the record was flagged for review. Examples: `Active subscriptions found for inactive contact`, `Recently expired subscriptions require sales follow-up`. Null if not flagged.
multipub_deferred	boolean	`true` if the inactive marking was HALTED because active subscriptions were found in Multipub. This is the most critical flag — it means the system deliberately stopped processing this email to prevent marking someone inactive who still has live subscriptions. These records must be manually reviewed and resolved.

Raw LLM Output Fields#

Field	Type	Description
raw_classification_result	object or null	The complete, unmodified JSON output from the first-pass LLM classification agent. Contains the raw `category`, `confidence`, `sender_new_email`, `alternate_contact`, `retired_personal_email`, `is_long_term_leave`, `reasoning`, and any other fields the LLM produced. Null if classification was not performed. Preserved for audit and debugging purposes.
raw_qa_result	object or null	The complete, unmodified JSON output from the QA review LLM agent. Contains `final_category`, `final_sender_new_email`, `final_alternate_contact`, `final_retired_personal_email`, `is_long_term_leave`, `qa_correction_applied`, `qa_explanation`, and any other fields. Null if QA review was not performed. Preserved for audit and debugging.

Actions Fields#

Field	Type	Description
actions	array of objects	List of all actions executed (or mocked) for this email. Each action object contains: `system` (string — the backend system, e.g., `cupola`, `hodor`, `salesforce`, `multipub`), `operation` (string — the operation performed, e.g., `mark_inactive`, `add_contact`, `update_email`, `check_subscriptions`), `success` (boolean — whether the action completed successfully), `detail` (string — additional detail text about what was done, may be empty). An empty array means no actions were executed.

Outcome Fields#

Field	Type	Description
status	string	The final processing outcome status. Possible values: `success` (all actions completed), `failed` (one or more actions failed), `skipped_no_contact` (contact not found — no actions taken), `skipped_unknown` (determination was unknown — no actions needed), `error` (unexpected error occurred), `deferred_multipub` (halted due to active Multipub subscriptions), `pending` (should not appear in final output).
skip_reason	string or null	A human-readable explanation for why processing was skipped or deferred. Null when the email was fully processed. Examples: `Contact not found in any system`, `Determination is unknown — no actions required`, `Deferred: active Multipub subscriptions`.
error_message	string or null	The error message text when status is `error` or `failed`. Contains the exception message or a description of what went wrong. Null when no error occurred.
duration_ms	float	The wall-clock processing time for this individual email in milliseconds. Measures the time from when this email started processing to when it completed. Useful for identifying slow emails that may be caused by slow LLM responses, slow database lookups, or complex action execution.

5. `processing_report_master.csv` and `processing_report_ip4.csv`#

Each run emits two CSV files from ReportGenerator.write_report: processing_report_master.csv (the full column ledger) and processing_report_ip4.csv (an IP4-facing subset with a fixed column order, as of 2026-05-03).

`processing_report_master.csv`#

Purpose#

Spreadsheet-compatible export with one row per processed email and the complete flattened column set (ReportGenerator._CSV_COLUMNS). Use this file for internal analysis, Client Services run reports, and audit.

Generation Conditions#

Generated whenever processing_report.log is generated.

Format#

CSV with UTF-8 BOM encoding (utf-8-sig for Excel compatibility). All fields are quoted (QUOTE_ALL). Newlines within field values are replaced with spaces to prevent row splitting.
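A minimal sketch of these conventions using Python's standard `csv` module follows. The function names `sanitize` and `write_report_csv` are hypothetical. Treating a bare `\r` as a line break is also an assumption. This is an illustration, not the repository's code.

```python
import csv
from typing import Iterable, Mapping, Sequence


def sanitize(value: object) -> str:
    """Render a cell as text and replace embedded line breaks with spaces."""
    if value is None:
        return ""
    text = str(value)
    # Assumption: CRLF, LF and bare CR are all treated as line breaks.
    return text.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")


def write_report_csv(
    path: str, headers: Sequence[str], rows: Iterable[Mapping[str, object]]
) -> None:
    """Write rows keyed by header name, quoting every field."""
    # utf-8-sig prepends a BOM so Excel detects UTF-8; newline="" lets csv control line endings.
    with open(path, "w", encoding="utf-8-sig", newline="") as fh:
        writer = csv.writer(fh, quoting=csv.QUOTE_ALL)
        writer.writerow(headers)
        for row in rows:
            writer.writerow([sanitize(row.get(h)) for h in headers])
```

The BOM matters only for Excel. Tools that read the file with `encoding="utf-8-sig"` strip it transparently.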

Column Reference#

#	Column Header	Source Field	Description
1	Email ID	`message_id`	Unique identifier for the email record (database primary key from Hodor).
2	Sender Email	`sender_email`	The email address of the auto-response sender.
3	Original Sender Email	`original_sender_email`	The sender email before normalization, if it was modified. Empty if unchanged.
4	Lookup Email	`lookup_email`	The email address actually used for contact lookup across backend systems. May differ from sender email.
5	Sender Name	`sender_name`	Display name of the sender from email headers. Empty if not available.
6	Subject	`subject`	Full subject line of the auto-response email. Newlines replaced with spaces.
7	Received Date	`received_date`	Date and time the email was received.
8	Inbox Source	`inbox_source`	The inbox/account (AccountName) the email was fetched from. Determines business line.
9	Body	`body`	Full body text of the email. Newlines replaced with spaces.
10	Contact Found	`contact_found`	`Yes` if contact was found in at least one backend system, `No` otherwise.
11	Sources	`lookup_sources_available`	Comma-separated list of all backend connectors queried during contact lookup.
12	Sources Used	`sources_used_fields`	Semicolon-separated provenance map showing which system provided each data field (e.g., `person_name:cupola; org_name:hodor`).
13	Contact Systems (Live)	computed	Comma-separated list of systems where the contact was found using LIVE (non-mocked) connections. Empty if all lookups were mocked or contact not found.
14	Contact Systems (Mock)	computed	Comma-separated list of systems where the contact was found using MOCKED connections. Empty in live mode.
15	HODOR ProsNum	`hodor_pros_num`	The Hodor prospect number for this contact. Empty if not found in Hodor.
16	CUPOLA Org ID	`cupola_org_id`	The Cupola organization ID. Empty if not found in Cupola.
17	CUPOLA Org Person ID	`cupola_org_person_id`	The Cupola org-person link ID. Empty if not found.
18	CUPOLA Person ID	`cupola_person_id`	The Cupola person entity ID. Empty if not found.
19	Multipub Subsnum	`multipub_subsnum`	The Multipub subscriber number. Empty if not found in Multipub.
20	Initial LLM Category	`initial_llm_category`	Category from the first-pass LLM classification, before QA review. Empty if not classified.
21	Final LLM Category	`llm_category`	Final category after QA review. Empty if not classified.
22	QA Correction Applied	`qa_correction_applied`	`Yes` if QA agent changed the initial classification, `No` otherwise.
23	QA Explanation	`qa_explanation`	QA agent's reasoning for its decision. Empty if QA was not performed.
24	Person Status	`person_status`	Person's employment/org status from LLM (e.g., `left_company`, `retired`). Empty if not extracted.
25	Email Status	`email_status`	Status of the email address from LLM (e.g., `valid`, `bounced`). Empty if not extracted.
26	Determination	`determination`	Final determination type: `inactive`, `active`, `replacement`, `title_update`, `email_update`, `unknown`. Empty if not determined.
27	Confidence	`confidence`	Confidence score formatted as percentage (e.g., `95%`). `0%` if not determined.
28	Source Email	`source_email`	Email address extracted from auto-response body used for lookup. Empty if same as sender.
29	New Email	`sender_new_email`	New email address identified for the person. Empty if none found.
30	Replacement Name	computed	Semicolon-separated list of replacement contact names (from `replacement_info`). Empty if no replacements.
31	Replacement Email	computed	Semicolon-separated list of replacement contact email addresses. Empty if no replacements.
32	Replacement Title	computed	Semicolon-separated list of replacement contact job titles. Empty if no replacements.
33	Retired Personal Email	`retired_personal_email`	Personal email provided by departed/retired person. Empty if none provided.
34	Long-term Leave	`is_long_term_leave`	`Yes` if person is on long-term leave, `No` otherwise.
35	Reasoning	`notes`	LLM reasoning/notes for the determination. Empty if none provided.
36	Multipub Validated	`multipub_validation_performed`	`Yes` if Multipub validation was performed, `No` otherwise.
37	Multipub Subscriber	`multipub_subsnum`	Multipub subscriber number (same as column 19). Empty if not found.
38	Multipub Match Method	`multipub_match_method`	How the contact was matched in Multipub (e.g., by email, by name). Empty if not matched.
39	Multipub Active Subs	`multipub_has_active_subscription`	`Yes` if active subscriptions exist, `No` otherwise.
40	Multipub Active Order Count	`multipub_active_order_count`	Number of active subscription orders. `0` if none.
41	Multipub Recently Expired	`multipub_has_recently_expired`	`Yes` if subscriptions expired within 12 months, `No` otherwise.
42	Multipub Expired Order Count	`multipub_expired_order_count`	Number of recently expired orders. `0` if none.
43	Multipub Single-Issue	`multipub_has_recent_single_issue`	`Yes` if recent single-issue purchases exist, `No` otherwise.
44	Multipub Single-Issue Order Count	`multipub_single_issue_order_count`	Number of single-issue orders. `0` if none.
45	Multipub Flagged for Review	`multipub_flagged_for_review`	`Yes` if record was flagged for manual review, `No` otherwise.
46	Multipub Review Reason	`multipub_review_reason`	Reason the record was flagged. Empty if not flagged.
47	Multipub Deferred	`multipub_deferred`	`Yes` if inactive marking was halted due to active subscriptions, `No` otherwise.
48	CUPOLA Actions Summary	computed	Semicolon-separated summary of all Cupola-specific actions. Format: `[OK/FAIL] system: operation - detail`. Empty if no Cupola actions.
49	Actions Summary	computed	Semicolon-separated summary of all non-Cupola actions (Hodor, Salesforce, etc.). Format: `[OK/FAIL] system: operation - detail`. Empty if no non-Cupola actions.
50	Status	`status`	Final processing status: `success`, `failed`, `skipped_no_contact`, `skipped_unknown`, `error`, `deferred_multipub`, `pending`.
51	Skip Reason	`skip_reason`	Reason for skipping. Empty if not skipped.
52	Error Message	`error_message`	Error text if status is error/failed. Empty if no error.
53	Duration (ms)	`duration_ms`	Processing duration in milliseconds, formatted as an integer.
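The table mixes direct fields with rendered values. As a hedged illustration, the sketch below renders three kinds of value: the `Yes`/`No` columns, the percentage confidence (column 27), and the two action summaries (columns 48 and 49), built from the `actions` objects described earlier. All function names are hypothetical.

Three details are assumptions:

- confidence is stored as a 0 to 1 fraction;
- summary entries are joined with `"; "`;
- an empty `detail` drops the ` - detail` suffix.

```python
from typing import Any, Iterable, Mapping, Optional, Tuple


def yes_no(flag: Optional[bool]) -> str:
    """Render a boolean column as Yes/No; None renders as No (assumption)."""
    return "Yes" if flag else "No"


def format_confidence(confidence: Optional[float]) -> str:
    """Render a 0-1 confidence (assumed scale) as a whole percentage; missing -> 0%."""
    if confidence is None:
        return "0%"
    return f"{round(confidence * 100)}%"


def format_action(action: Mapping[str, Any]) -> str:
    """Render one action object as "[OK/FAIL] system: operation - detail"."""
    status = "OK" if action.get("success") else "FAIL"
    text = f"[{status}] {action['system']}: {action['operation']}"
    detail = action.get("detail") or ""
    # Assumption: an empty detail omits the " - detail" suffix.
    return f"{text} - {detail}" if detail else text


def split_action_summaries(actions: Iterable[Mapping[str, Any]]) -> Tuple[str, str]:
    """Return (CUPOLA Actions Summary, Actions Summary) as "; "-joined strings."""
    cupola, other = [], []
    for action in actions:
        bucket = cupola if str(action.get("system", "")).lower() == "cupola" else other
        bucket.append(format_action(action))
    return "; ".join(cupola), "; ".join(other)
```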

`processing_report_ip4.csv`#

Purpose#

Filtered export for Sai Teja / IP4: only rows that need manual Cupola follow-up under the agreed LLM categories, with a fixed 23-column layout so templates and macros do not drift (ReportGenerator._CSV_COLUMNS_IP4).

Generation Conditions#

Written together with the master CSV whenever processing_report.log is generated.

Row filter#

Only emails whose Final LLM Category (or, if empty, Initial LLM Category) normalizes to one of: Out of Office, Retired, Deceased, Left Company, Changed Email (ReportGenerator._IP4_ACTIONABLE_CATEGORIES). All other categories are excluded from this file (they still appear on the master CSV and in processing_report.json).
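The row filter can be sketched as below. The lower-case, whitespace-collapsing comparison is an assumption about what "normalizes" means here. The constant and function names are hypothetical stand-ins, not `ReportGenerator._IP4_ACTIONABLE_CATEGORIES` itself.

```python
from typing import Mapping, Optional

# Categories listed for the IP4 file; stored lower-case for the assumed comparison.
IP4_CATEGORIES = frozenset(
    {"out of office", "retired", "deceased", "left company", "changed email"}
)


def _normalize(label: Optional[str]) -> str:
    # Assumed normalization: collapse internal whitespace and ignore case.
    return " ".join((label or "").split()).lower()


def is_ip4_row(row: Mapping[str, Optional[str]]) -> bool:
    """Use Final LLM Category, falling back to Initial LLM Category when it is empty."""
    category = _normalize(row.get("Final LLM Category")) or _normalize(
        row.get("Initial LLM Category")
    )
    return category in IP4_CATEGORIES
```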

Format#

Same as the master CSV: UTF-8 BOM, QUOTE_ALL, newline sanitation.

Column Reference (fixed order — 23 columns)#

#	Column Header	Source Field / derivation	Description
1	Email ID	`message_id`	Same as master §5 column 1.
2	Inbox Source	`inbox_source`	Same as master §5 column 8.
3	Original Sender Email	`original_sender_email`	Same as master §5 column 3.
4	Sender Email	`sender_email`	Same as master §5 column 2.
5	Lookup Email	`lookup_email`	Same as master §5 column 4.
6	Source Email	`source_email`	Same as master §5 column 28.
7	Sender Name	`sender_name`	Same as master §5 column 5.
8	Subject	`subject`	Same as master §5 column 6.
9	Body	`body`	Same as master §5 column 9.
10	Initial LLM Category	`initial_llm_category`	Same as master §5 column 20.
11	Final LLM Category	`llm_category`	Same as master §5 column 21.
12	Determination	`determination`	Same as master §5 column 26.
13	Person Status	`person_status`	Same as master §5 column 24.
14	Email Status	`email_status`	Same as master §5 column 25.
15	CUPOLA Org ID	`cupola_org_id`	Same as master §5 column 16.
16	CUPOLA Org Person ID	`cupola_org_person_id`	Same as master §5 column 17.
17	CUPOLA Person ID	`cupola_person_id`	Same as master §5 column 18.
18	New Email	`sender_new_email`	Same as master §5 column 29.
19	Replacement Name	computed from `replacement_info`	Same as master §5 column 30.
20	Replacement Email	computed from `replacement_info`	Same as master §5 column 31.
21	Replacement Title	computed from `replacement_info`	Same as master §5 column 32.
22	Retired Personal Email	`retired_personal_email`	Same as master §5 column 33.
23	CUPOLA Actions Summary	computed from `actions` (Cupola only)	Same as master §5 column 48.
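One way to keep the 23-column order fixed is to drive `csv.DictWriter` from an explicit header list. The sketch below assumes rows are already keyed by master CSV header:

- `extrasaction="ignore"` drops master-only columns instead of raising an error;
- `restval` (default `""`) fills any missing columns.

The names are hypothetical.

```python
import csv
import io
from typing import Iterable, Mapping

# Header order copied from the 23-column table above.
IP4_COLUMNS = [
    "Email ID", "Inbox Source", "Original Sender Email", "Sender Email", "Lookup Email",
    "Source Email", "Sender Name", "Subject", "Body", "Initial LLM Category",
    "Final LLM Category", "Determination", "Person Status", "Email Status",
    "CUPOLA Org ID", "CUPOLA Org Person ID", "CUPOLA Person ID", "New Email",
    "Replacement Name", "Replacement Email", "Replacement Title",
    "Retired Personal Email", "CUPOLA Actions Summary",
]


def project_to_ip4(master_rows: Iterable[Mapping[str, str]]) -> str:
    """Render master-CSV rows (keyed by master header) as IP4 CSV text in fixed order."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=IP4_COLUMNS, quoting=csv.QUOTE_ALL, extrasaction="ignore"
    )
    writer.writeheader()
    for row in master_rows:
        writer.writerow(row)
    return buf.getvalue()
```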

6. `category_summary_report.csv`#

Purpose#

A consolidated summary that groups all processed emails into five main business categories. It collapses the granular LLM categories into broader groups for high-level analysis. It is aimed at stakeholders who need the distribution of auto-response types rather than per-category detail.

Generation Conditions#

Generated when at least one email is processed. Not generated if the records list is empty.

Format#

CSV with UTF-8 BOM encoding. All fields are quoted. Rows are ordered by category in a fixed sequence: Undeliverable, Left Company / Retired / Deceased, Out of Office, Changed Email, Other.

Category Mapping#

Main Category	Mapped From LLM Categories
Undeliverable	`undeliverable`
Left Company / Retired / Deceased	`left company`, `retired`, `deceased`
Out of Office	`out of office`
Changed Email	`changed email`
Other	Any category not matching the above, or null/empty categories
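The mapping and fixed row order can be expressed as a lookup table plus a stable sort. This is a sketch. Two details are assumptions: which LLM category field feeds the lookup (final or initial), and the case-insensitive match. The function names are hypothetical.

```python
from typing import Iterable, List, Mapping, Optional

# Main categories in the fixed row order used by category_summary_report.csv.
MAIN_CATEGORY_ORDER = [
    "Undeliverable",
    "Left Company / Retired / Deceased",
    "Out of Office",
    "Changed Email",
    "Other",
]

_LLM_TO_MAIN = {
    "undeliverable": "Undeliverable",
    "left company": "Left Company / Retired / Deceased",
    "retired": "Left Company / Retired / Deceased",
    "deceased": "Left Company / Retired / Deceased",
    "out of office": "Out of Office",
    "changed email": "Changed Email",
}


def main_category(llm_category: Optional[str]) -> str:
    """Map an LLM category to its main category; unmatched, null or empty go to Other."""
    return _LLM_TO_MAIN.get((llm_category or "").strip().lower(), "Other")


def sort_for_summary(records: Iterable[Mapping[str, str]]) -> List[Mapping[str, str]]:
    """Order rows by main category; sorted() is stable, so input order holds within a group."""
    rank = {name: i for i, name in enumerate(MAIN_CATEGORY_ORDER)}
    return sorted(records, key=lambda r: rank[r["Category"]])
```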

Column Reference#

#	Column Header	Description
1	Category	The main business category this email was mapped to (one of the five categories above).
2	Email ID	Unique identifier for the email record (same as `message_id`).
3	Sender Email	The sender's email address.
4	Lookup Email	The email address used for contact lookup. Empty if same as sender or not available.
5	Contact Found	`Yes` if contact was found in any backend system, `No` otherwise.
6	Contact Systems	Comma-separated list of systems where the contact was found.
7	Determination	The final determination type (`inactive`, `active`, `replacement`, etc.). Empty if not determined.
8	Status	Processing outcome status (`success`, `failed`, `skipped_no_contact`, etc.).
9	Org Name	Organization name associated with the contact. Empty if not found.
10	Person Name	Person's name. Falls back to sender name if person name is not available.
11	CUPOLA Org ID	Cupola organization ID. Empty if not in Cupola.
12	CUPOLA Org Person ID	Cupola org-person link ID. Empty if not in Cupola.
13	CUPOLA Person ID	Cupola person ID. Empty if not in Cupola.
14	HODOR ProsNum	Hodor prospect number. Empty if not in Hodor.
15	Multipub SubsNum	Multipub subscriber number. Empty if not in Multipub.

7. `category_summary_report.json`#

Purpose#

JSON companion to the category summary CSV. Provides the same grouped data in a machine-readable format with records organized under their respective category keys.

Generation Conditions#

Generated alongside category_summary_report.csv.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

Field	Type	Description
generated_at	string (ISO 8601)	Timestamp when this file was generated.
run_start	string (ISO 8601)	Timestamp when the pipeline run began.
total_emails	integer	Total number of emails in this report.
categories	object	An object where each key is a main category name and the value is an object containing `record_count` (integer) and `records` (array of objects). Each record object has the same fields as the CSV columns listed above, using snake_case keys: `category`, `email_id`, `sender_email`, `lookup_email`, `contact_found`, `contact_systems`, `determination`, `status`, `org_name`, `person_name`, `cupola_org_id`, `cupola_org_person_id`, `cupola_person_id`, `hodor_pros_num`, `multipub_subsnum`.
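A sketch of assembling this object from snake_case record dicts follows. It assumes UTC timestamps. It also omits categories that have no records, because the source does not say whether empty categories appear with a `record_count` of 0. Function names are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List, Mapping


def build_category_summary(
    records: List[Mapping[str, Any]], run_start: datetime
) -> Dict[str, Any]:
    """Group snake_case record dicts under their `category` key."""
    categories: Dict[str, Dict[str, Any]] = {}
    for record in records:
        bucket = categories.setdefault(record["category"], {"record_count": 0, "records": []})
        bucket["records"].append(dict(record))
        bucket["record_count"] += 1
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "run_start": run_start.isoformat(),
        "total_emails": len(records),
        "categories": categories,
    }


def write_summary(path: str, summary: Mapping[str, Any]) -> None:
    # UTF-8 with 2-space indentation, as documented; ensure_ascii=False is an assumption.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(summary, fh, indent=2, ensure_ascii=False)
```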

8. `classifier_output/classifier_output.json`#

Purpose#

The raw, unprocessed output from the LLM classification and QA agents for every email that went through classification. This file preserves the full agent responses before any post-processing, mapping, or interpretation by the pipeline. It serves as the primary audit trail for LLM decision-making and is essential for debugging classification issues, evaluating LLM accuracy, and tuning prompts.

Generation Conditions#

Generated only when at least one email went through LLM classification (i.e., at least one record has raw_classification_result or raw_qa_result populated). Created in a classifier_output/ subdirectory within the run folder.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

Field	Type	Description
generated_at	string (ISO 8601)	Timestamp when this file was generated.
run_start	string (ISO 8601)	Timestamp when the pipeline run began.
total_classified_emails	integer	Number of emails that were classified by the LLM in this run.
records	array of objects	One object per classified email (see below).

Record Fields#

Field	Type	Description
email_id	string	Unique identifier for the email.
sender_email	string	Sender's email address.
sender_name	string or null	Sender's display name.
subject	string	Email subject line.
inbox_source	string	Inbox/account the email came from.
classification_agent_output	object or null	The complete raw JSON response from the first-pass classification LLM agent. Structure depends on the LLM prompt and may include: `category`, `confidence`, `sender_new_email`, `alternate_contact`, `retired_personal_email`, `is_long_term_leave`, `reasoning`, `person_status`, `email_status`, and any additional fields the LLM returns. Null if classification was not performed.
qa_agent_output	object or null	The complete raw JSON response from the QA review LLM agent. Structure depends on the QA prompt and may include: `final_category`, `final_sender_new_email`, `final_alternate_contact`, `final_retired_personal_email`, `is_long_term_leave`, `qa_correction_applied`, `qa_explanation`, and any additional fields. Null if QA review was not performed.
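The generation condition and record shape above can be sketched as follows. Treating "populated" as "not None" is an assumption. The source key names on the processing record (`message_id`, `raw_classification_result`, `raw_qa_result`) are taken from the processing report fields described earlier. The function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Mapping, Optional


def build_classifier_output(
    records: List[Mapping[str, Any]], run_start: datetime
) -> Optional[Dict[str, Any]]:
    """Return the classifier_output.json payload, or None when nothing was classified."""
    classified = [
        r for r in records
        if r.get("raw_classification_result") is not None or r.get("raw_qa_result") is not None
    ]
    if not classified:
        return None  # Mirrors the generation condition: no file is written.
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "run_start": run_start.isoformat(),
        "total_classified_emails": len(classified),
        "records": [
            {
                "email_id": r.get("message_id"),
                "sender_email": r.get("sender_email"),
                "sender_name": r.get("sender_name"),
                "subject": r.get("subject"),
                "inbox_source": r.get("inbox_source"),
                # Raw agent responses are passed through unmodified.
                "classification_agent_output": r.get("raw_classification_result"),
                "qa_agent_output": r.get("raw_qa_result"),
            }
            for r in classified
        ],
    }
```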

9. `classifier_output/classifier_output.csv`#

Purpose#

A tabular/spreadsheet-friendly view of the LLM classification and QA outputs. Flattens the raw agent responses into discrete columns for side-by-side comparison of initial classification vs. QA review results, and includes the Determination the pipeline derived from the QA-final category (see column 6 below).

Note on categories: The classification and QA agents are expected to assign a single label from the nine LLM categories. If the model returns a compound string (for example comma-separated labels), the pipeline normalizes it to one canonical category using a fixed severity priority before mapping to actions, and the CSV reflects that normalized value in Initial Category and Final Category.
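To illustrate severity-priority normalization, the sketch below splits a compound label on commas and keeps the most severe part. The priority list is hypothetical, because this document gives neither the pipeline's actual order nor all nine categories. Labels are returned in lower case only to keep the sketch simple.

```python
from typing import Optional

# HYPOTHETICAL severity order, most severe first. Illustrative only; the pipeline's
# actual priority list is not reproduced in this document.
SEVERITY_PRIORITY = [
    "deceased",
    "retired",
    "left company",
    "changed email",
    "undeliverable",
    "out of office",
]


def normalize_category(raw: Optional[str]) -> Optional[str]:
    """Reduce a possibly comma-separated label string to a single label."""
    labels = [part.strip().lower() for part in (raw or "").split(",") if part.strip()]
    if not labels:
        return None
    rank = {name: i for i, name in enumerate(SEVERITY_PRIORITY)}
    # Unlisted labels rank after every listed one; min() keeps the first of equal ranks.
    return min(labels, key=lambda label: rank.get(label, len(rank)))
```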

Generation Conditions#

Generated alongside classifier_output.json.

Format#

CSV with UTF-8 BOM encoding. All fields quoted.

Column Reference#

#	Column Header	Description
1	Email ID	Unique identifier for the email.
2	Sender Email	Sender's email address.
3	Sender Name	Sender's display name. Empty if not available.
4	Subject	Email subject line (newlines replaced with spaces).
5	Inbox Source	Inbox/account the email was fetched from.
6	Determination	The pipeline's mapped action type for this email: one of `unknown`, `inactive`, `active`, `replacement`, `email_update`, `title_update` (same meaning as elsewhere in processing reports). Derived from the QA-final LLM category and business rules in `category_mapper.map_category_to_determination`, not a separate LLM field. Empty if not populated on the processing record.
7	Initial Category	Category assigned by the first-pass classification agent (from `raw_classification_result.category`), after normalization to a single canonical label when needed.
8	Confidence	Confidence level from the classification agent (from `raw_classification_result.confidence`).
9	Sender New Email (Classification)	New email address extracted by the classification agent. Empty if none found.
10	Alternate Contact (Classification)	Alternate/replacement contact info extracted by classification agent. May be a structured string. Empty if none found.
11	Retired Personal Email (Classification)	Personal email extracted by classification agent. Empty if none found.
12	Is Long Term Leave (Classification)	`Yes` if classification agent identified long-term leave, `No` otherwise.
13	Reasoning	The classification agent's reasoning text explaining its categorization.
14	Final Category	Category after QA review (from `raw_qa_result.final_category`), after normalization to a single canonical label when needed.
15	Final Sender New Email	New email after QA review correction. Empty if not changed or not applicable.
16	Final Alternate Contact	Alternate contact after QA correction. Empty if not changed.
17	Final Retired Personal Email	Personal email after QA correction. Empty if not changed.
18	Is Long Term Leave (QA)	`Yes` if QA agent confirmed long-term leave, `No` otherwise.
19	QA Correction Applied	`Yes` if QA changed the classification, `No` if it confirmed the original.
20	QA Explanation	QA agent's explanation of its review decision.

10. `output_document_inactive_people.csv`#

Purpose#

A business deliverable listing all people determined to be INACTIVE (left company, retired, deceased) along with the specific actions taken or planned across each backend system. Operations teams use it to verify that inactive contacts have been properly removed or suppressed across CUPOLA, HODOR, SFMC, and Multipub. It also gives the sales team active subscription information so they can follow up on transferring subscriptions. This file is not attached to the N04 notification. The marketing team instead receives the slimmer SFMC import file(s) described under Marketing suppression deliverable (*_NoLongerThere_*.csv), sent via notify_marketing_suppression.

Generation Conditions#

Generated only when at least one inactive person record exists in the output document collector.

Format#

CSV with UTF-8 BOM encoding. All fields quoted.

Column Reference#

#	Column Header	Description
1	Id	Record identifier. Typically the email's database primary key or a generated unique ID for tracking this inactive person record through downstream workflows.
2	AccountName	The inbox/account source (business line email) the auto-response was received at. Examples: `energy@thompson.com`, `grants@thompson.com`, `resources@associationexecs.com`. Determines which business line is affected and which SFMC suppression list to use.
3	Org Name	The organization/company name the person was associated with. Sourced from Cupola or Hodor contact records. Empty if not available.
4	Person Name	The full name of the inactive person. Sourced from Cupola or Hodor contact records. Empty if not available.
5	Email	The email address of the inactive person (the "Auto Response Received From" address). This is the email that triggered the auto-response and is the address being marked inactive across systems.
6	Status with Org	The person's status relative to their organization as determined by the LLM (e.g., `left_company`, `retired`, `deceased`). Provides context for why the person is being marked inactive. Empty if not determined.
7	CUPOLA Org ID	The Cupola organization ID for the preferred org-person link. Used by operations to verify the correct organization record in Cupola. Empty if not in Cupola.
8	CUPOLA Person ID	The Cupola person entity ID. Used by operations to locate the person record in Cupola. Empty if not in Cupola.
9	CUPOLA Org Person IDs	Comma-separated list of all Cupola `org_person_id` values that were marked inactive for this email address. A person may have multiple org-person links (e.g., they are a contact at multiple organizations). All linked records are marked inactive. Empty if not in Cupola.
10	HODOR ProsNums	Comma-separated list of all Hodor prospect numbers (`ProsNum`) that were marked as "No Longer with Firm" for this email address. A person may have multiple prospect records in Hodor. Empty if not in Hodor.
11	Multipub Subsnum	The Multipub subscriber number, if the contact was found in the Multipub subscription system. Empty if not found.
12	Salesforce IDs	Comma-separated list of Salesforce Lead or Contact record IDs associated with this email, if the contact was found in Salesforce. Empty if not in Salesforce.
13	HODOR Status	The HODOR status action that was taken. Typically `No Longer with Firm` for inactive contacts. Empty if no Hodor action was taken.
14	SFMC Suppression Added	`Yes` if the email address was added to the SFMC (Salesforce Marketing Cloud) Auto Suppression List for the corresponding business line. `No` if the suppression was not added (e.g., if SFMC operations were mocked or failed).
15	Multipub Active Subscriptions	A summary of active subscriptions found in Multipub for this person. Contains serialized order details (up to 3 entries) for the sales team to follow up on. These are subscriptions that need to be transferred or cancelled since the person is no longer active. Empty if no active subscriptions.
16	Multipub Recent Orders	A summary of recently expired or single-issue orders from Multipub (within the past 12 months). Contains serialized order details (up to 3 entries) for sales team awareness. Empty if no recent orders.

11. `output_document_inactive_people.json`#

Purpose#

JSON companion to the inactive people CSV. Contains the same data in structured format for programmatic consumption.

Generation Conditions#

Generated alongside the CSV when inactive person records exist.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

Field	Type	Description
list_name	string	Always `"List of Inactive People"`.
purpose	string	Always `"Remove / follow up these emails from across our systems (CUPOLA, HODOR, SFMC, MultiPub)"`.
generated_at	string (ISO 8601)	Timestamp when this file was generated.
record_count	integer	Number of inactive person records in this file.
records	array of objects	Each object is a full serialization of the `InactivePersonRecord` Pydantic model. All fields from the CSV are present using snake_case naming. Nested lists and objects (such as `multipub_active_subscriptions` and `multipub_recent_orders`) are fully expanded as arrays of objects rather than serialized strings.

Marketing suppression deliverable (`{BusinessUnit}_NoLongerThere_{YYYY-MM-DD}.csv`)#

Purpose#

SFMC-ready suppression import file(s) for the marketing team (notification catalog N04). This is the production suppression path. The files are derived from the same InactivePersonRecord rows as output_document_inactive_people.csv. They carry only the email addresses and three import columns, with no Cupola/Hodor/Multipub detail. Live SFMC REST upsert during processing is not in production; see MARKETING_SUPPRESSION.html.

Generation Conditions#

Written in the same pass as output_document_inactive_people.csv when at least one inactive person record exists. Implemented in marketing_suppression_deliverable.write_marketing_suppression_deliverables. One file per business-unit label (sorted by label); emails are deduped case-insensitively within each file.

Format#

CSV with UTF-8 BOM encoding. All fields quoted. Filename pattern: {BusinessUnit}_NoLongerThere_{YYYY-MM-DD}.csv. YYYY-MM-DD is parsed from the run folder name (run_{date}_{time}); if the folder name does not match that pattern, the current date is used. Currently resolve_business_unit_label maps every inbox to Marketing, so a run writes at most one file, Marketing_NoLongerThere_{YYYY-MM-DD}.csv.

Column Reference#

#	Column Header	Description
1	Email Address	Inactive person email to suppress in SFMC.
2	Status	Always `Unsubscribed` (fixed import value).
3	Date Added	ISO date (`YYYY-MM-DD`) matching the run-date token in the filename.
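The generation rules above can be sketched as follows:

- the date token comes from the run folder name;
- one file is written per business-unit label, in sorted order;
- emails are deduped case-insensitively.

`business_unit_label` is a hypothetical stand-in for `resolve_business_unit_label`. The strict folder-name regex, keeping the first spelling of a duplicate email, and writing rows in first-seen order are assumptions.

```python
import csv
import re
from datetime import date
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

_RUN_DIR = re.compile(r"^run_(\d{4}-\d{2}-\d{2})_\d{2}-\d{2}-\d{2}$")


def run_date_token(run_dir: Path) -> str:
    """Date from a run_{YYYY-MM-DD}_{HH-MM-SS} folder name, else today's date."""
    match = _RUN_DIR.match(run_dir.name)
    return match.group(1) if match else date.today().isoformat()


def business_unit_label(inbox: str) -> str:
    # Hypothetical stand-in for resolve_business_unit_label: every inbox maps to Marketing.
    return "Marketing"


def write_suppression_files(run_dir: Path, inactive: Iterable[Tuple[str, str]]) -> List[Path]:
    """inactive yields (inbox, email) pairs; writes one file per business-unit label."""
    token = run_date_token(run_dir)
    by_unit: Dict[str, Dict[str, str]] = {}
    for inbox, email in inactive:
        emails = by_unit.setdefault(business_unit_label(inbox), {})
        # Case-insensitive dedupe keyed on the lower-cased address; first spelling wins.
        emails.setdefault(email.strip().lower(), email.strip())
    written = []
    for unit in sorted(by_unit):
        path = run_dir / f"{unit}_NoLongerThere_{token}.csv"
        with open(path, "w", encoding="utf-8-sig", newline="") as fh:
            writer = csv.writer(fh, quoting=csv.QUOTE_ALL)
            writer.writerow(["Email Address", "Status", "Date Added"])
            for email in by_unit[unit].values():
                writer.writerow([email, "Unsubscribed", token])
        written.append(path)
    return written
```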

Notification#

Notifier.notify_marketing_suppression discovers files matching the glob *_NoLongerThere_*.csv and attaches only those files; output_document_inactive_people.csv is not attached. See NOTIFICATIONS_CATALOG (N04) and the detailed guide MARKETING_SUPPRESSION.html.

12. `output_document_alternate_contacts.csv`#

Purpose#

A business deliverable consolidating all replacement/alternate contacts identified from auto-response emails. When an inactive person's auto-response mentions a replacement (e.g., "Please contact Jane Doe at jane@company.com instead"), the replacement's information is captured here with planned actions for adding or updating them across CUPOLA, HODOR, and Multipub.

Generation Conditions#

Generated only when at least one alternate contact record exists in the output document collector.

Format#

CSV with UTF-8 encoding, minimal quoting.

Column Reference#

#	Column Header	Description
1	Id	Record identifier for tracking this alternate contact through downstream workflows.
2	AccountName	The inbox/account source (business line email) the original auto-response was received at. Determines which HODOR library the alternate contact will be imported into.
3	Email Received From	The email address of the original (now inactive) person whose auto-response mentioned this alternate contact. This links the alternate contact back to the inactive person they are replacing.
4	Subject	Source auto-response subject (traceability).
5	Email Body	Full raw body of the source auto-response email (plain text or HTML as stored).
6	Message ID	Source message identifier for traceability.
7	Org ID	The Cupola organization ID (`organization_id`) for the organization the alternate contact is being added to. This is typically the same organization as the original inactive person. Empty if not in Cupola.
8	Org Name	The organization/company name for the original sender, resolved once via `Contact.resolve_organization_for_deliverable` (CUPOLA preferred row, then Hodor `firm`, then first org hint). Always matches `Firm` inside HODOR Import Data. `Comments` may include `Org source:` (`cupola_preferred`, `hodor_firm`, `hint`, `replacement_cupola`). Empty if not available.
9	Alternate Person Name	The full name of the replacement/alternate contact person, as extracted by the LLM from the auto-response body. For HODOR import, this is split into `Fname` (first name) and `Lname` (last name). Empty if not provided.
10	Alternate Person Title	The job title of the alternate contact (e.g., "Director of Marketing", "VP Sales"). Maps to the HODOR `Titl` field. Empty if not provided.
11	Alternate Person Email	The email address of the alternate contact. Maps to the HODOR `Email` field. This is the primary identifier used to check if the person already exists in CUPOLA. Empty if not provided.
12	Alternate Person Phone	The phone number of the alternate contact. Maps to the HODOR `Phone` field. Empty if not provided.
13	Alternate Person Ext	The phone extension of the alternate contact. Maps to the HODOR `pext` field. Empty if not provided.
14	Org Person ID	The Cupola `org_person_id` for the alternate contact, if they already exist in Cupola. Used when the action is `update` rather than `add`. Empty if the contact is new to Cupola.
15	Person ID	The Cupola `person_id` for the alternate contact, if they already exist in Cupola. Empty if the contact is new.
16	HODOR ProsNum	The Hodor prospect number for the alternate contact, if they already exist in Hodor. Empty if new to Hodor.
17	Comments	Free-form comments or context about this alternate contact, typically derived from the auto-response text. May include the original person's name, the nature of the handoff, or other contextual information. Empty if none.
18	CUPOLA Action	The planned action for this alternate contact in Cupola. Values: `add` (pipeline will call `add_contact` because `check_contact_exists` found no row for this email), `update` (at least one Cupola row exists for this email — typically same mailbox/org-person handling). Empty if no Cupola action is planned. Note: The underlying `add_contact` implementation still enforces email/org rules: if a row already exists for the target org it returns that `org_person_id`; if the email exists only under other orgs it reuses `person_id` and inserts a new org-person link. See `docs/connections/cupola.html`.
19	HODOR Library	The HODOR library code that this alternate contact will be imported into. Determined by the `AccountName` (inbox source). Mapping: `energy@thompson.com` → `ENGY`, `grants@thompson.com` → `GRDM`, `resources@associationexecs.com` → `ASSN`, `resources@associationtrends.com` → `ASSN`, `resources@thealmanacofamericanpolitics.com` → `GR`. Empty if library cannot be determined.
20	HODOR Import Data	JSON-serialized object containing the fields needed for the HODOR import template: `Fname` (first name), `Lname` (last name), `Titl` (title), `Firm` (organization name), `Email` (email address), `Phone` (phone number), `pext` (phone extension). Empty if no HODOR import is planned.
21	Multipub Sales Request	`Yes` if this alternate contact was provided to the sales team for Multipub follow-up (typically when the original inactive person had active subscriptions that need to be transferred). `No` otherwise.
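The library mapping and the `HODOR Import Data` shape described above can be sketched in Python. This is a minimal illustration, not the pipeline's code. The helper names are hypothetical, lower-casing `AccountName` before lookup is an assumption, and the first-token/remainder split used to derive `Fname`/`Lname` is an assumed rule.

```python
import json

# Inbox -> HODOR library codes, copied from the "HODOR Library" column description.
HODOR_LIBRARY_BY_ACCOUNT = {
    "energy@thompson.com": "ENGY",
    "grants@thompson.com": "GRDM",
    "resources@associationexecs.com": "ASSN",
    "resources@associationtrends.com": "ASSN",
    "resources@thealmanacofamericanpolitics.com": "GR",
}


def hodor_library(account_name: str) -> str:
    """Return the library code, or "" when it cannot be determined."""
    # Assumption: account names are compared case-insensitively.
    return HODOR_LIBRARY_BY_ACCOUNT.get(account_name.strip().lower(), "")


def split_name(full_name: str) -> tuple[str, str]:
    """Hypothetical rule: first token -> Fname, everything after it -> Lname."""
    parts = full_name.split()
    if not parts:
        return "", ""
    return parts[0], " ".join(parts[1:])


def hodor_import_data(name: str, title: str, org: str, email: str, phone: str, ext: str) -> str:
    """Serialize the HODOR template fields as the JSON string stored in the CSV cell."""
    fname, lname = split_name(name)
    return json.dumps({
        "Fname": fname, "Lname": lname, "Titl": title, "Firm": org,
        "Email": email, "Phone": phone, "pext": ext,
    })
```

A naive split like this mishandles compound given names (for example "Mary Ann Smith"), which is one reason to review extracted names before an import.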

13. `output_document_alternate_contacts.json`#

Purpose#

JSON companion to the alternate contacts CSV. Contains the same data in structured format.

Generation Conditions#

Generated alongside the CSV when alternate contact records exist.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

Field	Type	Description
list_name	string	Always `"List of Alternate Contacts"`.
purpose	string	Always `"Consolidate list of all provided Alternate Contacts and add / update across all systems"`.
generated_at	string (ISO 8601)	Timestamp when this file was generated.
record_count	integer	Number of alternate contact records.
records	array of objects	Full serialization of `AlternateContactRecord` Pydantic models. All fields match the CSV columns using snake_case naming. The `hodor_import_data` field is a proper JSON object (not a serialized string).
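The alternate-contacts, inactive-new-org, undeliverables, no-Cupola-match, Multipub audit, and email-update JSON companions all share the same envelope (`list_name`, `purpose`, `generated_at`, `record_count`, `records`), so a consumer can check them with one helper. A sketch that assumes only that envelope; the function names are hypothetical. (`cupola_audit_log.json` uses `entry_count`/`entries` instead and needs its own check.)

```python
import json
from pathlib import Path

ENVELOPE_KEYS = {"list_name", "purpose", "generated_at", "record_count", "records"}


def validate_envelope(doc: dict) -> list[dict]:
    """Check the shared top-level fields and return the records array."""
    missing = ENVELOPE_KEYS.difference(doc)
    if missing:
        raise ValueError(f"missing top-level fields: {sorted(missing)}")
    if doc["record_count"] != len(doc["records"]):
        raise ValueError(
            f"record_count={doc['record_count']} but {len(doc['records'])} records present"
        )
    return doc["records"]


def load_output_document(path) -> list[dict]:
    """Read an output_document_*.json file (UTF-8) and validate its envelope."""
    with Path(path).open(encoding="utf-8") as fh:
        return validate_envelope(json.load(fh))
```

For the alternate contacts file, `records[i]["hodor_import_data"]` is already a dict, so no second `json.loads` is needed.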

14. `output_document_inactive_new_org.csv`#

Purpose#

A business deliverable tracking inactive people who have moved to a new organization. When an auto-response indicates that someone has left for a different company (e.g., "I have moved to XYZ Corp"), this document captures the new organization's details and the planned CUPOLA and HODOR actions for adding or updating the person at that organization.

Generation Conditions#

Generated only when at least one inactive-at-new-org record exists in the output document collector.

Format#

CSV with UTF-8 encoding, minimal quoting.

Column Reference#

#	Column Header	Description
1	Id	Record identifier for tracking this record through downstream workflows.
2	Account Name	The inbox/account source (business line email) the auto-response was received at.
3	Email Received From	The email address of the person who sent the auto-response (the person who moved to a new organization).
4	Person Name	The name of the person who has moved to a new organization. Empty if not available.
5	New Org ID	The identifier for the new organization (e.g., a Cupola organization ID if the new org already exists in Cupola, or a newly assigned ID). Empty if the new organization has not been identified in any system.
6	New Org Name	The name of the new organization the person has moved to, as extracted from the auto-response body by the LLM. Empty if not provided.
7	New Org Title	The person's job title at their new organization. Empty if not provided.
8	New Org Email	The person's email address at their new organization (e.g., `person@newcompany.com`). Empty if not provided.
9	New Org Phone	The person's phone number at their new organization. Empty if not provided.
10	Org Person ID	The Cupola `org_person_id` for this person, if they already exist in Cupola. Used for updating existing records. Empty if not in Cupola.
11	Person ID	The Cupola `person_id` for this person, if they already exist in Cupola. Empty if not in Cupola.
12	HODOR ProsNum	The Hodor prospect number for this person, if they exist in Hodor. Empty if not in Hodor.
13	Comments	Free-form comments or context about the person's move, derived from the auto-response text. May include original organization name, reason for move, or other details. Empty if none.
14	CUPOLA Action	The planned Cupola action for this record. Values: `add` (person/org will be added to Cupola), `update` (existing record will be updated with new org info), `skip` (record will not be modified in Cupola), `ignore` (organization is not AI-appropriate and will not be added). Empty if no Cupola action planned.
15	CUPOLA Org Exists	`Yes` if the new organization already exists in Cupola. `No` if the organization is not yet in Cupola. This determines whether the person can be directly added to the existing org or if the org needs to be created first.
16	CUPOLA AI Appropriate	`Yes` if the new organization has been determined to be "AI appropriate" — meaning it is in an industry or category that warrants inclusion in Thompson's contact management systems. `No` if the organization is outside the target market and should be ignored. This check is performed when the organization does not already exist in Cupola.
17	HODOR Library Assignment	The HODOR library that this person should be assigned to at their new organization. Since the person has changed organizations, they may no longer be in the same industry as before, so library assignment may differ from the original. Currently marked as `TBD` in many cases pending manual review. Empty if not determined.
18	Multipub Sales Request	`Yes` if the person's new contact information was provided to the sales team for Multipub follow-up (e.g., to transfer subscriptions to their new organization). `No` otherwise.
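To work through this deliverable, an operator might group rows by `CUPOLA Action` and pull out organizations that would have to be created in Cupola before the person can be added. The sketch below reads the plain UTF-8 CSV with the standard `csv` module. Treating "`CUPOLA Org Exists` = `No` and `CUPOLA AI Appropriate` = `Yes`" as "org needs creating" is an interpretation of the column descriptions above, not confirmed pipeline logic, and the helper names are hypothetical.

```python
import csv
from collections import defaultdict


def read_new_org_rows(path) -> list[dict]:
    """Read output_document_inactive_new_org.csv (plain UTF-8, per the Format section)."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def triage_new_org_rows(rows: list[dict]):
    """Group rows by CUPOLA Action and list orgs that would need creating in Cupola."""
    by_action = defaultdict(list)
    orgs_to_create = []
    for row in rows:
        by_action[row.get("CUPOLA Action") or "(none)"].append(row)
        # Interpretation: org absent from Cupola but judged AI appropriate.
        if row.get("CUPOLA Org Exists") == "No" and row.get("CUPOLA AI Appropriate") == "Yes":
            orgs_to_create.append(row.get("New Org Name", ""))
    return dict(by_action), orgs_to_create
```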

15. `output_document_inactive_new_org.json`#

Purpose#

JSON companion to the inactive-at-new-org CSV. Contains the same data in structured format.

Generation Conditions#

Generated alongside the CSV when inactive-at-new-org records exist.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

Field	Type	Description
list_name	string	Always `"List of Inactive People at New Organization"`.
purpose	string	Always `"Track where inactive people went and determine if they should be included in our systems"`.
generated_at	string (ISO 8601)	Timestamp when this file was generated.
record_count	integer	Number of records.
records	array of objects	Full serialization of `InactiveNewOrgRecord` Pydantic models. All fields match the CSV columns using snake_case naming.

16. `output_document_undeliverables.csv`#

Purpose#

A business deliverable listing all emails classified as undeliverable — bounce-backs, invalid email addresses, and mail delivery failures. These records represent email addresses that are no longer valid and need to be removed or suppressed across backend systems to maintain data hygiene.

Generation Conditions#

Generated only when at least one undeliverable record exists in the output document collector.

Format#

CSV with UTF-8 BOM encoding. All fields quoted.

Column Reference#

#	Column Header	Description
1	Id	Record identifier for tracking this undeliverable record.
2	AccountName	The inbox/account source (business line email) the bounce-back was received at.
3	Sender Email	The sender email address from the bounce notification. This is typically the mail server or postmaster address, not the intended recipient.
4	Lookup Email	The email address that was looked up in backend systems. This is the address that actually bounced — the intended recipient whose email is no longer valid.
5	Org Name	The organization name associated with the undeliverable email, if the contact was found in any backend system. Empty if not found.
6	Person Name	The name of the person associated with the undeliverable email, if found. Empty if not found.
7	Subject	The subject line of the bounce-back email. Often contains the original subject or a delivery failure message.
8	CUPOLA Org ID	Cupola organization ID for the undeliverable contact, if found. Empty if not in Cupola.
9	CUPOLA Person ID	Cupola person ID for the undeliverable contact, if found. Empty if not in Cupola.
10	CUPOLA Org Person IDs	Comma-separated list of Cupola org-person IDs associated with this undeliverable email. Empty if not in Cupola.
11	HODOR ProsNums	Comma-separated list of Hodor prospect numbers for this contact. Empty if not in Hodor.
12	Multipub Subsnum	Multipub subscriber number, if found. Empty if not in Multipub.
13	Multipub Sales Request	`Yes` when catalog N02 sales follow-up was queued because Multipub validation showed active, recently expired, or recent single-issue activity. `No` otherwise. Backend writes remain blocked for bounce-pending undeliverables.
14	Multipub Active Subscriptions	Serialized active Multipub orders when validation ran (same shape as inactive-people deliverable). Empty if none.
15	Multipub Recent Orders	Recently expired or single-issue orders within the validation window. Empty if none.
16	Status	The processing status for this undeliverable record (e.g. `bounce_pending_rule`, `skipped_no_contact`).
17	Skip Reason	Reason if the undeliverable was not fully processed. Empty if processed normally.
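Because this CSV carries a UTF-8 BOM and packs several IDs into single cells, a consumer should open it with Python's `utf-8-sig` codec (which strips the BOM) and split the list columns. A minimal sketch; the helper names are hypothetical, and trimming whitespace around each ID is an assumption about how the lists are joined.

```python
import csv

LIST_COLUMNS = ("CUPOLA Org Person IDs", "HODOR ProsNums")


def parse_undeliverables(lines) -> list[dict]:
    """Parse CSV lines and turn the comma-separated ID cells into Python lists."""
    rows = []
    for row in csv.DictReader(lines):
        for col in LIST_COLUMNS:
            raw = row.get(col) or ""
            row[col] = [part.strip() for part in raw.split(",") if part.strip()]
        rows.append(row)
    return rows


def load_undeliverables(path) -> list[dict]:
    # "utf-8-sig" drops the BOM, so the first header is "Id" rather than "\ufeffId".
    with open(path, newline="", encoding="utf-8-sig") as fh:
        return parse_undeliverables(fh)
```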

17. `output_document_undeliverables.json`#

Purpose#

JSON companion to the undeliverables CSV. Contains the same data in structured format.

Generation Conditions#

Generated alongside the CSV when undeliverable records exist.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

Field	Type	Description
list_name	string	Always `"List of Undeliverables"`.
purpose	string	Always `"Bounce-backs and invalid email addresses for follow-up and removal from systems"`.
generated_at	string (ISO 8601)	Timestamp when this file was generated.
record_count	integer	Number of undeliverable records.
records	array of objects	Full serialization of `UndeliverableRecord` Pydantic models with snake_case field names.

18. `output_document_inactive_no_cupola_match.csv`#

Purpose#

Handoff list for every CUPOLA-undetermined case: inactive (or inactive-stage) determinations with no Cupola match, ACTIVE determinations with no Cupola row (no-auto-add policy), and ACTIVE determinations whose matched Cupola row is inactive (reactivation candidates). Used by IP4 / operations for manual Cupola research or record creation. Delivered via notify_sai_action_items (catalog N05/N06), addressed to NOTIFICATION_EMAIL_SAI with the global Max and Vish Cc list.

Generation Conditions#

Generated when at least one InactiveNoCupolaMatchRecord was collected during the run (OutputDocumentCollector.inactive_no_cupola_match).

Format#

CSV with UTF-8 encoding. Column headers follow the same human-readable style as other output_document_* CSVs.

Column Reference#

Headers match output_document_generator.py (generate_inactive_no_cupola_match_csv).

#	Column Header	Description
1	Id	Record identifier
2	AccountName	Inbox / account source
3	Email Received From	Sender of the auto-response
4	Subject	Email subject
5	Person Name	Person name if inferred or from lookup
6	Org Name	Organization name if available
7	Determination	Pipeline determination label
8	Status with Org	Person/org status string when set (`person_status` on the model)
9	Multipub Deferred	`Yes` / `No` — inactive path deferred by active Multipub subscription gate
10	Multipub Review Reason	Multipub validation text when present
11	HODOR ProsNums	Comma-separated Hodor prospect numbers if found without Cupola
12	Multipub Subsnum	Subscriber number if found
13	Salesforce IDs	Comma-separated Salesforce Lead/Contact identifiers if found
14	Message ID	Original message id for traceability

19. `output_document_inactive_no_cupola_match.json`#

Purpose#

JSON companion to the IP4 no-Cupola handoff CSV.

Generation Conditions#

Generated alongside the CSV when inactive-no-Cupola-match records exist.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

Field	Type	Description
list_name	string	`"List of Inactive People with No Cupola Match"`
purpose	string	`"IP4 handoff list for CUPOLA-undetermined cases — inactive with no Cupola match, active with no Cupola row, and reactivation candidates (inactive Cupola row on an ACTIVE determination)"`
generated_at	string (ISO 8601)	When the file was written
record_count	integer	Number of records
records	array of objects	`InactiveNoCupolaMatchRecord` fields in snake_case

20. `cupola_audit_log.csv`#

Purpose#

A dedicated audit trail for all changes made (or planned) in the CUPOLA contact management system during a run. This file documents every status change (marking contacts active/inactive) and every new contact addition, providing a complete record for compliance, rollback, and operational review purposes.

Generation Conditions#

Generated on every run. When no Cupola actions were logged, CupolaAuditLogger.write_audit_log() still writes a header-only CSV and a JSON file whose entries array is empty.

Format#

CSV with UTF-8 BOM encoding. All fields quoted.

Column Reference#

#	Column Header	Description
1	Timestamp	The exact date and time (ISO 8601, Eastern Time) when this CUPOLA action was recorded.
2	Action Type	The type of CUPOLA operation. Values: `status_change` (an existing contact's active/inactive status was changed), `contact_addition` (a new contact was added to CUPOLA).
3	Contact ID	The CUPOLA `org_person_id` for the affected contact. For status changes, this is the existing contact ID. For contact additions, this is the newly assigned ID (if available) or empty if the addition was mocked.
4	Email	The email address of the contact being modified or added.
5	Name	The name of the contact. Empty if not available.
6	Org Name	The organization name associated with the contact. Empty if not available.
7	Requested Status	For `status_change` entries: `Yes` if the contact was being set to ACTIVE, `No` if being set to INACTIVE. Empty for `contact_addition` entries.
8	Previous Status	The `link_org_person.status` value captured immediately before the UPDATE via SQL `OUTPUT deleted.status`. `1` = active, `0` = inactive. Empty for contact additions, recommendation-only rows, or read-only mode where the value cannot be observed.
9	Auto Applied	`Yes` when the change was actually executed against CUPOLA via `cupola.update_contact_status_with_audit` (i.e. `CUPOLA_AUTOMATIC_UPDATES=true`). `No` when the audit row records a recommendation only (sent to Venu).
10	Update Succeeded	When `Auto Applied` is `Yes`: `Yes` if the SQL UPDATE reported success, `No` if it did not. Empty for recommendation-only rows.
11	Reason	The reason for the status change (e.g., `Person left company per auto-response`, `Inactive determination from LLM classification`). Empty for contact additions.
12	Determination	The pipeline determination that triggered this action (e.g., `inactive`, `active`, `replacement`).
13	Email Source	The source email address from the auto-response that initiated this action. This is the original auto-response sender, linking the audit entry back to the triggering email. Empty for contact additions.
14	Title	The job title of the contact. Only populated for `contact_addition` entries where a title was available. Empty for status changes.

21. `cupola_audit_log.json`#

Purpose#

JSON companion to the CUPOLA audit CSV. Contains the same data in structured format for programmatic consumption.

Generation Conditions#

Generated alongside the CSV on every run (empty entries when nothing was logged).

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

Field	Type	Description
generated_at	string (ISO 8601)	Timestamp when this file was generated.
entry_count	integer	Total number of audit log entries.
entries	array of objects	Each object represents one CUPOLA action. Fields match the CSV columns using snake_case keys: `timestamp`, `action_type`, `contact_id`, `email`, `name`, `org_name`, `requested_status` (boolean for status changes), `previous_status` (integer 0/1 or `null`), `auto_applied` (boolean), `update_succeeded` (boolean or `null`), `reason`, `determination`, `email_source`, `title`. Note: in JSON, boolean fields are `true`/`false`/`null` rather than the `Yes`/`No`/empty string used in CSV.
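The only differences between a CSV row and a JSON entry are the key style and the flag encoding, so converting one to the other is mechanical. A sketch with hypothetical helper names, assuming that empty text cells stay empty strings in JSON (the document does not say) and that `Previous Status` holds only `0`, `1`, or nothing:

```python
COLUMN_TO_KEY = {
    "Timestamp": "timestamp",
    "Action Type": "action_type",
    "Contact ID": "contact_id",
    "Email": "email",
    "Name": "name",
    "Org Name": "org_name",
    "Requested Status": "requested_status",
    "Previous Status": "previous_status",
    "Auto Applied": "auto_applied",
    "Update Succeeded": "update_succeeded",
    "Reason": "reason",
    "Determination": "determination",
    "Email Source": "email_source",
    "Title": "title",
}
FLAG_KEYS = ("requested_status", "auto_applied", "update_succeeded")
_FLAGS = {"Yes": True, "No": False, "": None}


def csv_flag_to_json(value):
    """'Yes' -> True, 'No' -> False, '' -> None; anything else is an error."""
    key = (value or "").strip()
    if key not in _FLAGS:
        raise ValueError(f"unexpected flag value: {value!r}")
    return _FLAGS[key]


def audit_row_to_entry(row: dict) -> dict:
    """Convert one cupola_audit_log.csv row into the JSON entry shape."""
    entry = {key: row.get(col, "") for col, key in COLUMN_TO_KEY.items()}
    for key in FLAG_KEYS:
        entry[key] = csv_flag_to_json(entry[key])
    prev = (entry["previous_status"] or "").strip()
    entry["previous_status"] = int(prev) if prev else None  # 1/0, or JSON null
    return entry
```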

22. `cupola_audit_log_rollback_plan.csv`#

Purpose#

A rollback plan listing every CUPOLA status_change that was actually executed (Auto Applied=Yes) during the run. Written by CupolaAuditLogger._write_rollback_plan (src/auto_responder/utils/cupola_audit_logger.py) so that an operator can revert the batch with plain SQL if a problem is discovered after the fact.

Generation Conditions#

Generated when at least one audit entry has auto_applied=True and update_succeeded is not False. Not written when:

the run only emitted recommendations (CUPOLA_AUTOMATIC_UPDATES=false), or
every auto-applied UPDATE failed, or
there were no Cupola actions at all.
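The generation rule above reduces to a filter over the audit entries in their JSON form, where the flags are real booleans. A sketch that mirrors the stated conditions; it is not the body of `CupolaAuditLogger._write_rollback_plan`, and the function names are hypothetical.

```python
def rollback_rows(entries: list[dict]) -> list[dict]:
    """Status changes that were auto-applied and not reported as failed."""
    return [
        e for e in entries
        if e.get("action_type") == "status_change"
        and e.get("auto_applied") is True
        and e.get("update_succeeded") is not False  # True or None both qualify
    ]


def should_write_rollback_plan(entries: list[dict]) -> bool:
    """The plan file is only written when at least one row qualifies."""
    return bool(rollback_rows(entries))
```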

Format#

CSV with UTF-8 BOM encoding. All fields quoted (csv.DictWriter with QUOTE_ALL).

Column Reference#

#	Column Header	Description
1	Timestamp	When the original update was logged (ISO 8601, Eastern Time).
2	Contact ID	CUPOLA `org_person_id` that was updated.
3	Email	Contact email at the time of update.
4	Name	Contact name when known.
5	Org Name	Organization name when known.
6	Applied Status	The integer status that was written by the run. `1` if the contact was set ACTIVE, `0` if INACTIVE.
7	Previous Status	The integer status captured immediately before the UPDATE, sourced from `OUTPUT deleted.status`. The literal string `MISSING` appears when the previous value could not be observed (read-only wrapper, mock connector, etc.).
8	Rollback SQL	A single ready-to-run statement that inverts the change, e.g. `UPDATE link_org_person SET status = 1 WHERE org_person_id = '<id>';`. When `Previous Status` is `MISSING`, this column contains a `-- MANUAL: previous status unknown` comment instead.
9	Reason	Same reason text recorded in `cupola_audit_log.csv`.
10	Determination	Pipeline determination that drove the action (e.g. `inactive`, `active`).

Operational notes#

The plan is written next to cupola_audit_log.csv / .json in the same run directory.
Rows with the MISSING marker carry a -- MANUAL comment instead of runnable SQL and must be triaged by hand: the run applied the change without observing the prior state, which usually means a mock or read-only wrapper was active.
The plan is regenerated per run; older plans are not garbage-collected.

23. `output_document_multipub_audit.csv`#

Purpose#

Per-row Multipub validation audit for every INACTIVE determination that was checked against the Multipub subscription gate. Written to the run directory for engineers and not emailed (Tarun receives only notify_tarun_undetermined_sender_review). After review, Tarun may post files back through POST /multipub/upload; rows marked Yes in the uploaded file trigger notify_multipub_subscriber_followup_from_upload to Angel/Yogesh.

Generation Conditions#

Generated when at least one MultipubAuditRecord was collected during the run (OutputDocumentCollector.multipub_audit). Both deferred and non-deferred inactive paths produce a row when Multipub validation runs.

Format#

CSV with UTF-8 BOM encoding. Booleans rendered as Yes / No (via _sanitize_for_csv). All fields quoted.
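The Yes/No rendering and quote-everything format can be reproduced with the standard `csv` module. In this sketch `sanitize_for_csv` is a hypothetical stand-in for `_sanitize_for_csv` (the real function may handle more types), and `new_record_id` assumes the 8-character ID is sliced from a random `uuid4`.

```python
import csv
import io
import uuid


def new_record_id() -> str:
    """8-character ID; assumes a slice of a random uuid4."""
    return uuid.uuid4().hex[:8]


def sanitize_for_csv(value) -> str:
    """Hypothetical stand-in: bool -> Yes/No, None -> '', everything else str()."""
    if isinstance(value, bool):  # handled explicitly; str(True) would give "True"
        return "Yes" if value else "No"
    if value is None:
        return ""
    return str(value)


def render_csv(headers: list[str], records: list[dict]) -> str:
    """Render records as CSV text with every field quoted."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=headers, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    for rec in records:
        writer.writerow({h: sanitize_for_csv(rec.get(h)) for h in headers})
    return buf.getvalue()


def write_csv(path, headers: list[str], records: list[dict]) -> None:
    # "utf-8-sig" writes the byte-order mark that Excel uses to detect UTF-8.
    with open(path, "w", newline="", encoding="utf-8-sig") as fh:
        fh.write(render_csv(headers, records))
```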

Column Reference#

Headers come from OutputDocumentGenerator.generate_multipub_audit_csv (src/auto_responder/utils/output_document_generator.py).

#	Column Header	Description
1	Id	Record identifier (8-char UUID slice).
2	AccountName	Inbox / account source the auto-response landed in.
3	Email Received From	Email address used for the Multipub lookup (post relay normalization).
4	Person Name	Person name resolved by the contact lookup; empty if not known.
5	Org Name	Organization name resolved by the contact lookup; empty if not known.
6	Determination	Determination label (e.g. `inactive`).
7	Multipub Subsnum	Matched Multipub subscriber number; empty when no Multipub record was found.
8	Has Active Subscription	`Yes` when `MultipubValidationResult.has_active_subscription` is true.
9	Active Order Count	Number of currently-active subscription orders returned by Multipub.
10	Has Recently Expired	`Yes` when at least one recently-expired subscription was found.
11	Recently Expired Order Count	Number of recently-expired orders returned.
12	Has Recent Single-Issue	`Yes` when at least one recent single-issue purchase was found.
13	Recent Single-Issue Order Count	Number of recent single-issue orders returned.
14	Flagged for Review	`Yes` when the validation gate flagged the row; typically this happens when `Has Active Subscription` is `Yes` or when a non-active subscription still warrants review.
15	Inactive Action Deferred	`Yes` when the inactive workflow was held back because of an active Multipub subscription. `No` for clean inactive rows that proceeded.
16	Review Reason	Free-text reason from `MultipubValidationResult.review_reason`. Empty when not flagged.
17	Summary	Single-line summary string from `MultipubValidationResult.get_summary()`.
18	Message ID	Original message ID for traceability.

24. `output_document_multipub_audit.json`#

Purpose#

JSON companion to the Multipub audit CSV — same data, structured for programmatic consumption.

Generation Conditions#

Generated alongside the CSV whenever Multipub audit records exist.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

Field	Type	Description
list_name	string	Always `"Multipub Audit (Tarun handoff)"`.
purpose	string	Describes the deliverable as a per-row Multipub validation audit for INACTIVE determinations.
generated_at	string (ISO 8601)	Timestamp when the file was written.
record_count	integer	Number of audit rows.
records	array of objects	Full serialization of `MultipubAuditRecord` Pydantic models with snake_case keys (booleans, not Yes/No).

25. `output_document_email_update_requests.csv`#

Purpose#

Per-row deliverable for the Changed Email category. Written to the run directory; not bundled into marketing emails (N04 attaches only *_NoLongerThere_*.csv suppression imports from inactive people). Replaces the historical "12Feb-10Mar Email Update Requests" manual export.

Generation Conditions#

Generated by ReportGenerator.write_email_update_requests_deliverable when at least one processed email maps to the Changed Email main category. When zero rows qualify, the file is skipped and a single INFO log line is emitted.

Format#

CSV with UTF-8 BOM encoding. All fields quoted.

Column Reference#

#	Column Header	Description
1	Email ID	Source `message_id` of the auto-response.
2	Sender Email	Original sender address (post relay normalization).
3	Lookup Email	Address actually used for backend lookup (signature / NDR target / sender, in that priority order).
4	Contact Found	`Yes` / `No` — whether any backend system returned a contact.
5	Contact Systems	Comma-separated list of systems that matched (e.g. `Cupola, Hodor`).
6	Determination	Pipeline determination label (typically `email_update`).
7	Status	Per-email processing status (`success`, `skipped_*`, etc.).
8	Org Name	Organization resolved by lookup; empty if not found.
9	Person Name	Resolved person name (falls back to sender name when needed).
10	Sender New Email	The new email address extracted from the auto-response body, if surfaced by the classifier.
11	CUPOLA Org ID	Cupola Org ID when matched.
12	CUPOLA Org Person ID	Cupola org-person link ID when matched.
13	CUPOLA Person ID	Cupola person ID when matched.
14	HODOR ProsNum	Hodor `pros-num` when matched.
15	Multipub SubsNum	Multipub subscriber number when matched.
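The `Lookup Email` priority rule is a first-non-empty selection. A sketch with hypothetical parameter names for the three sources (the address found in the signature, the NDR target, and the sender):

```python
from typing import Optional


def choose_lookup_email(
    signature_email: Optional[str],
    ndr_target: Optional[str],
    sender_email: Optional[str],
) -> str:
    """Return the first non-blank candidate: signature, then NDR target, then sender."""
    for candidate in (signature_email, ndr_target, sender_email):
        if candidate and candidate.strip():
            return candidate.strip()
    return ""
```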

26. `output_document_email_update_requests.json`#

Purpose#

JSON companion to the email-update-requests CSV.

Generation Conditions#

Generated alongside the CSV when Changed Email rows exist.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

Field	Type	Description
list_name	string	Always `"Email update requests (Changed Email)"`.
purpose	string	Always `"Address corrections for SFMC / marketing systems"`.
generated_at	string (ISO 8601)	Timestamp when the file was written.
record_count	integer	Number of records.
records	array of objects	Mirror of the CSV columns using snake_case keys (`email_id`, `sender_email`, …).
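The skip-on-empty behavior and the JSON envelope above can be sketched as follows. The function names are hypothetical, the `generated_at` timezone (UTC here) is an assumption, and `ensure_ascii=False` is a choice that keeps non-ASCII names readable in the UTF-8 file rather than documented behavior.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger(__name__)

LIST_NAME = "Email update requests (Changed Email)"
PURPOSE = "Address corrections for SFMC / marketing systems"


def build_email_update_envelope(records: list[dict], generated_at=None):
    """Return the JSON envelope, or None (after one INFO line) when there are no rows."""
    if not records:
        logger.info("No Changed Email rows; skipping output_document_email_update_requests")
        return None
    generated_at = generated_at or datetime.now(timezone.utc)  # timezone is an assumption
    return {
        "list_name": LIST_NAME,
        "purpose": PURPOSE,
        "generated_at": generated_at.isoformat(),
        "record_count": len(records),
        "records": records,
    }


def write_envelope(path, envelope: dict) -> None:
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(envelope, fh, indent=2, ensure_ascii=False)
```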

27. `action_log.log`#

Purpose#

A verbose execution log that records every individual operation (database lookups, updates, notifications, LLM calls) for each email. Because it is written only in dry-run and read-only modes, it shows exactly what the system did (for live reads) or would do (for mocked operations that stand in for real writes). It serves as the definitive record of operational intent and is particularly valuable for validating pipeline behavior before switching to live mode.

Generation Conditions#

Generated when the pipeline runs in dry-run mode or read-only mode. Not generated in full live mode. Created at the start of the run.

Format#

Plain text with timestamped entries.

Structure#

================================================================================
DRY-RUN EXECUTION LOG
Started: {ISO 8601 timestamp}
================================================================================

Entry Types#

Each entry is timestamped with [HH:MM:SS] in Eastern Time.

Email Processing Start:

[HH:MM:SS] EMAIL PROCESSING: {email_id} from {sender_email}
[HH:MM:SS]   Subject: {subject (truncated to 100 chars)}

Contact Lookup:

[HH:MM:SS] CONTACT LOOKUP: {email}
[HH:MM:SS]   [MOCK] {System}: Found contact {contact_id}
[HH:MM:SS]   [MOCK] {System}: Not found

LLM Classification:

[HH:MM:SS]   LLM Classification: {category} (confidence: {confidence})
[HH:MM:SS]     Extracted new email: {new_email}
[HH:MM:SS]     Extracted alternate contact: {contact_info}
[HH:MM:SS]     Extracted personal email: {personal_email}

Determination:

[HH:MM:SS]   Determination: {determination} (confidence: {score})

Database Updates (mocked):

[HH:MM:SS]   [MOCK] Would {operation} in {System} for {contact_id} ({key=value, ...})

Notifications (mocked):

[HH:MM:SS]   [MOCK] Would send notification: {type}
[HH:MM:SS]     To: {recipient}
[HH:MM:SS]     Subject: {subject}

Action Execution:

[HH:MM:SS] ACTION EXECUTION: Determination={determination} for {email}

Email Completion:

[HH:MM:SS]   Email processing {SUCCESS|FAILED} for {email}
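Each entry is a fixed `[HH:MM:SS]` prefix, one space, then two spaces of indent per nesting level. A sketch that reproduces the patterns above; the helper names are hypothetical, it assumes the subject is cut to 100 characters without an ellipsis, and `now_eastern()` needs the IANA time zone database (`tzdata`) to be available.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def now_eastern() -> datetime:
    """Current time in US Eastern Time (requires the tzdata database)."""
    return datetime.now(ZoneInfo("America/New_York"))


def log_line(ts: datetime, message: str, indent: int = 0) -> str:
    """'[HH:MM:SS]', a space, two spaces per indent level, then the message."""
    return f"[{ts:%H:%M:%S}] {'  ' * indent}{message}"


def email_start_lines(ts: datetime, email_id: str, sender: str, subject: str) -> list[str]:
    """The two lines written when an email starts processing."""
    return [
        log_line(ts, f"EMAIL PROCESSING: {email_id} from {sender}"),
        log_line(ts, f"Subject: {subject[:100]}", indent=1),
    ]


def classification_lines(ts: datetime, category: str, confidence: float, new_email: str = "") -> list[str]:
    """LLM classification line plus an optional nested extraction line."""
    lines = [log_line(ts, f"LLM Classification: {category} (confidence: {confidence})", indent=1)]
    if new_email:
        lines.append(log_line(ts, f"Extracted new email: {new_email}", indent=2))
    return lines
```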

Summary Section (appended at end of run)#

================================================================================
SUMMARY
================================================================================
Total Emails Processed: {count}

Determinations:
  - {type}: {count}

Database Operations (would be performed):
  - {System}: {count} {operation_type}, {count} {operation_type}

Notifications (would be sent):
  - {type}: {count}

LLM Classification Calls: {count}
Execution Duration: {seconds} seconds
Completed: {ISO 8601 timestamp}
================================================================================

Summary Fields#

Field	Description
Total Emails Processed	Number of emails that went through the full processing pipeline.
Determinations	Breakdown of determination types and their counts (e.g., `inactive: 5`, `active: 2`, `unknown: 3`).
Database Operations	Per-system breakdown of all database operations that would be performed (in live mode) or were mocked. Grouped by system (Cupola, Hodor, Salesforce, Multipub) with operation counts (e.g., `lookups`, `update_status`, `add_contact`).
Notifications	Count of each notification type that would be sent (e.g., alerts to Max/Client Services about active subscriptions).
LLM Classification Calls	Total number of LLM API calls made during classification.
Execution Duration	Total wall-clock time for the entire run in seconds.
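The SUMMARY block is plain aggregation over per-email results. A sketch that renders it with `collections.Counter`; the input shapes, the 80-character rule width, and the one-decimal duration are assumptions, and `Total Emails Processed` is taken here as the number of determinations.

```python
from collections import Counter

RULE = "=" * 80  # assumed width of the separator lines


def render_summary(determinations, db_ops, notifications, llm_calls, duration_s, completed_iso):
    """Render the end-of-run SUMMARY block.

    determinations: one determination label per processed email
    db_ops: {system: {operation_type: count}}
    notifications: one notification type per notification sent (or mocked)
    """
    lines = [RULE, "SUMMARY", RULE, f"Total Emails Processed: {len(determinations)}", ""]
    lines.append("Determinations:")
    lines += [f"  - {d}: {n}" for d, n in Counter(determinations).most_common()]
    lines += ["", "Database Operations (would be performed):"]
    for system, ops in db_ops.items():
        counts = ", ".join(f"{n} {op}" for op, n in ops.items())
        lines.append(f"  - {system}: {counts}")
    lines += ["", "Notifications (would be sent):"]
    lines += [f"  - {t}: {n}" for t, n in Counter(notifications).most_common()]
    lines += [
        "",
        f"LLM Classification Calls: {llm_calls}",
        f"Execution Duration: {duration_s:.1f} seconds",
        f"Completed: {completed_iso}",
        RULE,
    ]
    return "\n".join(lines)
```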

28. `batch_report.html`#

Purpose#

A single-file, visually rich HTML dashboard summarizing the entire batch run. Designed for browser viewing and sharing with stakeholders. Features interactive Plotly charts, KPI cards, per-email detail tables, and links to the output document files. This is the most polished and accessible output artifact, suitable for non-technical audiences.

Generation Conditions#

Generated when at least one email is processed.

Format#

Single HTML file with embedded CSS. Uses the Plotly JavaScript library via CDN (https://cdn.plot.ly/plotly-2.27.0.min.js) for interactive charts and Google Fonts (Outfit, IBM Plex Mono, IBM Plex Sans) for typography; because both load from external URLs, the charts and custom fonts render only when the viewer's browser can reach them. Dark theme (slate/charcoal background with sky-blue and teal accents).

Sections#

Run Overview (KPI Cards)#

Metric	Description
Mode	The run mode: `DRY-RUN (all connections mocked)`, `READ-ONLY (live reads, writes mocked)`, or `LIVE`.
Total Emails	Number of emails processed in this batch.
Duration	Total run time in seconds.
Action Success Rate	Percentage of successfully completed actions out of total actions attempted.
Successful	Count of emails that completed with `success` status.
Failed	Count of emails with `failed` status.
Skipped (no contact)	Count of emails where contact was not found in any system.
Skipped (unknown)	Count of emails with `unknown` determination (no action needed).
Errors	Count of emails that encountered unexpected errors.
Deferred (Multipub)	Count of emails where inactive marking was halted due to active Multipub subscriptions.
QA Corrections	Number of times the QA agent changed the initial LLM classification.
Multipub Validated	Number of emails that underwent Multipub subscription validation.
Multipub Deferred	Number of emails deferred due to active Multipub subscriptions (same as Deferred above).

Output Document Counts#

Metric	Description
Inactive People	Number of records in the inactive people output document.
Alternate Contacts	Number of records in the alternate contacts output document.
Inactive at New Org	Number of records in the inactive-at-new-org output document.

Visual Analysis (Interactive Charts)#

Chart	Type	Description
Determination Breakdown	Donut/pie chart	Distribution of determination types (INACTIVE, ACTIVE, REPLACEMENT, UNKNOWN, etc.) across all processed emails.
Outcome Status Distribution	Horizontal bar chart	Count of each processing status (Success, Failed, Skipped No Contact, Skipped Unknown, Error, Deferred Multipub).
LLM Category Breakdown	Vertical bar chart	Count of emails per LLM classification category (undeliverable, left company, retired, deceased, out of office, changed email, N/A).
Actions by System	Stacked bar chart	Count of succeeded vs. failed actions per backend system (Cupola, Hodor, Salesforce, Multipub).

In-Depth Analysis (Per-Email Table)#

Column	Description
#	Sequential row number.
Sender	Sender's email address (monospaced).
Subject	Email subject, truncated to 60 characters with `...` if longer.
Determination	Determination type in uppercase.
Confidence	Confidence score as percentage.
Status	Processing status in title case (spaces replace underscores).
Actions	Summary of up to 5 actions in format `[OK/FAIL] system: operation`. Shows `+N more` if additional actions exist. Shows `—` if no actions.
Error	Error message text, or `—` if no error.
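The per-email table's cell rules (60-character subject, title-cased status, percentage confidence, at most five actions) are simple string transforms. A sketch with hypothetical helper names; it assumes the `...` is appended after the first 60 characters, that actions arrive as `(succeeded, system, operation)` tuples, and that actions are separated with `; ` (the HTML may use line breaks instead).

```python
def format_subject(subject: str, limit: int = 60) -> str:
    """Keep the first `limit` characters and append '...' if anything was cut."""
    return subject if len(subject) <= limit else subject[:limit] + "..."


def format_status(status: str) -> str:
    """'skipped_no_contact' -> 'Skipped No Contact'."""
    return status.replace("_", " ").title()


def format_confidence(score: float) -> str:
    """0.873 -> '87%'."""
    return f"{score:.0%}"


def format_actions(actions, limit: int = 5) -> str:
    """Summarize (succeeded, system, operation) tuples for the Actions column."""
    actions = list(actions)
    if not actions:
        return "\u2014"  # the dash placeholder the report shows for "no actions"
    parts = [f"[{'OK' if ok else 'FAIL'}] {system}: {op}" for ok, system, op in actions[:limit]]
    if len(actions) > limit:
        parts.append(f"+{len(actions) - limit} more")
    return "; ".join(parts)
```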

Output Documents (Links)#

Provides download links (relative file paths) to the three output document pairs:

Inactive People (CSV · JSON)
Alternate Contacts (CSV · JSON)
Inactive at New Org (CSV · JSON)

Note: Undeliverables are generated as separate files but are not linked from the HTML report.

29. `batch_report.pptx`#

Purpose#

A PowerPoint presentation summarizing the batch run for executive review or team meetings. Contains approximately 10 slides covering KPIs, determination breakdowns, outcome status, LLM category analysis, actions by system, confidence and quality metrics, per-email summary tables, and output document counts.

Generation Conditions#

Generated alongside batch_report.html when at least one email is processed.

Format#

PowerPoint .pptx file generated using the python-pptx library.

Slides#

Slide	Content
Title Slide	Report title with generation date and run window.
Executive Summary KPIs	Total emails, duration, action success rate, key outcome counts.
Determination Breakdown	Chart and counts of each determination type.
Outcome Status	Distribution of processing statuses.
LLM Category Analysis	Breakdown of LLM classification categories.
Actions by System	Success/failure counts per backend system.
Confidence & Quality	QA correction rate, average confidence, Multipub validation stats.
Per-Email Summary	Table(s) listing each email with sender, subject, determination, status.
Output Documents	Counts and summaries for the three output document lists (inactive people, alternate contacts, inactive at new org).

30. `output_document_human_review.csv` / `.json`#

Purpose#

Consolidated Human Review digest introduced by the active-only automation policy. It captures every row the pipeline declined to act on automatically, so that IP4 / operations can triage those rows manually. Populated by OutputDocumentCollector.add_human_review. Actionable rows are attached to notify_sai_action_items; metadata (counts and the reason legend) is included in notify_venu_cupola_audit_files.

Generation Conditions#

Generated whenever OutputDocumentCollector.human_review is non-empty. Rows are added by ActionEngine from several handlers:

`reason` constant	When
`HUMAN_REVIEW_REASON_ACTIVE_NEW_CONTACT`	ACTIVE outcome but no CUPOLA row — no auto-add.
`HUMAN_REVIEW_REASON_REACTIVATION_CANDIDATE`	ACTIVE outcome but matched CUPOLA row is inactive — no auto-reactivate.
`HUMAN_REVIEW_REASON_UPDATE_ON_INACTIVE`	EMAIL_UPDATE / TITLE_UPDATE on inactive CUPOLA row (active-only gate blocked it).
`HUMAN_REVIEW_REASON_OUT_OF_OFFICE`	OUT_OF_OFFICE determination — tracked separately, no system writes.
Existing reasons (UNKNOWN, bounce triage, replacement parse fallback, etc.)	Already collected from previous phases.
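The four policy-driven reasons depend only on the determination and the state of the matched CUPOLA row. The sketch below is minimal and encodes the table above. It makes the following assumptions:

* The constant values are placeholder strings; the real ones live in the pipeline source.
* `cupola_row_active` is `None` when no CUPOLA row matched.

Combinations the table does not cover, such as an update with no CUPOLA match, return `None` here.

```python
from typing import Optional

# Placeholder values; the real constants are defined in the pipeline source.
HUMAN_REVIEW_REASON_ACTIVE_NEW_CONTACT = "HUMAN_REVIEW_REASON_ACTIVE_NEW_CONTACT"
HUMAN_REVIEW_REASON_REACTIVATION_CANDIDATE = "HUMAN_REVIEW_REASON_REACTIVATION_CANDIDATE"
HUMAN_REVIEW_REASON_UPDATE_ON_INACTIVE = "HUMAN_REVIEW_REASON_UPDATE_ON_INACTIVE"
HUMAN_REVIEW_REASON_OUT_OF_OFFICE = "HUMAN_REVIEW_REASON_OUT_OF_OFFICE"


def human_review_reason(determination: str, cupola_row_active: Optional[bool]) -> Optional[str]:
    """Return the policy reason for a deferred row, or None if no policy case applies.

    cupola_row_active: None when no CUPOLA row matched, else the row's active flag.
    """
    d = determination.lower()
    if d == "out_of_office":
        return HUMAN_REVIEW_REASON_OUT_OF_OFFICE  # never writes to any system
    if d == "active":
        if cupola_row_active is None:
            return HUMAN_REVIEW_REASON_ACTIVE_NEW_CONTACT  # no auto-add
        if not cupola_row_active:
            return HUMAN_REVIEW_REASON_REACTIVATION_CANDIDATE  # no auto-reactivate
        return None
    if d in ("email_update", "title_update") and cupola_row_active is False:
        return HUMAN_REVIEW_REASON_UPDATE_ON_INACTIVE  # active-only gate blocked it
    return None
```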

Format#

CSV with UTF-8 BOM encoding; JSON with 2-space indentation. All fields quoted.

Column Reference (CSV)#

Headers come from output_document_generator.py (generate_human_review_csv). Column titles use spaced words (e.g. Sender Email, Lookup Email).

Column	Description
ID	Record identifier (8-char UUID slice).
Account Name	Inbox / account source.
Message ID	Original message ID, for traceability.
Sender Email	Sender of the auto-response.
Lookup Email	Email used after normalization for contact lookup.
Subject	Email subject.
Email Body	Full raw body of the source email (plain text or HTML as stored).
Reason	One of the `HUMAN_REVIEW_REASON_*` constants listed above.
Reason Detail	Human-readable explanation of why the pipeline deferred.
Determination	Determination label at the time of routing.
LLM Category	Normalized classifier category when available.
Confidence	LLM confidence when available.
Person Name / Org Name	When available.
CUPOLA OrgPerson IDs / HODOR ProsNums / Multipub Subsnum / Salesforce IDs	Resolved identifiers when known.
Suggested Action	Recommended next step for the reviewer.
Notes	Free-form pipeline notes.
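The CSV format rules can be reproduced with Python's standard `csv` module. The documented rules are a UTF-8 BOM and every field quoted. The standard tools are the `utf-8-sig` codec, which prepends the BOM, and `csv.QUOTE_ALL`. This is a minimal sketch; the real `generate_human_review_csv` may differ. The helper names are hypothetical, and the header list in the example is abbreviated.

```python
import csv
import io
import uuid


def short_id() -> str:
    # "8-char UUID slice" as described for the ID column.
    return str(uuid.uuid4())[:8]


def review_csv_bytes(headers, rows) -> bytes:
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)  # quote every field
    writer.writerow(headers)
    writer.writerows(rows)
    # "utf-8-sig" prepends the BOM (EF BB BF) so spreadsheet tools detect UTF-8.
    return buf.getvalue().encode("utf-8-sig")


def write_review_csv(path, headers, rows) -> None:
    with open(path, "wb") as f:
        f.write(review_csv_bytes(headers, rows))
```

Building the bytes in memory keeps the encoding step separate from file I/O and easy to check.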

JSON Top-Level Fields#

Field	Type	Description
`list_name`	string	`"Human Review digest"`
`purpose`	string	Describes the file as the consolidated human-review queue.
`generated_at`	string (ISO 8601)	Timestamp.
`record_count`	integer	Number of rows.
`records`	array	Full serialization of each review record.
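The JSON file wraps the records in the envelope described above and is written with 2-space indentation. A minimal sketch follows. The `purpose` wording is a placeholder, the UTC timestamp is an assumption, and both helper names are hypothetical.

```python
import json
from datetime import datetime, timezone


def human_review_envelope(records, purpose="Consolidated human-review queue."):
    # `purpose` text is a placeholder; `list_name` matches the documented value.
    records = list(records)
    return {
        "list_name": "Human Review digest",
        "purpose": purpose,
        "generated_at": datetime.now(timezone.utc).isoformat(),  # ISO 8601 (UTC assumed)
        "record_count": len(records),
        "records": records,
    }


def to_json_text(envelope) -> str:
    return json.dumps(envelope, indent=2, ensure_ascii=False)
```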

31. `impact_report.txt` / `.json`#

Purpose#

Per-run headline summary introduced by the active-only automation policy. Produced by utils/impact_report.py and attached inline to Notifier.notify_run_audit_for_ip4 (Sai-only run audit).

Generation Conditions#

Always written at the end of the run, after the CUPOLA audit logger finishes flushing. The counts are derived from the in-memory CupolaAuditLogger.entries list, so read-only and dry-run runs still emit the report. In those runs, `records_deactivated` and `records_added` are zero when no writes occurred.

Format#

impact_report.txt — plain text with three labelled counts, one per line.
impact_report.json — structured object with the same counts plus a timestamp.

Fields#

Field	Type	Description
`emails_processed`	integer	Total auto-response emails handled in the run.
`records_deactivated`	integer	CUPOLA rows flipped to inactive — `status_change` audit entries with `requested_status=False` and `auto_applied=True`.
`records_added`	integer	New CUPOLA rows inserted: `contact_addition` audit entries with a non-empty `contact_id`. REPLACEMENT determinations only increment this count when `CUPOLA_AUTO_ADD_REPLACEMENTS=true`.
`generated_at`	string (ISO 8601)	Timestamp the report was written (JSON only).
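The two write counts are filters over the audit entries, as described in the field table. The sketch below is minimal and makes these assumptions:

* `AuditEntry` is hypothetical, and the `entry_type` field name is invented to hold the `status_change` / `contact_addition` kind.
* `emails_processed` is passed in separately.
* The text file uses `key: value` lines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuditEntry:
    # Hypothetical field names modelled on the filters described above.
    entry_type: str                      # e.g. "status_change", "contact_addition"
    requested_status: Optional[bool] = None
    auto_applied: bool = False
    contact_id: Optional[str] = None


def impact_counts(entries, emails_processed: int) -> dict:
    deactivated = sum(
        1 for e in entries
        if e.entry_type == "status_change"
        and e.requested_status is False   # flipped to inactive
        and e.auto_applied                # actually applied, not just proposed
    )
    added = sum(
        1 for e in entries
        if e.entry_type == "contact_addition" and e.contact_id  # non-empty id
    )
    return {
        "emails_processed": emails_processed,
        "records_deactivated": deactivated,
        "records_added": added,
    }


def impact_text(counts: dict) -> str:
    # Assumed "label: value" layout, one count per line.
    return "\n".join(f"{key}: {value}" for key, value in counts.items())
```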

32. `action_items_tracker.csv` (cross-run)#

Purpose#

Central cross-run queue with one row per action notification email sent in a run (N01–N07). append_action_items_for_run appends the rows once per run, after all notifications complete. Each row carries an ActionItemCount and, in Summary, a per-attachment breakdown of artifact line counts; attachments do not get separate tracker rows. Default path: {REPORT_OUTPUT_DIR}/action_items_tracker.csv; override with ACTION_ITEMS_TRACKER_PATH. Post-run completion requests use catalog N12, sent via auto-responder-request-action-item-confirmation, and are not appended to this CSV. N12 bodies use collect_action_item_detail_rows to produce the same counts.

Generation Conditions#

Skipped when the run folder's RunId already exists in the tracker, which keeps re-runs and resends idempotent. Otherwise, new rows are appended with Completed=false for manual spreadsheet triage.

Columns#

Column	Description
Completed	First column for spreadsheet triage. `false` on append; operators set `true` when the notification owner confirms work.
NotificationTo	Configured SMTP To recipient(s) for that catalog notification (comma-separated when multiple, e.g. N02 Angel + Yogesh).
RunId	Run folder name (e.g. `run_2026-05-26_14-30-00`).
RunTimestamp	Parsed from folder name when possible.
NotificationId	N01–N07 (the choice between N05 and N06 follows the Sai bundle logic).
ActionItemCount	Number of actionable lines in attached CSVs for that notification.
SourceFiles	Semicolon-separated list of run artifacts that contributed rows.
Summary	Per-file counts (e.g. `output_document_alternate_contacts.csv: 67; …`).
CompletedAt / Notes	Empty on append; manual follow-up.
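The idempotent append and the RunTimestamp parsing can be sketched with the standard `csv` and `datetime` modules. This is a minimal sketch with several assumptions:

* The column order follows the table above.
* The tracker is UTF-8.
* Each notification is supplied as a dict of column values.

None of the function names reflect the real signature of append_action_items_for_run.

```python
import csv
import os
import re
from datetime import datetime

# Assumed to match the tracker's column order (Completed first).
COLUMNS = ["Completed", "NotificationTo", "RunId", "RunTimestamp", "NotificationId",
           "ActionItemCount", "SourceFiles", "Summary", "CompletedAt", "Notes"]

_RUN_ID = re.compile(r"^run_(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})$")


def parse_run_timestamp(run_id: str) -> str:
    # "run_2026-05-26_14-30-00" -> "2026-05-26T14:30:00"; empty when the name doesn't match.
    m = _RUN_ID.match(run_id)
    if not m:
        return ""
    return datetime.strptime(m.group(1), "%Y-%m-%d_%H-%M-%S").isoformat()


def build_tracker_rows(existing_run_ids, run_id, notifications):
    if run_id in existing_run_ids:
        return []  # idempotent: this run is already in the tracker
    timestamp = parse_run_timestamp(run_id)
    rows = []
    for notification in notifications:
        row = dict.fromkeys(COLUMNS, "")  # CompletedAt / Notes stay empty
        row.update(notification)
        row.update(Completed="false", RunId=run_id, RunTimestamp=timestamp)
        rows.append(row)
    return rows


def append_to_tracker(path, run_id, notifications) -> int:
    has_data = os.path.exists(path) and os.path.getsize(path) > 0
    existing = set()
    if has_data:
        with open(path, newline="", encoding="utf-8") as f:
            existing = {r.get("RunId", "") for r in csv.DictReader(f)}
    rows = build_tracker_rows(existing, run_id, notifications)
    if rows:
        with open(path, "a", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=COLUMNS)
            if not has_data:
                writer.writeheader()
            writer.writerows(rows)
    return len(rows)
```

Keeping the row-building step pure makes the idempotency rule testable without touching the filesystem.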

Glossary of Systems#

System	Full Name	Description
CUPOLA	CUPOLA Contact Management	Thompson's primary contact and organization management system. Stores person records, organization records, and org-person links. The system of record for contact active/inactive status.
HODOR	Hodor / dmorders_thompson	Thompson's prospect/subscriber database. Contains prospect numbers (`ProsNum`), email records, and subscription metadata. Contacts can be marked "No Longer with Firm" when inactive.
SFMC	Salesforce Marketing Cloud	Email marketing platform. The Auto Suppression List prevents marketing emails from being sent to inactive/invalid addresses.
Multipub	MultiPub Subscription Management	Publication subscription and order management system. Tracks active subscriptions, expired orders, and single-issue purchases. Used to validate whether an inactive person still has live subscription activity before marking them inactive.
Salesforce	Salesforce CRM	Customer relationship management system. Contains Lead and Contact records. Updated when contact status changes, unless the record is related to Multipub.

Glossary of Determination Types#

Determination	Description
inactive	Person has permanently left the organization (left company, retired, or deceased). All contact records across systems should be marked inactive/suppressed.
active	Person is confirmed active at their organization. When no CUPOLA row exists or the matched row is inactive, the pipeline refuses to auto-add / auto-reactivate and routes to Human Review; mirror systems (Hodor, non-Multipub Salesforce) are still updated as before.
replacement	A replacement/alternate contact was identified. The original person is marked inactive and the replacement row is captured for IP4 review (auto-add disabled unless `CUPOLA_AUTO_ADD_REPLACEMENTS=true`).
title_update	Person's job title has changed. Gated on active CUPOLA row — when the gate fails the entire update is blocked and routed to Human Review.
email_update	Person's email address has changed. Gated on active CUPOLA row — when the gate fails the entire update is blocked and routed to Human Review.
out_of_office	Auto-reply is a temporary absence notification. Treated as a first-class determination: the pipeline performs no system writes and emits a Human Review row with `HUMAN_REVIEW_REASON_OUT_OF_OFFICE`.
unknown	Email is not relevant (spam, unrelated content) or cannot be classified. No action is taken.

Glossary of Processing Statuses#

Status	Description
success	All planned actions completed successfully.
failed	One or more actions failed during execution.
skipped_no_contact	Contact was not found in any backend system — no actions could be taken.
skipped_unknown	Determination was `unknown` — no actions were needed.
error	An unexpected error occurred during processing (e.g., network failure, unhandled exception).
deferred_multipub	Inactive marking was halted because the person has active subscriptions in Multipub. Requires manual review.
pending	Processing has not yet completed. Should not appear in final reports.
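Code that consumes the reports can pin the status vocabulary to an enum. This catches typos and flags rows that should not survive into a final report. The sketch below is minimal: the `ProcessingStatus` class and the `pending_rows` helper are hypothetical, and only the string values come from the glossary above.

```python
from enum import Enum


class ProcessingStatus(str, Enum):
    # Values taken from the glossary; the class itself is hypothetical.
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED_NO_CONTACT = "skipped_no_contact"
    SKIPPED_UNKNOWN = "skipped_unknown"
    ERROR = "error"
    DEFERRED_MULTIPUB = "deferred_multipub"
    PENDING = "pending"  # should not appear in final reports


def pending_rows(statuses):
    """Return indexes of rows still marked pending; unknown values raise ValueError."""
    return [i for i, s in enumerate(statuses)
            if ProcessingStatus(s) is ProcessingStatus.PENDING]
```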

Maintaining this document#

Edit docs/DATA_DICTIONARY.html directly. Preview locally from the repo root:

```bash
python scripts/serve_data_dictionary.py
```

Then open http://127.0.0.1:8765/DATA_DICTIONARY.html in a browser.
