AutoResponderProcess — Output Files Data Dictionary#
See repo source for current behavior (ReportGenerator._CSV_COLUMNS / _CSV_COLUMNS_IP4, output_document.py).
This document provides a comprehensive reference for every output file produced by the AutoResponderProcess pipeline. Each file is described with its purpose, generation conditions, format, and a detailed explanation of every column, field, or structural element it contains.
All output files are written to a timestamped run directory:
processing_reports/run_{YYYY-MM-DD_HH-MM-SS}/
Table of Contents#
- run.log
- stage_execution.log
- processing_report.log
- processing_report.json
- processing_report_master.csv / processing_report_ip4.csv
- category_summary_report.csv
- category_summary_report.json
- classifier_output/classifier_output.json
- classifier_output/classifier_output.csv
- output_document_inactive_people.csv
- output_document_inactive_people.json
- Marketing suppression deliverable ({BusinessUnit}_NoLongerThere_{date}.csv)
- output_document_alternate_contacts.csv
- output_document_alternate_contacts.json
- output_document_inactive_new_org.csv
- output_document_inactive_new_org.json
- output_document_undeliverables.csv
- output_document_undeliverables.json
- output_document_inactive_no_cupola_match.csv
- output_document_inactive_no_cupola_match.json
- cupola_audit_log.csv
- cupola_audit_log.json
- cupola_audit_log_rollback_plan.csv
- output_document_multipub_audit.csv
- output_document_multipub_audit.json
- output_document_email_update_requests.csv
- output_document_email_update_requests.json
- action_log.log
- batch_report.html
- batch_report.pptx
- output_document_human_review.csv / .json
- impact_report.txt / .json
1. run.log#
Purpose#
The primary runtime log file for the entire pipeline execution. Captures every log message emitted by any Python logger during the run at the DEBUG level and above. This is the most granular diagnostic artifact and is the first place to look when troubleshooting unexpected behavior, errors, or performance issues.
Generation Conditions#
Always generated. Created at the start of every run via setup_logging() in logging_config.py.
Format#
Plain text. Each line follows the enhanced logging format:
{timestamp} - [{correlation_id}] - {logger_name} - {level} - {message}
Field Descriptions#
| Field | Description |
|---|---|
| timestamp | The date and time the log entry was recorded, formatted as YYYY-MM-DD HH:MM:SS in US Eastern Time (America/New_York). All timestamps throughout the application are normalized to Eastern Time for consistency. |
| correlation_id | An 8-character UUID prefix that uniquely identifies a logical unit of work (typically one email being processed). This allows you to trace all log messages related to a single email across multiple modules and subsystems. Displays N/A when no correlation context is active (e.g., during initialization). |
| logger_name | The fully qualified Python module name that emitted the log entry (e.g., auto_responder.connectors.cupola_connector). This tells you exactly which component of the system generated the message. |
| level | The severity level of the log entry. In the file handler, all levels from DEBUG upward are captured. Possible values: DEBUG (detailed diagnostic information), INFO (general operational messages), WARNING (unexpected but recoverable situations, including slow-operation alerts for functions exceeding 1000ms), ERROR (failures that prevented an operation from completing), CRITICAL (severe failures that may halt the entire run). |
| message | The free-form log message content. May include structured data such as email addresses, contact IDs, system names, operation results, timing information, and error tracebacks. For operations decorated with @log_performance, a [duration=X.XXms] suffix is appended when the function completes. |
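
As a rough illustration of the line format above, the following sketch splits one run.log line into its five fields and pulls out the optional [duration=X.XXms] suffix. The regular expressions, the parse_log_line helper, and the sample line are assumptions made for this example, not code or output from the repository.

```python
import re
from typing import Optional

# Pattern for: {timestamp} - [{correlation_id}] - {logger_name} - {level} - {message}
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    r" - \[(?P<correlation_id>[^\]]*)\]"
    r" - (?P<logger_name>\S+)"
    r" - (?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)"
    r" - (?P<message>.*)$"
)
# Suffix appended for @log_performance operations, e.g. "[duration=12.34ms]"
DURATION = re.compile(r"\[duration=(?P<ms>\d+(?:\.\d+)?)ms\]\s*$")


def parse_log_line(line: str) -> Optional[dict]:
    """Return the fields of one run.log line, or None for continuation lines."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None  # e.g. a traceback line that belongs to the previous entry
    entry = match.groupdict()
    duration = DURATION.search(entry["message"])
    entry["duration_ms"] = float(duration.group("ms")) if duration else None
    return entry


# Made-up sample line that follows the documented format.
sample = (
    "2025-01-15 09:30:12 - [a1b2c3d4] - auto_responder.connectors.cupola_connector"
    " - INFO - lookup finished [duration=1234.50ms]"
)
parsed = parse_log_line(sample)
```

Filtering the parsed entries on correlation_id is one way to follow a single email across modules; entries whose correlation_id is N/A come from outside any correlation context.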
Notes#
- The file handler is set to DEBUG level, which is more verbose than the console handler (set to INFO). This means the file will contain detailed diagnostic information not shown in the terminal.
- Noisy third-party libraries (httpx, urllib3, msal) are suppressed to WARNING level to keep the log focused on application-level events.
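
The handler levels described in these notes can be reproduced with the standard logging module. The sketch below is a hypothetical stand-in for setup_logging(), not the project's implementation: the function name setup_logging_sketch is invented, and the correlation ID field and Eastern Time conversion are deliberately left out to keep it short.

```python
import logging
import sys


def setup_logging_sketch(log_path: str) -> logging.Logger:
    """Hypothetical stand-in for setup_logging(); not the project's actual code."""
    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)  # let each handler decide what it keeps

    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)  # run.log keeps everything
    file_handler.setFormatter(formatter)

    console_handler = logging.StreamHandler(sys.stderr)
    console_handler.setLevel(logging.INFO)  # the terminal stays quieter
    console_handler.setFormatter(formatter)

    root.addHandler(file_handler)
    root.addHandler(console_handler)

    # Third-party loggers only pass WARNING and above.
    for noisy in ("httpx", "urllib3", "msal"):
        logging.getLogger(noisy).setLevel(logging.WARNING)
    return root
```

With this arrangement a DEBUG message from an application logger reaches the file but not the console, which matches the behavior the notes describe.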
2. stage_execution.log#
Purpose#
A structured, stage-by-stage execution log that tracks the pipeline's progression through its major processing phases. Unlike run.log which captures every message, this file is organized into discrete stage sections with JSON-encoded data blocks, making it ideal for programmatic post-run analysis and pipeline health monitoring.
Generation Conditions#
Always generated. Created at run start by the StageLogger class. The final summary section is written when the pipeline completes (normal or early exit).
Format#
Plain text with embedded JSON blocks. The file is divided into:
- A header section
- One section per pipeline stage
- A final summary section
Structure#
Header#
====================================================================================================
AUTORESPONDER PROCESS - STAGE EXECUTION LOG
Run Started: {ISO 8601 timestamp}
====================================================================================================
Per-Stage Section#
Each stage that executes during the pipeline gets its own section:
====================================================================================================
STAGE: {stage_name}
Timestamp: {ISO 8601 timestamp}
----------------------------------------------------------------------------------------------------
STAGE_DATA (JSON):
{JSON object with stage metadata, timing, and data}
SUMMARY:
Duration: {X.XX}ms ({X.XX}s)
Status: {completed|failed|skipped}
Emails Processed: {count} (if applicable)
Error: {error message} (if applicable)
Details: {JSON details} (if applicable)
====================================================================================================
STAGE_DATA JSON Fields#
| Field | Type | Description |
|---|---|---|
| stage_name | string | The internal name of the pipeline stage (e.g., STEP_1_EXTRACT_EMAILS, STEP_2_CONTACT_LOOKUP, STEP_3_CLASSIFY, STEP_4_DETERMINE, STEP_5_EXECUTE_ACTIONS, STEP_6_GENERATE_REPORTS). Identifies which phase of the processing pipeline this section documents. |
| start_time | string (ISO 8601) | The exact timestamp when this stage began execution, in Eastern Time. Used together with end_time to compute the stage's wall-clock duration. |
| end_time | string (ISO 8601) | The exact timestamp when this stage completed execution. |
| duration_ms | float | The elapsed wall-clock time for the stage in milliseconds. Computed as the difference between end_time and start_time. Useful for identifying performance bottlenecks — for example, a slow LLM classification stage or a slow database lookup. |
| status | string | The outcome of the stage. completed means the stage finished without fatal errors. failed means the stage encountered an unrecoverable error. skipped means the stage was intentionally bypassed (e.g., no emails to process). |
| metadata | object | Additional key-value pairs provided when the stage was started. Content varies by stage and may include configuration parameters, input counts, or other contextual information. |
| data | object | Arbitrary structured data logged during stage execution via log_stage_data(). Each key represents a named data point; the value can be any JSON-serializable structure. Examples include email counts, lookup results, classification summaries, or action execution details. |
| errors | array | List of error objects recorded during the stage. Each error object contains: error (the error message string), error_type (the Python exception class name, e.g., ConnectionError, ValueError), context (additional key-value context about the error), and timestamp (when the error occurred). |
| warnings | array | List of warning objects recorded during the stage. Each warning object contains: warning (the warning message string), context (additional key-value context), and timestamp (when the warning was recorded). Warnings indicate non-fatal issues that may merit attention but did not prevent stage completion. |
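
For post-run analysis, the embedded STAGE_DATA blocks can be pulled out of the log text without knowing where each JSON object ends. The sketch below assumes the JSON object starts at the first "{" after the STAGE_DATA (JSON): marker; the iter_stage_data helper and the sample text are illustrative only.

```python
import json

MARKER = "STAGE_DATA (JSON):"


def iter_stage_data(log_text: str):
    """Yield each STAGE_DATA object found in the text of stage_execution.log."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        marker_at = log_text.find(MARKER, pos)
        if marker_at == -1:
            return
        # Assumption: the JSON object starts at the first "{" after the marker.
        start = log_text.find("{", marker_at + len(MARKER))
        if start == -1:
            return
        # raw_decode parses one JSON value and reports where it ended,
        # so multi-line (pretty-printed) objects are handled as well.
        stage, end = decoder.raw_decode(log_text, start)
        yield stage
        pos = end


# Made-up excerpt that follows the per-stage layout shown earlier.
sample = """STAGE: STEP_3_CLASSIFY
STAGE_DATA (JSON):
{
  "stage_name": "STEP_3_CLASSIFY",
  "duration_ms": 1520.4,
  "status": "completed",
  "errors": [],
  "warnings": []
}
SUMMARY:
  Duration: 1520.40ms (1.52s)
"""
stages = list(iter_stage_data(sample))
```

Using raw_decode avoids guessing at the end of each block from the SUMMARY: line, at the cost of raising json.JSONDecodeError if a block is truncated.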
Final Summary Section#
FINAL SUMMARY
Contains a SUMMARY_DATA (JSON) block and a HUMAN-READABLE SUMMARY.
| Field | Type | Description |
|---|---|---|
| run_start_time | string (ISO 8601) | The timestamp when the entire pipeline run began. |
| run_end_time | string (ISO 8601) | The timestamp when the pipeline run completed. |
| total_duration_ms | float | Total wall-clock time for the entire run in milliseconds. |
| total_duration_seconds | float | Total wall-clock time in seconds (convenience field). |
| stages_completed | integer | The number of stages that were executed during the run. |
| statistics | object | Aggregated statistics across all stages. Contains: total_emails_extracted (number of emails pulled from the inbox), unique_emails (number of deduplicated emails), emails_processed (number of emails that went through the full pipeline), determinations (a dictionary mapping each determination type to its count), errors (array of all errors across all stages), warnings (array of all warnings across all stages). |
| stage_summaries | array | A compact array summarizing each stage. Each entry contains stage_name, status, and duration_ms. This provides a quick-glance view of which stages ran and how long each took. |
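
One use of stage_summaries is to rank stages by duration when looking for bottlenecks. This is a minimal sketch that assumes the SUMMARY_DATA block has already been parsed into a dictionary; the slowest_stages helper and the sample values are made up for illustration.

```python
def slowest_stages(summary: dict, top_n: int = 3) -> list:
    """Return (stage_name, duration_ms) pairs, slowest first, ignoring skipped stages."""
    ran = [s for s in summary.get("stage_summaries", []) if s.get("status") != "skipped"]
    ran.sort(key=lambda s: s.get("duration_ms") or 0.0, reverse=True)
    return [(s["stage_name"], s["duration_ms"]) for s in ran[:top_n]]


# Made-up summary using the documented field names.
sample_summary = {
    "total_duration_ms": 9000.0,
    "stage_summaries": [
        {"stage_name": "STEP_1_EXTRACT_EMAILS", "status": "completed", "duration_ms": 800.0},
        {"stage_name": "STEP_2_CONTACT_LOOKUP", "status": "completed", "duration_ms": 2200.0},
        {"stage_name": "STEP_3_CLASSIFY", "status": "completed", "duration_ms": 6000.0},
        {"stage_name": "STEP_4_DETERMINE", "status": "skipped", "duration_ms": 0.0},
    ],
}
ranking = slowest_stages(sample_summary, top_n=2)
```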
3. processing_report.log#
Purpose#
The primary human-readable processing report. Provides a comprehensive, formatted text summary of every email that was processed, including the contact lookup results, LLM classification, determination, Multipub validation, standard actions, executed actions, and final outcome. This is the main report for operational review of a batch run.
Generation Conditions#
Generated when at least one email is processed. Not generated if the pipeline finds zero emails in Step 1 (early exit).
Format#
Plain text, structured with fixed-width formatting and separator lines. The report has four major sections: Header, Per-Email Details, Summary, and Output Document Lists.
Sections#
Header#
- Generated: Timestamp when the report was generated (format: YYYY-MM-DD HH:MM:SS {timezone})
- Mode: The run mode: DRY-RUN (all connections mocked), READ-ONLY (live reads, write operations MOCKED), or LIVE
- Run window: Start time through end time with total duration in seconds
- Emails: Total number of emails processed in this run
Per-Email Block (repeated for each email)#
Each email gets a detailed block with these subsections:
Email Identification:
- Email ID: The unique message identifier from the email system (typically the database Id column from the Hodor dmorders_thompson table)
- From: The sender's display name and email address in format Name <email@domain.com>
- Subject: The full email subject line
- Received: The date and time the email was received
- Inbox: Which inbox/account the email was fetched from (e.g., energy@thompson.com, grants@thompson.com)
- Body: A preview of the email body text, truncated to 500 characters with total character count shown if truncated. Newlines are collapsed to spaces for readability.
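
The Body preview rule (collapse newlines, truncate at 500 characters, show the total length when truncated) might look roughly like the sketch below. The body_preview helper is hypothetical, and the wording of the truncation notice is a placeholder because the report's exact notice is not specified here.

```python
import re

PREVIEW_LIMIT = 500  # characters, per the Body description above


def body_preview(body: str, limit: int = PREVIEW_LIMIT) -> str:
    """Collapse newlines to spaces and truncate, noting the full length."""
    flat = re.sub(r"\s*[\r\n]+\s*", " ", body).strip()
    if len(flat) <= limit:
        return flat
    # Placeholder wording: the report's actual truncation notice may differ.
    return f"{flat[:limit]}... [{len(flat)} chars total]"


short_preview = body_preview("I am out of the office.\r\nPlease contact Jane.")
long_preview = body_preview("x" * 600)
```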
Contact Lookup:
- Contact: Whether the contact was found and in how many systems, with the list of systems. Systems that were mocked in the current run are annotated with (mock). The count is broken down into live vs. mock counts.
- Sources (all queried connectors): A comma-separated list of all backend connectors that were queried during contact lookup (e.g., cupola, hodor, multipub, salesforce), regardless of whether they returned results.
- Sources used (by field): A semicolon-separated list showing which backend system provided each specific data field. Format: field_name:source_system (e.g., person_name:cupola; org_name:hodor; cupola_org_id:cupola).
LLM Classification (if classification was performed):
- Initial Category: The category assigned by the first-pass classification LLM agent before QA review. Only shown if QA correction was applied.
- Final Category: The category after QA agent review. Annotated with (QA corrected) if the QA agent changed the initial classification.
- LLM Category: Shown when no QA correction was applied; this is the single category from classification.
- QA Explanation: The QA agent's reasoning for why it changed (or confirmed) the classification.
- Person Status: The employment/organizational status of the person as determined by the LLM (e.g., left_company, retired, deceased, active, on_leave).
- Email Status: The status of the email address itself as determined by the LLM (e.g., valid, invalid, bounced, changed).
Determination:
- Determination: The final determination type in uppercase. Possible values: INACTIVE (person has left/retired/deceased; mark inactive across systems), ACTIVE (person is still active; ensure records are current), REPLACEMENT (a replacement contact was identified; mark original inactive and add replacement), TITLE_UPDATE (person's title has changed), EMAIL_UPDATE (person's email address has changed), UNKNOWN (not relevant, spam, or unclassifiable; no action needed).
- Confidence: The confidence score for the determination, displayed as a percentage (e.g., 95%).
- Source Email: The email address extracted from the auto-response body that was used as the basis for contact lookup (may differ from the sender email when the bounced email references a different address).
- New Email: A new email address for the person, extracted from the auto-response (relevant for EMAIL_UPDATE and some REPLACEMENT scenarios).
- Replacement: The name and email of the replacement contact in format Name <email>. If multiple replacements were identified, each is numbered (Replacement 1, Replacement 2, etc.).
- Repl. Title: The job title of the replacement contact, if provided.
- Personal Email: A retired/personal email address for the person (e.g., when someone leaves a company and provides their personal email).
- Long-term Leave: Displayed as Yes when the person is identified as being on extended leave rather than having permanently departed.
- Reasoning: The LLM's reasoning/notes explaining why this determination was made.
Multipub Subscription Validation (if validation was performed):
- Subscriber: The Multipub subscriber number and how it was matched (e.g., by email, by name). Shows Not found in Multipub if no subscriber record was located.
- Active Subs: Whether active subscriptions were found, with the count of active orders.
- Expired (12mo): Whether recently expired subscriptions (within 12 months) were found, with order count.
- Single-Issue: Whether recent single-issue purchases were found, with order count.
- Subscriptions: Shows No relevant subscription activity when none of the above subscription types were found.
- Review Flag: The reason the record was flagged for manual review (e.g., active subscriptions found for an inactive person).
- DEFERRED: Indicates that the inactive marking was HALTED because the person has active subscriptions in Multipub. These records require manual review before proceeding.
Standard Actions: A numbered list describing what actions WOULD be performed in a live run for this determination type, regardless of the current run mode. This serves as documentation of the expected workflow. Actions reference specific systems (Cupola, Hodor, Multipub, Salesforce) and note which are mocked in the current run.
Actions Executed: A list of every action that was actually executed (or mocked) during this run. Each action shows:
- [ OK] or [FAIL] status indicator
- The system name (e.g., cupola, hodor, salesforce)
- The operation performed (e.g., mark_inactive, add_contact, update_email)
- Detail text explaining the specific action taken
The section header varies by mode: (ALL MOCKED - dry-run), (writes MOCKED - read-only mode), or no annotation in live mode.
Outcome:
- Outcome: The final processing status. Possible values: SUCCESS (all actions completed successfully), FAILED (one or more actions failed), SKIPPED NO CONTACT (contact not found in any system; no actions to take), SKIPPED UNKNOWN (determination was unknown; no actions needed), ERROR (an unexpected error occurred during processing), DEFERRED MULTIPUB (processing halted due to active Multipub subscriptions requiring manual review), PENDING (processing not yet complete; should not appear in final reports).
- Reason: Explanation for why processing was skipped, if applicable.
- Error: The error message if the status is ERROR or FAILED.
- Duration: Processing time for this individual email in milliseconds.
Summary Section#
Aggregated counts across all emails in the batch:
- Total Emails: Total number of emails processed
- Successful: Count of emails with success status
- Failed: Count with failed status
- Skipped (no contact): Count with skipped_no_contact status
- Skipped (unknown det): Count with skipped_unknown status
- Errors: Count with error status
- LLM Category Breakdown: Count of emails per LLM classification category
- Determination Breakdown: Count of emails per determination type
- QA Corrections: Number of emails where the QA agent changed the initial classification
- Multipub Validation: Counts for validated, active subscriptions found, recently expired, recent single-issue, and deferred (halted)
- Action Totals: Total actions executed, succeeded, and failed
- Output Document Lists: Record counts for Inactive People, Alternate Contacts, and Inactive at New Org lists
- Total Run Duration: Wall-clock time for the entire batch run in seconds
Output Document Lists (appended if data exists)#
Detailed listings for three output document types. See the individual output document file descriptions below for field details.
Output replay (regenerated/)#
The output replay utility (auto-responder-replay-output / scripts/replay_output.py) re-runs the shared batch pipeline (pipeline/batch_processor.py) for emails extracted from an existing processing_reports/run_* folder. Regenerated files are written only under run_*/regenerated/; originals in the run root are never overwritten.
After replay, regenerated/replay_verification.json summarizes per-file comparison (match, diff, error, or skipped) against the source artifact. Volatile LLM fields (confidence, QA explanation, timestamps) are ignored by default.
v1 scope: output_document_*, processing_report_*, category_summary_report, classifier output, cupola audit, impact report, and batch report (full-run). Notification CSVs (Hodor import, Tarun undetermined, Multipub follow-up) are deferred to v1.1.
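
The idea of ignoring volatile fields during comparison can be sketched as follows. This is not the replay utility's code: the key names in VOLATILE_KEYS and both helper functions are assumptions chosen to illustrate the approach on already-parsed JSON artifacts.

```python
# Assumed key names for the volatile fields; the utility's actual list may differ.
VOLATILE_KEYS = frozenset({"confidence", "qa_explanation", "generated_at"})


def strip_volatile(value, volatile=VOLATILE_KEYS):
    """Recursively drop volatile keys so two artifacts can be compared."""
    if isinstance(value, dict):
        return {k: strip_volatile(v, volatile) for k, v in value.items() if k not in volatile}
    if isinstance(value, list):
        return [strip_volatile(v, volatile) for v in value]
    return value


def compare_artifacts(original, regenerated) -> str:
    """Return "match" or "diff" for two already-parsed JSON artifacts."""
    return "match" if strip_volatile(original) == strip_volatile(regenerated) else "diff"


# Made-up records: only the volatile confidence differs in the first pair.
source = {"records": [{"message_id": "101", "determination": "inactive", "confidence": 0.95}]}
replayed = {"records": [{"message_id": "101", "determination": "inactive", "confidence": 0.91}]}
changed = {"records": [{"message_id": "101", "determination": "unknown", "confidence": 0.95}]}
same_result = compare_artifacts(source, replayed)
diff_result = compare_artifacts(source, changed)
```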
4. processing_report.json#
Purpose#
A JSON companion to the human-readable processing report (.log). Contains the same data in a machine-parseable format suitable for programmatic consumption, integration with dashboards, or post-run analysis scripts.
Generation Conditions#
Generated whenever processing_report.log is generated (when at least one email is processed).
Format#
JSON object with UTF-8 encoding, 2-space indentation.
Top-Level Fields#
| Field | Type | Description |
|---|---|---|
| generated_at | string (ISO 8601) | The timestamp when this JSON file was generated. |
| run_start | string (ISO 8601) | The timestamp when the pipeline run began. |
| total_emails | integer | The total number of emails processed in this run. |
| records | array of objects | An array containing one object per processed email. Each object is a full serialization of the EmailProcessingRecord dataclass (see Record Fields below). |
| output_documents | object | Present only when the output document collector has data. Contains three keys: inactive_people, alternate_contacts, and inactive_new_org, each with purpose (string), record_count (integer), and records (array of objects). Undeliverables are not embedded here; when present they are written only to output_document_undeliverables.csv and output_document_undeliverables.json (see sections 16–17). |
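
A short example of consuming the top-level structure: tallying records by determination and checking that total_emails agrees with the length of records. The helper names and sample payload are illustrative, and the equality check is an expectation inferred from the field descriptions above rather than a documented guarantee.

```python
import json
from collections import Counter


def determination_counts(report: dict) -> Counter:
    """Count records per determination; empty values are grouped as "(none)"."""
    return Counter(rec.get("determination") or "(none)" for rec in report.get("records", []))


def record_count_matches(report: dict) -> bool:
    """Check that total_emails equals the number of entries in records."""
    return report.get("total_emails") == len(report.get("records", []))


# Made-up payload with the documented top-level fields.
sample_report = json.loads("""
{
  "generated_at": "2025-01-15T09:45:00-05:00",
  "run_start": "2025-01-15T09:30:00-05:00",
  "total_emails": 3,
  "records": [
    {"message_id": "101", "determination": "inactive", "confidence": 0.95},
    {"message_id": "102", "determination": "inactive", "confidence": 0.80},
    {"message_id": "103", "determination": "unknown", "confidence": 0.0}
  ]
}
""")
counts = determination_counts(sample_report)
```

For a real run, the dictionary would come from json.load on the processing_report.json file in the run directory, opened with UTF-8 encoding.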
Record Fields (each object in records array)#
Email Identification Fields#
| Field | Type | Description |
|---|---|---|
| sender_email | string | The email address of the auto-response sender. This is the raw SenderEmail from the email record in the database. |
| sender_name | string or null | The display name of the sender, if available from the email headers. May be null if the email only contained an address without a display name. |
| subject | string | The full subject line of the auto-response email. |
| received_date | string | The date and time the email was received, as recorded in the source database. Format may vary based on the source system. |
| inbox_source | string | The inbox/account from which the email was fetched. Corresponds to the AccountName in the email database (e.g., energy@thompson.com, grants@thompson.com, resources@associationexecs.com). This determines which business line the email belongs to. |
| message_id | string | The unique identifier for the email record, typically the database primary key Id from the Hodor dmorders_thompson SQL table. Used as the primary key for tracking this email throughout the pipeline. |
| original_sender_email | string or null | The original sender email before any normalization or cleanup. Present when the pipeline modifies the sender email during processing (e.g., stripping display names, handling forwarded emails). Null if no modification was needed. |
| body | string | The full body text of the auto-response email. Contains the raw text content that was analyzed by the LLM classifier to determine the person's status, extract replacement contacts, new email addresses, etc. |
Contact Lookup Fields#
| Field | Type | Description |
|---|---|---|
| lookup_email | string or null | The email address that was actually used for contact lookup across backend systems. This may differ from sender_email when the auto-response body references a different email address (the source_email). If null, no lookup was performed. |
| contact_found | boolean | true if the contact was found in at least one backend system (Cupola, Hodor, Multipub, or Salesforce). false if the email address was not found in any system. This is the primary indicator of whether downstream actions can be taken. |
| contact_systems | array of strings | List of backend systems where the contact was found. Possible values in the array: cupola (contact management system), hodor (Thompson's dmorders database), multipub (subscription/publication management), salesforce (CRM). An empty array means the contact was not found anywhere. |
| mock_contact_systems | array of strings | Subset of contact_systems that were operating in mock/simulated mode during this run. In dry-run mode, all systems are mocked. In read-only mode, write operations are mocked but reads are live. In live mode, this array is empty. Useful for distinguishing real vs. simulated lookup results. |
| cupola_org_id | string or null | The CUPOLA organization_id for the preferred org-person link. This is the organization identifier in the Cupola contact management system. Null if the contact was not found in Cupola or Cupola was not queried. |
| cupola_org_person_id | string or null | The CUPOLA org_person_id — the unique identifier for the link between a person and an organization in Cupola. This is the record that gets marked active/inactive when processing status changes. Null if not found in Cupola. |
| cupola_person_id | string or null | The CUPOLA person_id — the unique identifier for the person entity in Cupola, independent of their organizational affiliation. A person may have multiple org_person links but only one person_id. Null if not found in Cupola. |
| hodor_pros_num | string or null | The HODOR ProsNum (prospect number) — the unique contact identifier in Thompson's Hodor/dmorders database system. This is used to update contact status in Hodor (e.g., marking as "No Longer with Firm"). Null if not found in Hodor. |
| org_name | string or null | The organization/company name associated with the contact. May be sourced from Cupola, Hodor, or other backend systems (see org_name_source). Null if no organization name was found. |
| person_name | string or null | The full name of the contact person. May be sourced from Cupola, Hodor, or other backend systems (see person_name_source). Null if no person name was found. |
| lookup_sources_available | string | Comma-separated list of all backend connector names that were available and queried during the contact lookup phase, regardless of whether they returned results. Represents the scope of the search. Example: cupola, hodor, multipub, salesforce. |
| person_name_source | string or null | The specific backend system that provided the person_name value (e.g., cupola, hodor). Null if no person name was found. Useful for provenance tracking when multiple systems have conflicting data. |
| org_name_source | string or null | The specific backend system that provided the org_name value. Null if no org name was found. |
| sources_used_fields | string | A semicolon-separated summary of which backend system provided each specific data field. Format: field_name:source_system; field_name:source_system. Example: person_name:cupola; org_name:hodor; cupola_org_id:cupola; hodor_pros_num:hodor. This provides full provenance for every piece of contact data. |
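
The sources_used_fields string is easy to turn back into a mapping for provenance checks. The parse_sources_used helper below is a sketch written for this example; it assumes field names and source names contain no colons or semicolons.

```python
def parse_sources_used(value: str) -> dict:
    """Parse "field:source; field:source" into a {field: source} dictionary."""
    result = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon or an empty string
        field, sep, source = part.partition(":")
        if sep:
            result[field.strip()] = source.strip()
    return result


# Example string taken from the field description above.
provenance = parse_sources_used(
    "person_name:cupola; org_name:hodor; cupola_org_id:cupola; hodor_pros_num:hodor"
)
```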
Determination Fields#
| Field | Type | Description |
|---|---|---|
| determination | string | The final determination type assigned to this email after LLM classification, QA review, and contact lookup. Possible values: inactive (person has left the company, retired, or is deceased — mark contact inactive across all systems), active (person is confirmed active — ensure records are current), replacement (a replacement contact was identified — mark original inactive and add the replacement), title_update (person's job title has changed — update title across systems), email_update (person's email address has changed — update email across systems), unknown (email is not relevant, is spam, or cannot be classified — no action taken). Empty string if determination has not been made. |
| confidence | float | A confidence score between 0.0 and 1.0 representing how confident the system is in the determination. Higher values indicate greater certainty. A confidence of 0.0 typically indicates no determination was made. Displayed as a percentage in human-readable reports (e.g., 0.95 → 95%). |
| llm_category | string or null | The final LLM classification category after QA review. This is the category used to drive the determination logic. Possible values: undeliverable (bounce-back or invalid email address), left company (person departed the organization), retired (person retired), deceased (person is deceased), out of office (temporary absence — auto-reply), changed email (person's email address has changed). Null if classification was not performed. |
| initial_llm_category | string or null | The category assigned by the first-pass classification LLM agent, before QA review. When QA does not change the category, this matches llm_category. When QA corrects the classification, this preserves the original (incorrect) category for audit purposes. Null if classification was not performed. |
| person_status | string or null | The employment/organizational status of the person as extracted by the LLM from the email body. Examples: left_company, retired, deceased, active, on_leave. This is a more granular status than the llm_category and is used as input to the determination logic. Null if not extracted. |
| email_status | string or null | The status of the email address itself as determined by the LLM. Examples: valid, invalid, bounced, changed. Used to distinguish between "person is gone" vs. "email address is bad." Null if not extracted. |
| qa_correction_applied | boolean | true if the QA agent reviewed the initial classification and changed the category. false if the QA agent confirmed the original classification or if QA review was not performed. When true, initial_llm_category and llm_category will differ. |
| qa_explanation | string or null | The QA agent's textual explanation for why it changed or confirmed the initial classification. Provides transparency into the QA review decision. Null if QA review was not performed. |
| replacement_info | array of objects | List of replacement contacts identified from the auto-response email. Each object contains: replacement_name (string or null — the name of the replacement person), replacement_email (string or null — the email address of the replacement person), replacement_title (string or null — the job title of the replacement person). An empty array means no replacement was identified. Multiple entries indicate multiple replacements were mentioned. |
| sender_new_email | string or null | A new email address for the sender, extracted from the auto-response body. Relevant for email_update determinations where the person's email has changed. Also used in replacement scenarios where the departing person provides their new personal/forwarding email. Null if no new email was identified. |
| retired_personal_email | string or null | A personal/private email address provided by someone who has retired or left their organization. Distinct from sender_new_email in that this is typically a non-work email (e.g., Gmail, Yahoo) shared for personal contact purposes rather than as an official forwarding address. Null if none was provided. |
| is_long_term_leave | boolean | true if the LLM determined the person is on an extended/long-term leave of absence (e.g., maternity leave, sabbatical, medical leave) rather than having permanently departed the organization. This affects the determination — long-term leave contacts are flagged for review rather than immediately marked inactive. false for all other cases. |
| source_email | string or null | The email address extracted from the auto-response body that was used as the basis for contact lookup. This may differ from sender_email — for example, when a mail server's bounce message references the intended recipient's address, which is the address we actually need to look up. Null if no alternate source email was extracted. |
| notes | string or null | Free-form reasoning or notes from the LLM explaining the basis for its classification and any additional context it identified in the email body. Null if no reasoning was provided. |
| standard_actions_description | string or null | A human-readable, multi-line description of the standard actions that WOULD be performed for this determination type in a live run, based on the contact systems found and the determination type. This is generated from the determination reference documentation and serves as an expected-behavior checklist regardless of the current run mode. Null if no determination was made. |
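
Two of the relationships described in this table can be checked mechanically: the confidence-to-percentage display, and the rule that qa_correction_applied is true exactly when initial_llm_category and llm_category differ. The helpers below are a sketch under those stated rules; they are not taken from the pipeline.

```python
def format_confidence(confidence: float) -> str:
    """Render a 0.0-1.0 confidence as a whole-number percentage."""
    return f"{confidence:.0%}"


def qa_fields_consistent(record: dict) -> bool:
    """True when qa_correction_applied agrees with the two category fields."""
    initial = record.get("initial_llm_category")
    final = record.get("llm_category")
    if initial is None or final is None:
        return True  # classification not performed; nothing to compare
    return bool(record.get("qa_correction_applied")) == (initial != final)


# Made-up records using the documented field names and category values.
corrected = {
    "initial_llm_category": "out of office",
    "llm_category": "left company",
    "qa_correction_applied": True,
}
suspicious = {
    "initial_llm_category": "retired",
    "llm_category": "retired",
    "qa_correction_applied": True,
}
```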
Multipub Validation Fields#
| Field | Type | Description |
|---|---|---|
| multipub_validation_performed | boolean | true if Multipub subscription validation was executed for this email. Validation is performed for INACTIVE and REPLACEMENT determinations to check whether the person has active subscriptions before marking them inactive. false if validation was skipped (e.g., for ACTIVE or UNKNOWN determinations). |
| multipub_subsnum | string or null | The Multipub subscriber number (SubsNum) for this contact, if found. This is the unique identifier for a subscriber in the Multipub subscription management system. Null if the contact was not found in Multipub. |
| multipub_match_method | string or null | The method by which the contact was matched to a Multipub subscriber record. Possible values include matching by email address, by name, or by other criteria. Null if no match was found. |
| multipub_has_active_subscription | boolean | true if the contact has at least one currently active subscription in Multipub. When true for an INACTIVE determination, the inactive marking is HALTED (deferred) because the person still has live subscription activity that needs to be addressed by the sales team. |
| multipub_has_recently_expired | boolean | true if the contact has subscriptions that expired within the last 12 months. These are flagged for the sales team's awareness but do not halt inactive processing. |
| multipub_has_recent_single_issue | boolean | true if the contact has recent single-issue (one-time) purchases. These are flagged for the sales team's awareness but do not halt inactive processing. |
| multipub_active_order_count | integer | The number of currently active subscription orders. Zero if no active subscriptions exist. |
| multipub_expired_order_count | integer | The number of subscriptions that expired within the last 12 months. |
| multipub_single_issue_order_count | integer | The number of recent single-issue purchase orders. |
| multipub_flagged_for_review | boolean | true if this record was flagged for manual review due to subscription-related concerns (active subscriptions, recently expired, or single-issue orders found for an inactive person). |
| multipub_review_reason | string or null | The specific reason the record was flagged for review. Examples: Active subscriptions found for inactive contact, Recently expired subscriptions require sales follow-up. Null if not flagged. |
| multipub_deferred | boolean | true if the inactive marking was HALTED because active subscriptions were found in Multipub. This is the most critical flag — it means the system deliberately stopped processing this email to prevent marking someone inactive who still has live subscriptions. These records must be manually reviewed and resolved. |
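The relationship between these flags can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's implementation: the `MultipubCheck` container and the `derive_flags` function are hypothetical names, and treating REPLACEMENT the same as INACTIVE for deferral is an assumption based on the validation rule described above.

```python
from dataclasses import dataclass


@dataclass
class MultipubCheck:
    """Hypothetical container for the inputs behind the multipub_* fields."""
    determination: str
    active_order_count: int = 0
    expired_order_count: int = 0
    single_issue_order_count: int = 0


def derive_flags(check: MultipubCheck) -> dict:
    """Derive the boolean Multipub flags from the order counts (sketch only)."""
    # Validation is described as running only for INACTIVE and REPLACEMENT.
    performed = check.determination in {"inactive", "replacement"}
    has_active = performed and check.active_order_count > 0
    has_expired = performed and check.expired_order_count > 0
    has_single = performed and check.single_issue_order_count > 0
    return {
        "multipub_validation_performed": performed,
        "multipub_has_active_subscription": has_active,
        "multipub_has_recently_expired": has_expired,
        "multipub_has_recent_single_issue": has_single,
        # Any subscription-related finding flags the record for review.
        "multipub_flagged_for_review": has_active or has_expired or has_single,
        # Only active subscriptions halt the inactive marking.
        "multipub_deferred": has_active,
    }
```

Under these assumptions, a record with only recently expired orders is flagged for review but not deferred, which matches the field descriptions above.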
Raw LLM Output Fields#
| Field | Type | Description |
|---|---|---|
| raw_classification_result | object or null | The complete, unmodified JSON output from the first-pass LLM classification agent. Contains the raw category, confidence, sender_new_email, alternate_contact, retired_personal_email, is_long_term_leave, reasoning, and any other fields the LLM produced. Null if classification was not performed. Preserved for audit and debugging purposes. |
| raw_qa_result | object or null | The complete, unmodified JSON output from the QA review LLM agent. Contains final_category, final_sender_new_email, final_alternate_contact, final_retired_personal_email, is_long_term_leave, qa_correction_applied, qa_explanation, and any other fields. Null if QA review was not performed. Preserved for audit and debugging. |
Actions Fields#
| Field | Type | Description |
|---|---|---|
| actions | array of objects | List of all actions executed (or mocked) for this email. Each action object contains: system (string — the backend system, e.g., cupola, hodor, salesforce, multipub), operation (string — the operation performed, e.g., mark_inactive, add_contact, update_email, check_subscriptions), success (boolean — whether the action completed successfully), detail (string — additional detail text about what was done, may be empty). An empty array means no actions were executed. |
Outcome Fields#
| Field | Type | Description |
|---|---|---|
| status | string | The final processing outcome status. Possible values: success (all actions completed), failed (one or more actions failed), skipped_no_contact (contact not found — no actions taken), skipped_unknown (determination was unknown — no actions needed), error (unexpected error occurred), deferred_multipub (halted due to active Multipub subscriptions), pending (should not appear in final output). |
| skip_reason | string or null | A human-readable explanation for why processing was skipped or deferred. Null when the email was fully processed. Examples: Contact not found in any system, Determination is unknown — no actions required, Deferred: active Multipub subscriptions. |
| error_message | string or null | The error message text when status is error or failed. Contains the exception message or a description of what went wrong. Null when no error occurred. |
| duration_ms | float | The wall-clock processing time for this individual email in milliseconds. Measures the time from when this email started processing to when it completed. Useful for identifying slow emails that may be caused by slow LLM responses, slow database lookups, or complex action execution. |
5. processing_report_master.csv and processing_report_ip4.csv#
Each run emits two CSV companions from ReportGenerator.write_report: processing_report_master.csv (full column ledger) and processing_report_ip4.csv (IP4-facing subset, fixed column order — 2026-05-03).
processing_report_master.csv#
Purpose#
Spreadsheet-compatible export with one row per processed email and the complete flattened column set (ReportGenerator._CSV_COLUMNS). Use this file for internal analysis, Client Services run reports, and audit.
Generation Conditions#
Generated whenever processing_report.log is generated.
Format#
CSV with UTF-8 BOM encoding (utf-8-sig for Excel compatibility). All fields are quoted (QUOTE_ALL). Newlines within field values are replaced with spaces to prevent row splitting.
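The format rules above map directly onto Python's standard `csv` module. The sketch below is illustrative only; `sanitize` and `render_csv` are hypothetical helper names and are not taken from `ReportGenerator`.

```python
import csv
import io


def sanitize(value) -> str:
    """Render None as an empty string and replace newlines with spaces."""
    if value is None:
        return ""
    text = str(value)
    return text.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")


def render_csv(headers, rows) -> bytes:
    """Return CSV bytes with a UTF-8 BOM and every field quoted."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
    writer.writerow(headers)
    for row in rows:
        writer.writerow([sanitize(value) for value in row])
    # The utf-8-sig codec prepends the BOM that Excel uses to detect UTF-8.
    return buffer.getvalue().encode("utf-8-sig")
```

Replacing newlines before writing keeps each email on a single physical line, so tools that split on line breaks without parsing quotes still see one row per record.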
Column Reference#
| # | Column Header | Source Field | Description |
|---|---|---|---|
| 1 | Email ID | message_id | Unique identifier for the email record (database primary key from Hodor). |
| 2 | Sender Email | sender_email | The email address of the auto-response sender. |
| 3 | Original Sender Email | original_sender_email | The sender email before normalization, if it was modified. Empty if unchanged. |
| 4 | Lookup Email | lookup_email | The email address actually used for contact lookup across backend systems. May differ from sender email. |
| 5 | Sender Name | sender_name | Display name of the sender from email headers. Empty if not available. |
| 6 | Subject | subject | Full subject line of the auto-response email. Newlines replaced with spaces. |
| 7 | Received Date | received_date | Date and time the email was received. |
| 8 | Inbox Source | inbox_source | The inbox/account (AccountName) the email was fetched from. Determines business line. |
| 9 | Body | body | Full body text of the email. Newlines replaced with spaces. |
| 10 | Contact Found | contact_found | Yes if contact was found in at least one backend system, No otherwise. |
| 11 | Sources | lookup_sources_available | Comma-separated list of all backend connectors queried during contact lookup. |
| 12 | Sources Used | sources_used_fields | Semicolon-separated provenance map showing which system provided each data field (e.g., person_name:cupola; org_name:hodor). |
| 13 | Contact Systems (Live) | computed | Comma-separated list of systems where the contact was found using LIVE (non-mocked) connections. Empty if all lookups were mocked or contact not found. |
| 14 | Contact Systems (Mock) | computed | Comma-separated list of systems where the contact was found using MOCKED connections. Empty in live mode. |
| 15 | HODOR ProsNum | hodor_pros_num | The Hodor prospect number for this contact. Empty if not found in Hodor. |
| 16 | CUPOLA Org ID | cupola_org_id | The Cupola organization ID. Empty if not found in Cupola. |
| 17 | CUPOLA Org Person ID | cupola_org_person_id | The Cupola org-person link ID. Empty if not found. |
| 18 | CUPOLA Person ID | cupola_person_id | The Cupola person entity ID. Empty if not found. |
| 19 | Multipub Subsnum | multipub_subsnum | The Multipub subscriber number. Empty if not found in Multipub. |
| 20 | Initial LLM Category | initial_llm_category | Category from the first-pass LLM classification, before QA review. Empty if not classified. |
| 21 | Final LLM Category | llm_category | Final category after QA review. Empty if not classified. |
| 22 | QA Correction Applied | qa_correction_applied | Yes if QA agent changed the initial classification, No otherwise. |
| 23 | QA Explanation | qa_explanation | QA agent's reasoning for its decision. Empty if QA was not performed. |
| 24 | Person Status | person_status | Person's employment/org status from LLM (e.g., left_company, retired). Empty if not extracted. |
| 25 | Email Status | email_status | Status of the email address from LLM (e.g., valid, bounced). Empty if not extracted. |
| 26 | Determination | determination | Final determination type: inactive, active, replacement, title_update, email_update, unknown. Empty if not determined. |
| 27 | Confidence | confidence | Confidence score formatted as percentage (e.g., 95%). 0% if not determined. |
| 28 | Source Email | source_email | Email address extracted from auto-response body used for lookup. Empty if same as sender. |
| 29 | New Email | sender_new_email | New email address identified for the person. Empty if none found. |
| 30 | Replacement Name | computed | Semicolon-separated list of replacement contact names (from replacement_info). Empty if no replacements. |
| 31 | Replacement Email | computed | Semicolon-separated list of replacement contact email addresses. Empty if no replacements. |
| 32 | Replacement Title | computed | Semicolon-separated list of replacement contact job titles. Empty if no replacements. |
| 33 | Retired Personal Email | retired_personal_email | Personal email provided by departed/retired person. Empty if none provided. |
| 34 | Long-term Leave | is_long_term_leave | Yes if person is on long-term leave, No otherwise. |
| 35 | Reasoning | notes | LLM reasoning/notes for the determination. Empty if none provided. |
| 36 | Multipub Validated | multipub_validation_performed | Yes if Multipub validation was performed, No otherwise. |
| 37 | Multipub Subscriber | multipub_subsnum | Multipub subscriber number (same as column 19). Empty if not found. |
| 38 | Multipub Match Method | multipub_match_method | How the contact was matched in Multipub (e.g., by email, by name). Empty if not matched. |
| 39 | Multipub Active Subs | multipub_has_active_subscription | Yes if active subscriptions exist, No otherwise. |
| 40 | Multipub Active Order Count | multipub_active_order_count | Number of active subscription orders. 0 if none. |
| 41 | Multipub Recently Expired | multipub_has_recently_expired | Yes if subscriptions expired within 12 months, No otherwise. |
| 42 | Multipub Expired Order Count | multipub_expired_order_count | Number of recently expired orders. 0 if none. |
| 43 | Multipub Single-Issue | multipub_has_recent_single_issue | Yes if recent single-issue purchases exist, No otherwise. |
| 44 | Multipub Single-Issue Order Count | multipub_single_issue_order_count | Number of single-issue orders. 0 if none. |
| 45 | Multipub Flagged for Review | multipub_flagged_for_review | Yes if record was flagged for manual review, No otherwise. |
| 46 | Multipub Review Reason | multipub_review_reason | Reason the record was flagged. Empty if not flagged. |
| 47 | Multipub Deferred | multipub_deferred | Yes if inactive marking was halted due to active subscriptions, No otherwise. |
| 48 | CUPOLA Actions Summary | computed | Semicolon-separated summary of all Cupola-specific actions. Format: [OK/FAIL] system: operation - detail. Empty if no Cupola actions. |
| 49 | Actions Summary | computed | Semicolon-separated summary of all non-Cupola actions (Hodor, Salesforce, etc.). Format: [OK/FAIL] system: operation - detail. Empty if no non-Cupola actions. |
| 50 | Status | status | Final processing status: success, failed, skipped_no_contact, skipped_unknown, error, deferred_multipub, pending. |
| 51 | Skip Reason | skip_reason | Reason for skipping. Empty if not skipped. |
| 52 | Error Message | error_message | Error text if status is error/failed. Empty if no error. |
| 53 | Duration (ms) | duration_ms | Processing duration in milliseconds, formatted as an integer. |
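Columns 48 and 49 are both built from the `actions` array using the `[OK/FAIL] system: operation - detail` format. A minimal sketch of that flattening follows; `summarize_actions` is a hypothetical name, and both the `"; "` separator spacing and the omission of the ` - detail` suffix when `detail` is empty are assumptions.

```python
def summarize_actions(actions, cupola: bool) -> str:
    """Join Cupola actions (cupola=True) or non-Cupola actions (cupola=False)."""
    parts = []
    for action in actions:
        is_cupola = action["system"].lower() == "cupola"
        if is_cupola != cupola:
            continue
        tag = "OK" if action["success"] else "FAIL"
        text = f"[{tag}] {action['system']}: {action['operation']}"
        # Assumption: an empty detail string adds no trailing " - ".
        if action.get("detail"):
            text += f" - {action['detail']}"
        parts.append(text)
    return "; ".join(parts)
```

Calling it once with `cupola=True` and once with `cupola=False` on the same `actions` list yields the two summary columns, and an empty list yields an empty string in both.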
processing_report_ip4.csv#
Purpose#
Filtered export for Sai Teja / IP4: only rows that need manual Cupola follow-up under the agreed LLM categories, with a fixed 23-column layout so templates and macros do not drift (ReportGenerator._CSV_COLUMNS_IP4).
Generation Conditions#
Written together with the master CSV whenever processing_report.log is generated.
Row filter#
The file includes only emails whose Final LLM Category (or, if that is empty, Initial LLM Category) normalizes to one of: Out of Office, Retired, Deceased, Left Company, Changed Email (ReportGenerator._IP4_ACTIONABLE_CATEGORIES). Emails in all other categories are excluded from this file; they still appear in the master CSV and in processing_report.json.
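The row filter can be sketched as below. The set contents come from the list above; the normalization step (trim, collapse whitespace, case-fold) and the names `IP4_ACTIONABLE`, `normalize_label`, and `is_ip4_row` are assumptions for illustration, and the actual rule lives in `ReportGenerator`.

```python
IP4_ACTIONABLE = {
    "out of office",
    "retired",
    "deceased",
    "left company",
    "changed email",
}


def normalize_label(label) -> str:
    """Assumed normalization: trim, collapse whitespace, and case-fold."""
    return " ".join((label or "").split()).casefold()


def is_ip4_row(row: dict) -> bool:
    """Keep a row if its final (or, failing that, initial) category is actionable."""
    category = row.get("Final LLM Category") or row.get("Initial LLM Category")
    return normalize_label(category) in IP4_ACTIONABLE
```

The `or` fallback treats both a missing key and an empty string as "no final category", which is how the initial category comes into play.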
Format#
Same as the master CSV: UTF-8 BOM, QUOTE_ALL, newline sanitation.
Column Reference (fixed order — 23 columns)#
| # | Column Header | Source Field / derivation | Description |
|---|---|---|---|
| 1 | Email ID | message_id | Same as master §5 column 1. |
| 2 | Inbox Source | inbox_source | Same as master §5 column 8. |
| 3 | Original Sender Email | original_sender_email | Same as master §5 column 3. |
| 4 | Sender Email | sender_email | Same as master §5 column 2. |
| 5 | Lookup Email | lookup_email | Same as master §5 column 4. |
| 6 | Source Email | source_email | Same as master §5 column 28. |
| 7 | Sender Name | sender_name | Same as master §5 column 5. |
| 8 | Subject | subject | Same as master §5 column 6. |
| 9 | Body | body | Same as master §5 column 9. |
| 10 | Initial LLM Category | initial_llm_category | Same as master §5 column 20. |
| 11 | Final LLM Category | llm_category | Same as master §5 column 21. |
| 12 | Determination | determination | Same as master §5 column 26. |
| 13 | Person Status | person_status | Same as master §5 column 24. |
| 14 | Email Status | email_status | Same as master §5 column 25. |
| 15 | CUPOLA Org ID | cupola_org_id | Same as master §5 column 16. |
| 16 | CUPOLA Org Person ID | cupola_org_person_id | Same as master §5 column 17. |
| 17 | CUPOLA Person ID | cupola_person_id | Same as master §5 column 18. |
| 18 | New Email | sender_new_email | Same as master §5 column 29. |
| 19 | Replacement Name | computed from replacement_info | Same as master §5 column 30. |
| 20 | Replacement Email | computed from replacement_info | Same as master §5 column 31. |
| 21 | Replacement Title | computed from replacement_info | Same as master §5 column 32. |
| 22 | Retired Personal Email | retired_personal_email | Same as master §5 column 33. |
| 23 | CUPOLA Actions Summary | computed from actions (Cupola only) | Same as master §5 column 48. |
6. category_summary_report.csv#
Purpose#
A consolidated summary that groups all processed emails into five main business categories. This report collapses the granular LLM categories into broader groups for high-level analysis and reporting to stakeholders who need to understand the distribution of auto-response types without granular detail.
Generation Conditions#
Generated when at least one email is processed. Not generated if the records list is empty.
Format#
CSV with UTF-8 BOM encoding. All fields are quoted. Rows are ordered by category in a fixed sequence: Undeliverable, Left Company / Retired / Deceased, Out of Office, Changed Email, Other.
Category Mapping#
| Main Category | Mapped From LLM Categories |
|---|---|
| Undeliverable | undeliverable |
| Left Company / Retired / Deceased | left company, retired, deceased |
| Out of Office | out of office |
| Changed Email | changed email |
| Other | Any category not matching the above, or null/empty categories |
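The mapping table and the fixed row order can be expressed as a lookup with a fallback. This is a sketch only; `to_main_category` and `sort_by_main_category` are hypothetical names, and case-insensitive matching is an assumption.

```python
MAIN_CATEGORY_ORDER = [
    "Undeliverable",
    "Left Company / Retired / Deceased",
    "Out of Office",
    "Changed Email",
    "Other",
]

_LLM_TO_MAIN = {
    "undeliverable": "Undeliverable",
    "left company": "Left Company / Retired / Deceased",
    "retired": "Left Company / Retired / Deceased",
    "deceased": "Left Company / Retired / Deceased",
    "out of office": "Out of Office",
    "changed email": "Changed Email",
}


def to_main_category(llm_category) -> str:
    """Map an LLM category to a main category; unknown or empty maps to Other."""
    key = (llm_category or "").strip().casefold()
    return _LLM_TO_MAIN.get(key, "Other")


def sort_by_main_category(llm_categories):
    """Order categories by the fixed main-category sequence (stable sort)."""
    return sorted(
        llm_categories,
        key=lambda c: MAIN_CATEGORY_ORDER.index(to_main_category(c)),
    )
```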
Column Reference#
| # | Column Header | Description |
|---|---|---|
| 1 | Category | The main business category this email was mapped to (one of the five categories above). |
| 2 | Email ID | Unique identifier for the email record (same as message_id). |
| 3 | Sender Email | The sender's email address. |
| 4 | Lookup Email | The email address used for contact lookup. Empty if same as sender or not available. |
| 5 | Contact Found | Yes if contact was found in any backend system, No otherwise. |
| 6 | Contact Systems | Comma-separated list of systems where the contact was found. |
| 7 | Determination | The final determination type (inactive, active, replacement, etc.). Empty if not determined. |
| 8 | Status | Processing outcome status (success, failed, skipped_no_contact, etc.). |
| 9 | Org Name | Organization name associated with the contact. Empty if not found. |
| 10 | Person Name | Person's name. Falls back to sender name if person name is not available. |
| 11 | CUPOLA Org ID | Cupola organization ID. Empty if not in Cupola. |
| 12 | CUPOLA Org Person ID | Cupola org-person link ID. Empty if not in Cupola. |
| 13 | CUPOLA Person ID | Cupola person ID. Empty if not in Cupola. |
| 14 | HODOR ProsNum | Hodor prospect number. Empty if not in Hodor. |
| 15 | Multipub SubsNum | Multipub subscriber number. Empty if not in Multipub. |
7. category_summary_report.json#
Purpose#
JSON companion to the category summary CSV. Provides the same grouped data in a machine-readable format with records organized under their respective category keys.
Generation Conditions#
Generated alongside category_summary_report.csv.
Format#
JSON object with UTF-8 encoding, 2-space indentation.
Top-Level Fields#
| Field | Type | Description |
|---|---|---|
| generated_at | string (ISO 8601) | Timestamp when this file was generated. |
| run_start | string (ISO 8601) | Timestamp when the pipeline run began. |
| total_emails | integer | Total number of emails in this report. |
| categories | object | An object where each key is a main category name and the value is an object containing record_count (integer) and records (array of objects). Each record object has the same fields as the CSV columns listed above, using snake_case keys: category, email_id, sender_email, lookup_email, contact_found, contact_systems, determination, status, org_name, person_name, cupola_org_id, cupola_org_person_id, cupola_person_id, hodor_pros_num, multipub_subsnum. |
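A consumer of this file might read the per-category counts as sketched below. `summarize_category_report` is a hypothetical helper, and the consistency check assumes that every email appears under exactly one category, so the counts sum to `total_emails`.

```python
import json


def summarize_category_report(report_text: str):
    """Return (counts per category, whether the counts are self-consistent)."""
    doc = json.loads(report_text)
    counts = {}
    consistent = True
    for name, block in doc["categories"].items():
        counts[name] = block["record_count"]
        # record_count should match the length of the records array.
        if block["record_count"] != len(block["records"]):
            consistent = False
    # Assumption: each email is listed under exactly one main category.
    if sum(counts.values()) != doc["total_emails"]:
        consistent = False
    return counts, consistent
```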
8. classifier_output/classifier_output.json#
Purpose#
The raw, unprocessed output from the LLM classification and QA agents for every email that went through classification. This file preserves the full agent responses before any post-processing, mapping, or interpretation by the pipeline. It serves as the primary audit trail for LLM decision-making and is essential for debugging classification issues, evaluating LLM accuracy, and tuning prompts.
Generation Conditions#
Generated only when at least one email went through LLM classification (i.e., at least one record has raw_classification_result or raw_qa_result populated). Created in a classifier_output/ subdirectory within the run folder.
Format#
JSON object with UTF-8 encoding, 2-space indentation.
Top-Level Fields#
| Field | Type | Description |
|---|---|---|
| generated_at | string (ISO 8601) | Timestamp when this file was generated. |
| run_start | string (ISO 8601) | Timestamp when the pipeline run began. |
| total_classified_emails | integer | Number of emails that were classified by the LLM in this run. |
| records | array of objects | One object per classified email (see below). |
Record Fields#
| Field | Type | Description |
|---|---|---|
| email_id | string | Unique identifier for the email. |
| sender_email | string | Sender's email address. |
| sender_name | string or null | Sender's display name. |
| subject | string | Email subject line. |
| inbox_source | string | Inbox/account the email came from. |
| classification_agent_output | object or null | The complete raw JSON response from the first-pass classification LLM agent. Structure depends on the LLM prompt and may include: category, confidence, sender_new_email, alternate_contact, retired_personal_email, is_long_term_leave, reasoning, person_status, email_status, and any additional fields the LLM returns. Null if classification was not performed. |
| qa_agent_output | object or null | The complete raw JSON response from the QA review LLM agent. Structure depends on the QA prompt and may include: final_category, final_sender_new_email, final_alternate_contact, final_retired_personal_email, is_long_term_leave, qa_correction_applied, qa_explanation, and any additional fields. Null if QA review was not performed. |
9. classifier_output/classifier_output.csv#
Purpose#
A tabular/spreadsheet-friendly view of the LLM classification and QA outputs. Flattens the raw agent responses into discrete columns for side-by-side comparison of initial classification vs. QA review results, and includes the Determination the pipeline derived from the QA-final category (see column 6 below).
Note on categories: The classification and QA agents are expected to assign a single label from the nine LLM categories. If the model returns a compound string (for example comma-separated labels), the pipeline normalizes it to one canonical category using a fixed severity priority before mapping to actions, and the CSV reflects that normalized value in Initial Category and Final Category.
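The idea of collapsing a compound label by severity can be sketched as follows. The priority order shown here is hypothetical and covers only the categories named in this document; the real order and the full set of nine categories are defined in the pipeline source. `normalize_category` is likewise an illustrative name.

```python
# Hypothetical severity order, most severe first. The pipeline's actual
# order is not documented here and may differ.
SEVERITY_PRIORITY = [
    "deceased",
    "retired",
    "left company",
    "changed email",
    "undeliverable",
    "out of office",
]


def normalize_category(raw: str) -> str:
    """Collapse a comma-separated label string to one canonical label."""
    labels = [part.strip().casefold() for part in raw.split(",") if part.strip()]
    for candidate in SEVERITY_PRIORITY:
        if candidate in labels:
            return candidate
    # Labels outside the assumed priority list: fall back to the first one.
    return labels[0] if labels else ""
```

A single-label input passes through unchanged (apart from case-folding), so the function is safe to apply to every classifier response.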
Generation Conditions#
Generated alongside classifier_output.json.
Format#
CSV with UTF-8 BOM encoding. All fields quoted.
Column Reference#
| # | Column Header | Description |
|---|---|---|
| 1 | Email ID | Unique identifier for the email. |
| 2 | Sender Email | Sender's email address. |
| 3 | Sender Name | Sender's display name. Empty if not available. |
| 4 | Subject | Email subject line (newlines replaced with spaces). |
| 5 | Inbox Source | Inbox/account the email was fetched from. |
| 6 | Determination | The pipeline's mapped action type for this email: one of unknown, inactive, active, replacement, email_update, title_update (same meaning as elsewhere in processing reports). Derived from the QA-final LLM category and business rules in category_mapper.map_category_to_determination, not a separate LLM field. Empty if not populated on the processing record. |
| 7 | Initial Category | Category assigned by the first-pass classification agent (from raw_classification_result.category), after normalization to a single canonical label when needed. |
| 8 | Confidence | Confidence level from the classification agent (from raw_classification_result.confidence). |
| 9 | Sender New Email (Classification) | New email address extracted by the classification agent. Empty if none found. |
| 10 | Alternate Contact (Classification) | Alternate/replacement contact info extracted by classification agent. May be a structured string. Empty if none found. |
| 11 | Retired Personal Email (Classification) | Personal email extracted by classification agent. Empty if none found. |
| 12 | Is Long Term Leave (Classification) | Yes if classification agent identified long-term leave, No otherwise. |
| 13 | Reasoning | The classification agent's reasoning text explaining its categorization. |
| 14 | Final Category | Category after QA review (from raw_qa_result.final_category), after normalization to a single canonical label when needed. |
| 15 | Final Sender New Email | New email after QA review correction. Empty if not changed or not applicable. |
| 16 | Final Alternate Contact | Alternate contact after QA correction. Empty if not changed. |
| 17 | Final Retired Personal Email | Personal email after QA correction. Empty if not changed. |
| 18 | Is Long Term Leave (QA) | Yes if QA agent confirmed long-term leave, No otherwise. |
| 19 | QA Correction Applied | Yes if QA changed the classification, No if it confirmed the original. |
| 20 | QA Explanation | QA agent's explanation of its review decision. |
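For prompt tuning, a common use of this file is listing the emails where QA overrode the first-pass category. The sketch below assumes the column headers shown in the table above; `qa_corrections` is a hypothetical helper.

```python
import csv
import io


def qa_corrections(csv_bytes: bytes):
    """Return (email id, initial category, final category) for QA-corrected rows."""
    # utf-8-sig strips the BOM so the first header reads "Email ID".
    text = csv_bytes.decode("utf-8-sig")
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [
        (row["Email ID"], row["Initial Category"], row["Final Category"])
        for row in reader
        if row["QA Correction Applied"] == "Yes"
    ]
```

Decoding with plain `utf-8` would leave the BOM attached to the first header name, so a lookup on `"Email ID"` would fail; `utf-8-sig` avoids that.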
10. output_document_inactive_people.csv#
Purpose#
A business deliverable listing all people determined to be INACTIVE (left company, retired, deceased) along with the specific actions taken or planned across each backend system. This document is used by operations teams to verify that inactive contacts have been properly removed or suppressed across CUPOLA, HODOR, SFMC, and Multipub. It also provides the sales team with active subscription information so they can follow up on transferring subscriptions. This file is not attached to the N04 notification email; the marketing team instead receives the slimmer SFMC import file(s) described in Marketing suppression deliverable (*_NoLongerThere_*.csv) via notify_marketing_suppression.
Generation Conditions#
Generated only when at least one inactive person record exists in the output document collector.
Format#
CSV with UTF-8 BOM encoding. All fields quoted.
Column Reference#
| # | Column Header | Description |
|---|---|---|
| 1 | Id | Record identifier. Typically the email's database primary key or a generated unique ID for tracking this inactive person record through downstream workflows. |
| 2 | AccountName | The inbox/account source (business line email) the auto-response was received at. Examples: energy@thompson.com, grants@thompson.com, resources@associationexecs.com. Determines which business line is affected and which SFMC suppression list to use. |
| 3 | Org Name | The organization/company name the person was associated with. Sourced from Cupola or Hodor contact records. Empty if not available. |
| 4 | Person Name | The full name of the inactive person. Sourced from Cupola or Hodor contact records. Empty if not available. |
| 5 | | The email address of the inactive person (the "Auto Response Received From" address). This is the email that triggered the auto-response and is the address being marked inactive across systems. |
| 6 | Status with Org | The person's status relative to their organization as determined by the LLM (e.g., left_company, retired, deceased). Provides context for why the person is being marked inactive. Empty if not determined. |
| 7 | CUPOLA Org ID | The Cupola organization ID for the preferred org-person link. Used by operations to verify the correct organization record in Cupola. Empty if not in Cupola. |
| 8 | CUPOLA Person ID | The Cupola person entity ID. Used by operations to locate the person record in Cupola. Empty if not in Cupola. |
| 9 | CUPOLA Org Person IDs | Comma-separated list of all Cupola org_person_id values that were marked inactive for this email address. A person may have multiple org-person links (e.g., they are a contact at multiple organizations). All linked records are marked inactive. Empty if not in Cupola. |
| 10 | HODOR ProsNums | Comma-separated list of all Hodor prospect numbers (ProsNum) that were marked as "No Longer with Firm" for this email address. A person may have multiple prospect records in Hodor. Empty if not in Hodor. |
| 11 | Multipub Subsnum | The Multipub subscriber number, if the contact was found in the Multipub subscription system. Empty if not found. |
| 12 | Salesforce IDs | Comma-separated list of Salesforce Lead or Contact record IDs associated with this email, if the contact was found in Salesforce. Empty if not in Salesforce. |
| 13 | HODOR Status | The HODOR status action that was taken. Typically No Longer with Firm for inactive contacts. Empty if no Hodor action was taken. |
| 14 | SFMC Suppression Added | Yes if the email address was added to the SFMC (Salesforce Marketing Cloud) Auto Suppression List for the corresponding business line. No if the suppression was not added (e.g., if SFMC operations were mocked or failed). |
| 15 | Multipub Active Subscriptions | A summary of active subscriptions found in Multipub for this person. Contains serialized order details (up to 3 entries) for the sales team to follow up on. These are subscriptions that need to be transferred or cancelled since the person is no longer active. Empty if no active subscriptions. |
| 16 | Multipub Recent Orders | A summary of recently expired or single-issue orders from Multipub (within the past 12 months). Contains serialized order details (up to 3 entries) for sales team awareness. Empty if no recent orders. |
11. output_document_inactive_people.json#
Purpose#
JSON companion to the inactive people CSV. Contains the same data in structured format for programmatic consumption.
Generation Conditions#
Generated alongside the CSV when inactive person records exist.
Format#
JSON object with UTF-8 encoding, 2-space indentation.
Top-Level Fields#
| Field | Type | Description |
|---|---|---|
| list_name | string | Always "List of Inactive People". |
| purpose | string | Always "Remove / follow up these emails from across our systems (CUPOLA, HODOR, SFMC, MultiPub)". |
| generated_at | string (ISO 8601) | Timestamp when this file was generated. |
| record_count | integer | Number of inactive person records in this file. |
| records | array of objects | Each object is a full serialization of the InactivePersonRecord Pydantic model. All fields from the CSV are present using snake_case naming. Nested lists and objects (such as multipub_active_subscriptions and multipub_recent_orders) are fully expanded as arrays of objects rather than serialized strings. |
Marketing suppression deliverable ({BusinessUnit}_NoLongerThere_{YYYY-MM-DD}.csv)#
Purpose#
SFMC-ready suppression import file(s) for the marketing team (notification catalog N04). This is the production suppression path. The files are derived from the same InactivePersonRecord rows as output_document_inactive_people.csv but contain only the three import columns listed below, with no Cupola/Hodor/Multipub detail. Live SFMC REST upsert during processing is not in production; see MARKETING_SUPPRESSION.html.
Generation Conditions#
Written in the same pass as output_document_inactive_people.csv when at least one inactive person record exists. Implemented in marketing_suppression_deliverable.write_marketing_suppression_deliverables. One file per business-unit label (sorted by label); emails are deduped case-insensitively within each file.
Format#
CSV with UTF-8 BOM encoding. All fields quoted. Filename pattern: {BusinessUnit}_NoLongerThere_{YYYY-MM-DD}.csv, where YYYY-MM-DD is parsed from the run_{date}_{time} name of the run folder, or set to today's date if the folder name does not match that pattern. Currently, resolve_business_unit_label maps every inbox to Marketing.
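The filename rule can be sketched as below. The regular expression assumes the run_{YYYY-MM-DD_HH-MM-SS} folder naming used for run directories; `run_date_token` and `deliverable_name` are hypothetical names and not the functions in marketing_suppression_deliverable.

```python
import re
from datetime import date
from pathlib import Path

# Assumed folder shape: run_YYYY-MM-DD_HH-MM-SS
_RUN_DIR = re.compile(r"^run_(\d{4}-\d{2}-\d{2})_\d{2}-\d{2}-\d{2}$")


def run_date_token(run_dir, today=None) -> str:
    """Extract YYYY-MM-DD from the run folder name, else fall back to today."""
    match = _RUN_DIR.match(Path(run_dir).name)
    if match:
        return match.group(1)
    return (today or date.today()).isoformat()


def deliverable_name(business_unit: str, run_dir, today=None) -> str:
    """Build the {BusinessUnit}_NoLongerThere_{YYYY-MM-DD}.csv filename."""
    return f"{business_unit}_NoLongerThere_{run_date_token(run_dir, today)}.csv"
```

The optional `today` parameter exists only so the fallback branch can be exercised deterministically in a test.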
Column Reference#
| # | Column Header | Description |
|---|---|---|
| 1 | Email Address | Inactive person email to suppress in SFMC. |
| 2 | Status | Always Unsubscribed (fixed import value). |
| 3 | Date Added | ISO date (YYYY-MM-DD) matching the run-date token in the filename. |
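Building the three-column rows with case-insensitive deduplication might look like the sketch below. `suppression_rows` is a hypothetical helper; keeping the first-seen casing of each address and skipping blank addresses are assumptions.

```python
def suppression_rows(emails, run_date: str):
    """Return one row per unique email, compared case-insensitively."""
    seen = set()
    rows = []
    for email in emails:
        address = (email or "").strip()
        key = address.casefold()
        # Skip blanks and addresses already emitted under a different casing.
        if not key or key in seen:
            continue
        seen.add(key)
        rows.append({
            "Email Address": address,
            "Status": "Unsubscribed",  # fixed import value
            "Date Added": run_date,    # same token as in the filename
        })
    return rows
```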
Notification#
Notifier.notify_marketing_suppression discovers files with the glob *_NoLongerThere_*.csv and attaches only those files; output_document_inactive_people.csv is not attached. See NOTIFICATIONS_CATALOG (N04) and the detailed guide MARKETING_SUPPRESSION.html.
12. output_document_alternate_contacts.csv#
Purpose#
A business deliverable consolidating all replacement/alternate contacts identified from auto-response emails. When an inactive person's auto-response mentions a replacement (e.g., "Please contact Jane Doe at jane@company.com instead"), the replacement's information is captured here with planned actions for adding or updating them across CUPOLA, HODOR, and Multipub.
Generation Conditions#
Generated only when at least one alternate contact record exists in the output document collector.
Format#
CSV with UTF-8 encoding, minimal quoting.
Column Reference#
| # | Column Header | Description |
|---|---|---|
| 1 | Id | Record identifier for tracking this alternate contact through downstream workflows. |
| 2 | AccountName | The inbox/account source (business line email) the original auto-response was received at. Determines which HODOR library the alternate contact will be imported into. |
| 3 | Email Received From | The email address of the original (now inactive) person whose auto-response mentioned this alternate contact. This links the alternate contact back to the inactive person they are replacing. |
| 4 | Subject | Source auto-response subject (traceability). |
| 5 | Email Body | Full raw body of the source auto-response email (plain text or HTML as stored). |
| 6 | Message ID | Source message identifier for traceability. |
| 7 | Org ID | The Cupola organization ID (organization_id) for the organization the alternate contact is being added to. This is typically the same organization as the original inactive person. Empty if not in Cupola. |
| 8 | Org Name | The organization/company name for the original sender, resolved once via Contact.resolve_organization_for_deliverable (CUPOLA preferred row, then Hodor firm, then first org hint). Always matches Firm inside HODOR Import Data. Comments may include Org source: (cupola_preferred, hodor_firm, hint, replacement_cupola). Empty if not available. |
| 9 | Alternate Person Name | The full name of the replacement/alternate contact person, as extracted by the LLM from the auto-response body. For HODOR import, this is split into Fname (first name) and Lname (last name). Empty if not provided. |
| 10 | Alternate Person Title | The job title of the alternate contact (e.g., "Director of Marketing", "VP Sales"). Maps to the HODOR Titl field. Empty if not provided. |
| 11 | Alternate Person Email | The email address of the alternate contact. Maps to the HODOR Email field. This is the primary identifier used to check if the person already exists in CUPOLA. Empty if not provided. |
| 12 | Alternate Person Phone | The phone number of the alternate contact. Maps to the HODOR Phone field. Empty if not provided. |
| 13 | Alternate Person Ext | The phone extension of the alternate contact. Maps to the HODOR pext field. Empty if not provided. |
| 14 | Org Person ID | The Cupola org_person_id for the alternate contact, if they already exist in Cupola. Used when the action is update rather than add. Empty if the contact is new to Cupola. |
| 15 | Person ID | The Cupola person_id for the alternate contact, if they already exist in Cupola. Empty if the contact is new. |
| 16 | HODOR ProsNum | The Hodor prospect number for the alternate contact, if they already exist in Hodor. Empty if new to Hodor. |
| 17 | Comments | Free-form comments or context about this alternate contact, typically derived from the auto-response text. May include the original person's name, the nature of the handoff, or other contextual information. Empty if none. |
| 18 | CUPOLA Action | The planned action for this alternate contact in Cupola. Values: add (pipeline will call add_contact because check_contact_exists found no row for this email), update (at least one Cupola row exists for this email — typically same mailbox/org-person handling). Empty if no Cupola action is planned. Note: The underlying add_contact implementation still enforces email/org rules: if a row already exists for the target org it returns that org_person_id; if the email exists only under other orgs it reuses person_id and inserts a new org-person link. See docs/connections/cupola.html. |
| 19 | HODOR Library | The HODOR library code that this alternate contact will be imported into. Determined by the AccountName (inbox source). Mapping: energy@thompson.com → ENGY, grants@thompson.com → GRDM, resources@associationexecs.com → ASSN, resources@associationtrends.com → ASSN, resources@thealmanacofamericanpolitics.com → GR. Empty if library cannot be determined. |
| 20 | HODOR Import Data | JSON-serialized object containing the fields needed for the HODOR import template: Fname (first name), Lname (last name), Titl (title), Firm (organization name), Email (email address), Phone (phone number), pext (phone extension). Empty if no HODOR import is planned. |
| 21 | Multipub Sales Request | Yes if this alternate contact was provided to the sales team for Multipub follow-up (typically when the original inactive person had active subscriptions that need to be transferred). No otherwise. |
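The inbox-to-library mapping and the HODOR Import Data payload described in columns 19 and 20 can be sketched as follows. This is a minimal illustration, not the pipeline's code: `hodor_library_for` and `build_hodor_import_data` are hypothetical helper names, and the rule for splitting a full name into Fname/Lname (first token versus the rest) is an assumption, since the actual split rule is not documented here.

```python
import json

# Inbox-to-library mapping copied from the "HODOR Library" column description.
HODOR_LIBRARY_BY_ACCOUNT = {
    "energy@thompson.com": "ENGY",
    "grants@thompson.com": "GRDM",
    "resources@associationexecs.com": "ASSN",
    "resources@associationtrends.com": "ASSN",
    "resources@thealmanacofamericanpolitics.com": "GR",
}


def hodor_library_for(account_name: str) -> str:
    """Return the library code, or "" when the inbox is not in the mapping."""
    return HODOR_LIBRARY_BY_ACCOUNT.get(account_name.strip().lower(), "")


def build_hodor_import_data(name, title, firm, email, phone, ext) -> str:
    """Serialize the HODOR import fields as a JSON string (hypothetical helper)."""
    # Assumption: the first whitespace-separated token is the first name and
    # everything after it is the last name.
    first, _, last = (name or "").strip().partition(" ")
    return json.dumps({
        "Fname": first,
        "Lname": last.strip(),
        "Titl": title or "",
        "Firm": firm or "",
        "Email": email or "",
        "Phone": phone or "",
        "pext": ext or "",
    })
```

An unmapped inbox yields an empty string, mirroring the "Empty if library cannot be determined" behavior of the HODOR Library column.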
13. output_document_alternate_contacts.json#
Purpose#
JSON companion to the alternate contacts CSV. Contains the same data in structured format.
Generation Conditions#
Generated alongside the CSV when alternate contact records exist.
Format#
JSON object with UTF-8 encoding, 2-space indentation.
Top-Level Fields#
| Field | Type | Description |
|---|---|---|
| list_name | string | Always "List of Alternate Contacts". |
| purpose | string | Always "Consolidate list of all provided Alternate Contacts and add / update across all systems". |
| generated_at | string (ISO 8601) | Timestamp when this file was generated. |
| record_count | integer | Number of alternate contact records. |
| records | array of objects | Full serialization of AlternateContactRecord Pydantic models. All fields match the CSV columns using snake_case naming. The hodor_import_data field is a proper JSON object (not a serialized string). |
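A consumer of this file might load it and sanity-check the envelope before using the records. The sketch below relies only on the `record_count` and `records` fields listed above; `load_output_document` is a hypothetical helper, and the `alternate_person_email` key in the usage note is an assumed snake_case form of the CSV header.

```python
import json


def load_output_document(text: str) -> list[dict]:
    """Parse an output_document_*.json payload and return its records."""
    doc = json.loads(text)
    records = doc["records"]
    # record_count is documented as the number of records, so a mismatch
    # suggests a truncated or hand-edited file.
    if doc["record_count"] != len(records):
        raise ValueError(
            f"record_count={doc['record_count']} but {len(records)} records found"
        )
    return records
```

Because `hodor_import_data` is a JSON object here rather than a serialized string, a caller can read `record["hodor_import_data"]["Fname"]` directly without a second `json.loads` call.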
14. output_document_inactive_new_org.csv#
Purpose#
A business deliverable tracking inactive people who have moved to a new organization. When an auto-response indicates someone has left for a different company (e.g., "I have moved to XYZ Corp"), this document captures the new organization details and records the planned actions for potentially adding or updating them in CUPOLA and HODOR at their new organization.
Generation Conditions#
Generated only when at least one inactive-at-new-org record exists in the output document collector.
Format#
CSV with UTF-8 encoding, minimal quoting.
Column Reference#
| # | Column Header | Description |
|---|---|---|
| 1 | Id | Record identifier for tracking this record through downstream workflows. |
| 2 | Account Name | The inbox/account source (business line email) the auto-response was received at. |
| 3 | Email Received From | The email address of the person who sent the auto-response (the person who moved to a new organization). |
| 4 | Person Name | The name of the person who has moved to a new organization. Empty if not available. |
| 5 | New Org ID | The identifier for the new organization (e.g., a Cupola organization ID if the new org already exists in Cupola, or a newly assigned ID). Empty if the new organization has not been identified in any system. |
| 6 | New Org Name | The name of the new organization the person has moved to, as extracted from the auto-response body by the LLM. Empty if not provided. |
| 7 | New Org Title | The person's job title at their new organization. Empty if not provided. |
| 8 | New Org Email | The person's email address at their new organization (e.g., person@newcompany.com). Empty if not provided. |
| 9 | New Org Phone | The person's phone number at their new organization. Empty if not provided. |
| 10 | Org Person ID | The Cupola org_person_id for this person, if they already exist in Cupola. Used for updating existing records. Empty if not in Cupola. |
| 11 | Person ID | The Cupola person_id for this person, if they already exist in Cupola. Empty if not in Cupola. |
| 12 | HODOR ProsNum | The Hodor prospect number for this person, if they exist in Hodor. Empty if not in Hodor. |
| 13 | Comments | Free-form comments or context about the person's move, derived from the auto-response text. May include original organization name, reason for move, or other details. Empty if none. |
| 14 | CUPOLA Action | The planned Cupola action for this record. Values: add (person/org will be added to Cupola), update (existing record will be updated with new org info), skip (record will not be modified in Cupola), ignore (organization is not AI-appropriate and will not be added). Empty if no Cupola action planned. |
| 15 | CUPOLA Org Exists | Yes if the new organization already exists in Cupola. No if the organization is not yet in Cupola. This determines whether the person can be directly added to the existing org or if the org needs to be created first. |
| 16 | CUPOLA AI Appropriate | Yes if the new organization has been determined to be "AI appropriate" — meaning it is in an industry or category that warrants inclusion in Thompson's contact management systems. No if the organization is outside the target market and should be ignored. This check is performed when the organization does not already exist in Cupola. |
| 17 | HODOR Library Assignment | The HODOR library that this person should be assigned to at their new organization. Since the person has changed organizations, they may no longer be in the same industry as before, so library assignment may differ from the original. Currently marked as TBD in many cases pending manual review. Empty if not determined. |
| 18 | Multipub Sales Request | Yes if the person's new contact information was provided to the sales team for Multipub follow-up (e.g., to transfer subscriptions to their new organization). No otherwise. |
15. output_document_inactive_new_org.json#
Purpose#
JSON companion to the inactive-at-new-org CSV. Contains the same data in structured format.
Generation Conditions#
Generated alongside the CSV when inactive-at-new-org records exist.
Format#
JSON object with UTF-8 encoding, 2-space indentation.
Top-Level Fields#
| Field | Type | Description |
|---|---|---|
| list_name | string | Always "List of Inactive People at New Organization". |
| purpose | string | Always "Track where inactive people went and determine if they should be included in our systems". |
| generated_at | string (ISO 8601) | Timestamp when this file was generated. |
| record_count | integer | Number of records. |
| records | array of objects | Full serialization of InactiveNewOrgRecord Pydantic models. All fields match the CSV columns using snake_case naming. |
16. output_document_undeliverables.csv#
Purpose#
A business deliverable listing all emails classified as undeliverable — bounce-backs, invalid email addresses, and mail delivery failures. These records represent email addresses that are no longer valid and need to be removed or suppressed across backend systems to maintain data hygiene.
Generation Conditions#
Generated only when at least one undeliverable record exists in the output document collector.
Format#
CSV with UTF-8 BOM encoding. All fields quoted.
Column Reference#
| # | Column Header | Description |
|---|---|---|
| 1 | Id | Record identifier for tracking this undeliverable record. |
| 2 | AccountName | The inbox/account source (business line email) the bounce-back was received at. |
| 3 | Sender Email | The sender email address from the bounce notification. This is typically the mail server or postmaster address, not the intended recipient. |
| 4 | Lookup Email | The email address that was looked up in backend systems. This is the address that actually bounced — the intended recipient whose email is no longer valid. |
| 5 | Org Name | The organization name associated with the undeliverable email, if the contact was found in any backend system. Empty if not found. |
| 6 | Person Name | The name of the person associated with the undeliverable email, if found. Empty if not found. |
| 7 | Subject | The subject line of the bounce-back email. Often contains the original subject or a delivery failure message. |
| 8 | CUPOLA Org ID | Cupola organization ID for the undeliverable contact, if found. Empty if not in Cupola. |
| 9 | CUPOLA Person ID | Cupola person ID for the undeliverable contact, if found. Empty if not in Cupola. |
| 10 | CUPOLA Org Person IDs | Comma-separated list of Cupola org-person IDs associated with this undeliverable email. Empty if not in Cupola. |
| 11 | HODOR ProsNums | Comma-separated list of Hodor prospect numbers for this contact. Empty if not in Hodor. |
| 12 | Multipub Subsnum | Multipub subscriber number, if found. Empty if not in Multipub. |
| 13 | Multipub Sales Request | Yes when catalog N02 sales follow-up was queued because Multipub validation showed active, recently expired, or recent single-issue activity. No otherwise. Backend writes remain blocked for bounce-pending undeliverables. |
| 14 | Multipub Active Subscriptions | Serialized active Multipub orders when validation ran (same shape as inactive-people deliverable). Empty if none. |
| 15 | Multipub Recent Orders | Recently expired or single-issue orders within the validation window. Empty if none. |
| 16 | Status | The processing status for this undeliverable record (e.g. bounce_pending_rule, skipped_no_contact). |
| 17 | Skip Reason | Reason if the undeliverable was not fully processed. Empty if processed normally. |
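Because this file is written with a UTF-8 BOM, a reader that decodes it as plain UTF-8 will see the first header as `\ufeffId`. The sketch below shows one way to read the file and expand the two comma-separated ID columns; `read_undeliverables` is a hypothetical helper, and the column names are taken from the table above.

```python
import csv
import io

# Columns documented as comma-separated lists.
LIST_COLUMNS = ("CUPOLA Org Person IDs", "HODOR ProsNums")


def read_undeliverables(raw: bytes) -> list[dict]:
    """Decode a BOM-prefixed CSV payload and split the list-valued columns."""
    # "utf-8-sig" strips the BOM so the first header is "Id", not "\ufeffId".
    text = raw.decode("utf-8-sig")
    rows = list(csv.DictReader(io.StringIO(text, newline="")))
    for row in rows:
        for col in LIST_COLUMNS:
            cell = row.get(col) or ""
            row[col] = [part.strip() for part in cell.split(",") if part.strip()]
    return rows
```

When reading from disk, `open(path, encoding="utf-8-sig", newline="")` gives the same BOM handling.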
17. output_document_undeliverables.json#
Purpose#
JSON companion to the undeliverables CSV. Contains the same data in structured format.
Generation Conditions#
Generated alongside the CSV when undeliverable records exist.
Format#
JSON object with UTF-8 encoding, 2-space indentation.
Top-Level Fields#
| Field | Type | Description |
|---|---|---|
| list_name | string | Always "List of Undeliverables". |
| purpose | string | Always "Bounce-backs and invalid email addresses for follow-up and removal from systems". |
| generated_at | string (ISO 8601) | Timestamp when this file was generated. |
| record_count | integer | Number of undeliverable records. |
| records | array of objects | Full serialization of UndeliverableRecord Pydantic models with snake_case field names. |
18. output_document_inactive_no_cupola_match.csv#
Purpose#
Handoff list for every CUPOLA-undetermined case — inactive (or inactive-stage) determinations with no Cupola match, ACTIVE determinations with no Cupola row (no-auto-add policy), and ACTIVE determinations whose matched Cupola row is inactive (reactivation candidates). Used by IP4 / operations for manual Cupola research or record creation. Delivered via notify_sai_action_items (catalog N05/N06) — To: NOTIFICATION_EMAIL_SAI; global Max + Vish Cc.
Generation Conditions#
Generated when at least one InactiveNoCupolaMatchRecord was collected during the run (OutputDocumentCollector.inactive_no_cupola_match).
Format#
CSV with UTF-8 encoding. Column headers follow the same human-readable style as other output_document_* CSVs.
Column Reference#
Headers match output_document_generator.py (generate_inactive_no_cupola_match_csv).
| # | Column Header | Description |
|---|---|---|
| 1 | Id | Record identifier |
| 2 | AccountName | Inbox / account source |
| 3 | Email Received From | Sender of the auto-response |
| 4 | Subject | Email subject |
| 5 | Person Name | Person name if inferred or from lookup |
| 6 | Org Name | Organization name if available |
| 7 | Determination | Pipeline determination label |
| 8 | Status with Org | Person/org status string when set (person_status on the model) |
| 9 | Multipub Deferred | Yes / No — inactive path deferred by active Multipub subscription gate |
| 10 | Multipub Review Reason | Multipub validation text when present |
| 11 | HODOR ProsNums | Comma-separated Hodor prospect numbers if found without Cupola |
| 12 | Multipub Subsnum | Subscriber number if found |
| 13 | Salesforce IDs | Comma-separated Salesforce Lead/Contact identifiers if found |
| 14 | Message ID | Original message id for traceability |
19. output_document_inactive_no_cupola_match.json#
Purpose#
JSON companion to the IP4 no-Cupola handoff CSV.
Generation Conditions#
Generated alongside the CSV when inactive-no-Cupola-match records exist.
Format#
JSON object with UTF-8 encoding, 2-space indentation.
Top-Level Fields#
| Field | Type | Description |
|---|---|---|
| list_name | string | "List of Inactive People with No Cupola Match" |
| purpose | string | "IP4 handoff list for CUPOLA-undetermined cases — inactive with no Cupola match, active with no Cupola row, and reactivation candidates (inactive Cupola row on an ACTIVE determination)" |
| generated_at | string (ISO 8601) | When the file was written |
| record_count | integer | Number of records |
| records | array of objects | InactiveNoCupolaMatchRecord fields in snake_case |
20. cupola_audit_log.csv#
Purpose#
A dedicated audit trail for all changes made (or planned) in the CUPOLA contact management system during a run. This file documents every status change (marking contacts active/inactive) and every new contact addition, providing a complete record for compliance, rollback, and operational review purposes.
Generation Conditions#
Generated on every run: CupolaAuditLogger.write_audit_log() writes a header-only CSV and a JSON file with an empty entries array when no Cupola actions were logged.
Format#
CSV with UTF-8 BOM encoding. All fields quoted.
Column Reference#
| # | Column Header | Description |
|---|---|---|
| 1 | Timestamp | The exact date and time (ISO 8601, Eastern Time) when this CUPOLA action was recorded. |
| 2 | Action Type | The type of CUPOLA operation. Values: status_change (an existing contact's active/inactive status was changed), contact_addition (a new contact was added to CUPOLA). |
| 3 | Contact ID | The CUPOLA org_person_id for the affected contact. For status changes, this is the existing contact ID. For contact additions, this is the newly assigned ID (if available) or empty if the addition was mocked. |
| 4 | Email | The email address of the contact being modified or added. |
| 5 | Name | The name of the contact. Empty if not available. |
| 6 | Org Name | The organization name associated with the contact. Empty if not available. |
| 7 | Requested Status | For status_change entries: Yes if the contact was being set to ACTIVE, No if being set to INACTIVE. Empty for contact_addition entries. |
| 8 | Previous Status | The link_org_person.status value captured immediately before the UPDATE via SQL OUTPUT deleted.status. 1 = active, 0 = inactive. Empty for contact additions, recommendation-only rows, or read-only mode where the value cannot be observed. |
| 9 | Auto Applied | Yes when the change was actually executed against CUPOLA via cupola.update_contact_status_with_audit (i.e. CUPOLA_AUTOMATIC_UPDATES=true). No when the audit row records a recommendation only (sent to Venu). |
| 10 | Update Succeeded | Yes or No when Auto Applied=Yes, indicating whether the SQL UPDATE reported success. Empty for recommendation-only rows. |
| 11 | Reason | The reason for the status change (e.g., Person left company per auto-response, Inactive determination from LLM classification). Empty for contact additions. |
| 12 | Determination | The pipeline determination that triggered this action (e.g., inactive, active, replacement). |
| 13 | Email Source | The source email address from the auto-response that initiated this action. This is the original auto-response sender, linking the audit entry back to the triggering email. Empty for contact additions. |
| 14 | Title | The job title of the contact. Only populated for contact_addition entries where a title was available. Empty for status changes. |
21. cupola_audit_log.json#
Purpose#
JSON companion to the CUPOLA audit CSV. Contains the same data in structured format for programmatic consumption.
Generation Conditions#
Generated alongside the CSV on every run (empty entries when nothing was logged).
Format#
JSON object with UTF-8 encoding, 2-space indentation.
Top-Level Fields#
| Field | Type | Description |
|---|---|---|
| generated_at | string (ISO 8601) | Timestamp when this file was generated. |
| entry_count | integer | Total number of audit log entries. |
| entries | array of objects | Each object represents one CUPOLA action. Fields match the CSV columns using snake_case keys: timestamp, action_type, contact_id, email, name, org_name, requested_status (boolean for status changes), previous_status (integer 0/1 or null), auto_applied (boolean), update_succeeded (boolean or null), reason, determination, email_source, title. Note: in JSON, boolean fields are true/false/null rather than the Yes/No/empty string used in CSV. |
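The difference between the CSV rendering (Yes / No / empty) and the JSON rendering (true / false / null) of the boolean fields can be bridged with a pair of small converters. This is a sketch only; both function names are hypothetical and are not part of CupolaAuditLogger.

```python
def csv_flag_to_json(value):
    """Map a CSV cell (Yes / No / empty) to True / False / None."""
    mapping = {"yes": True, "no": False, "": None}
    key = (value or "").strip().lower()
    if key not in mapping:
        raise ValueError(f"unexpected flag value: {value!r}")
    return mapping[key]


def json_flag_to_csv(value) -> str:
    """Map a JSON boolean or null to the Yes / No / empty CSV rendering."""
    if value is None:
        return ""
    return "Yes" if value else "No"
```

This applies to `requested_status`, `auto_applied`, and `update_succeeded`. `previous_status` is an integer 0/1 in JSON and should not be passed through these helpers.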
22. cupola_audit_log_rollback_plan.csv#
Purpose#
A revertible record of every CUPOLA status_change that was actually executed (Auto Applied=Yes) during the run. Generated by CupolaAuditLogger._write_rollback_plan (src/auto_responder/utils/cupola_audit_logger.py) so an operator can roll the batch back with simple SQL if a problem is discovered after the fact.
Generation Conditions#
Generated when at least one audit entry has auto_applied=True and update_succeeded is not False. Not written when:
- the run only emitted recommendations (CUPOLA_AUTOMATIC_UPDATES=false), or
- every auto-applied UPDATE failed, or
- there were no Cupola actions at all.
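The generation condition above reduces to a single predicate over the audit entries. The sketch below assumes entries are dictionaries keyed like the JSON audit log (`auto_applied`, `update_succeeded`); `should_write_rollback_plan` is a hypothetical name and not the logger's actual method.

```python
def should_write_rollback_plan(entries: list[dict]) -> bool:
    """True when at least one entry was auto-applied and did not fail."""
    return any(
        entry.get("auto_applied") is True
        and entry.get("update_succeeded") is not False
        for entry in entries
    )
```

Note that an entry with `update_succeeded` set to null still qualifies, because the condition is "not False" rather than "is True".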
Format#
CSV with UTF-8 BOM encoding. All fields quoted (csv.DictWriter with QUOTE_ALL).
Column Reference#
| # | Column Header | Description |
|---|---|---|
| 1 | Timestamp | When the original update was logged (ISO 8601, Eastern Time). |
| 2 | Contact ID | CUPOLA org_person_id that was updated. |
| 3 | Email | Contact email at the time of update. |
| 4 | Name | Contact name when known. |
| 5 | Org Name | Organization name when known. |
| 6 | Applied Status | The integer status that was written by the run. 1 if the contact was set ACTIVE, 0 if INACTIVE. |
| 7 | Previous Status | The integer status captured immediately before the UPDATE, sourced from OUTPUT deleted.status. The literal string MISSING appears when the previous value could not be observed (read-only wrapper, mock connector, etc.). |
| 8 | Rollback SQL | A single ready-to-run statement that inverts the change, e.g. UPDATE link_org_person SET status = 1 WHERE org_person_id = '<id>';. When Previous Status is MISSING, this column contains a -- MANUAL: previous status unknown comment instead. |
| 9 | Reason | Same reason text recorded in cupola_audit_log.csv. |
| 10 | Determination | Pipeline determination that drove the action (e.g. inactive, active). |
Operational notes#
- The plan is written next to cupola_audit_log.csv/.json in the same run directory.
- Rows that hit the MISSING marker should be triaged before running their SQL: the run wrote them without observing the prior state, which usually means a mock or read-only wrapper was active.
- The plan is regenerated per run; older plans are not garbage-collected.
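How the Rollback SQL column relates to Previous Status can be illustrated with a short function. This is a sketch under stated assumptions: `rollback_sql` is a hypothetical name, the statement shape is copied from the column description above, and the single-quote escaping is an added precaution that is not described in the source.

```python
def rollback_sql(contact_id, previous_status) -> str:
    """Build the statement that restores the status seen before the UPDATE."""
    # "MISSING", None, or any other non-0/1 value means the prior state was
    # never observed, so no statement can be generated safely.
    if previous_status not in (0, 1):
        return "-- MANUAL: previous status unknown"
    safe_id = str(contact_id).replace("'", "''")  # assumption: escape quotes
    return (
        f"UPDATE link_org_person SET status = {int(previous_status)} "
        f"WHERE org_person_id = '{safe_id}';"
    )
```

For a contact that was set INACTIVE by the run (Applied Status 0) and was previously active, the generated statement sets `status = 1` again.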
23. output_document_multipub_audit.csv#
Purpose#
Per-row Multipub validation audit for every INACTIVE determination that was checked against the Multipub subscription gate. Written to the run directory for engineers; not emailed (Tarun receives notify_tarun_undetermined_sender_review only). After review, Tarun may post files back through POST /multipub/upload (Yes → notify_multipub_subscriber_followup_from_upload to Angel/Yogesh).
Generation Conditions#
Generated when at least one MultipubAuditRecord was collected during the run (OutputDocumentCollector.multipub_audit). Both deferred and non-deferred inactive paths produce a row when Multipub validation runs.
Format#
CSV with UTF-8 BOM encoding. Booleans rendered as Yes / No (via _sanitize_for_csv). All fields quoted.
Column Reference#
Headers come from OutputDocumentGenerator.generate_multipub_audit_csv (src/auto_responder/utils/output_document_generator.py).
| # | Column Header | Description |
|---|---|---|
| 1 | Id | Record identifier (8-char UUID slice). |
| 2 | AccountName | Inbox / account source the auto-response landed in. |
| 3 | Email Received From | Email address used for the Multipub lookup (post relay normalization). |
| 4 | Person Name | Person name resolved by the contact lookup; empty if not known. |
| 5 | Org Name | Organization name resolved by the contact lookup; empty if not known. |
| 6 | Determination | Determination label (e.g. inactive). |
| 7 | Multipub Subsnum | Matched Multipub subscriber number; empty when no Multipub record was found. |
| 8 | Has Active Subscription | Yes when MultipubValidationResult.has_active_subscription is true. |
| 9 | Active Order Count | Number of currently-active subscription orders returned by Multipub. |
| 10 | Has Recently Expired | Yes when at least one recently-expired subscription was found. |
| 11 | Recently Expired Order Count | Number of recently-expired orders returned. |
| 12 | Has Recent Single-Issue | Yes when at least one recent single-issue purchase was found. |
| 13 | Recent Single-Issue Order Count | Number of recent single-issue orders returned. |
| 14 | Flagged for Review | Yes when the validation gate flagged the row (typically equals Has Active Subscription OR a review-worthy non-active subscription). |
| 15 | Inactive Action Deferred | Yes when the inactive workflow was held back because of an active Multipub subscription. No for clean inactive rows that proceeded. |
| 16 | Review Reason | Free-text reason from MultipubValidationResult.review_reason. Empty when not flagged. |
| 17 | Summary | Single-line summary string from MultipubValidationResult.get_summary(). |
| 18 | Message ID | Original message ID for traceability. |
24. output_document_multipub_audit.json#
Purpose#
JSON companion to the Multipub audit CSV — same data, structured for programmatic consumption.
Generation Conditions#
Generated alongside the CSV whenever Multipub audit records exist.
Format#
JSON object with UTF-8 encoding, 2-space indentation.
Top-Level Fields#
| Field | Type | Description |
|---|---|---|
| list_name | string | Always "Multipub Audit (Tarun handoff)". |
| purpose | string | Describes the deliverable as a per-row Multipub validation audit for INACTIVE determinations. |
| generated_at | string (ISO 8601) | Timestamp when the file was written. |
| record_count | integer | Number of audit rows. |
| records | array of objects | Full serialization of MultipubAuditRecord Pydantic models with snake_case keys (booleans, not Yes/No). |
25. output_document_email_update_requests.csv#
Purpose#
Per-row deliverable for the Changed Email category. Written to the run directory; not bundled into marketing emails (N04 attaches only *_NoLongerThere_*.csv suppression imports from inactive people). Replaces the historical "12Feb-10Mar Email Update Requests" manual export.
Generation Conditions#
Generated by ReportGenerator.write_email_update_requests_deliverable when at least one processed email maps to the Changed Email main category. When zero rows qualify, the file is skipped and a single INFO log line is emitted.
Format#
CSV with UTF-8 BOM encoding. All fields quoted.
Column Reference#
| # | Column Header | Description |
|---|---|---|
| 1 | Email ID | Source message_id of the auto-response. |
| 2 | Sender Email | Original sender address (post relay normalization). |
| 3 | Lookup Email | Address actually used for backend lookup (signature / NDR target / sender, in that priority order). |
| 4 | Contact Found | Yes / No — whether any backend system returned a contact. |
| 5 | Contact Systems | Comma-separated list of systems that matched (e.g. Cupola, Hodor). |
| 6 | Determination | Pipeline determination label (typically email_update). |
| 7 | Status | Per-email processing status (success, skipped_*, etc.). |
| 8 | Org Name | Organization resolved by lookup; empty if not found. |
| 9 | Person Name | Resolved person name (falls back to sender name when needed). |
| 10 | Sender New Email | The new email address extracted from the auto-response body, if surfaced by the classifier. |
| 11 | CUPOLA Org ID | Cupola Org ID when matched. |
| 12 | CUPOLA Org Person ID | Cupola org-person link ID when matched. |
| 13 | CUPOLA Person ID | Cupola person ID when matched. |
| 14 | HODOR ProsNum | Hodor pros-num when matched. |
| 15 | Multipub SubsNum | Multipub subscriber number when matched. |
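The priority order described for Lookup Email (signature, then NDR target, then sender) amounts to taking the first non-empty candidate. A minimal sketch, assuming each candidate arrives as an optional string; `choose_lookup_email` is a hypothetical name and is not taken from the pipeline.

```python
def choose_lookup_email(signature_email, ndr_target, sender_email) -> str:
    """Return the first non-blank address in documented priority order."""
    for candidate in (signature_email, ndr_target, sender_email):
        if candidate and candidate.strip():
            return candidate.strip()
    return ""
```

With no signature address and no NDR target, the function falls back to the sender address, in which case Sender Email and Lookup Email carry the same value.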
26. output_document_email_update_requests.json#
Purpose#
JSON companion to the email-update-requests CSV.
Generation Conditions#
Generated alongside the CSV when Changed Email rows exist.
Format#
JSON object with UTF-8 encoding, 2-space indentation.
Top-Level Fields#
| Field | Type | Description |
|---|---|---|
| list_name | string | Always "Email update requests (Changed Email)". |
| purpose | string | Always "Address corrections for SFMC / marketing systems". |
| generated_at | string (ISO 8601) | Timestamp when the file was written. |
| record_count | integer | Number of records. |
| records | array of objects | Mirror of the CSV columns using snake_case keys (email_id, sender_email, …). |
27. action_log.log#
Purpose#
A verbose execution log that tracks every individual operation (database lookups, updates, notifications, LLM calls) in detail, primarily used during dry-run and read-only modes. This file shows exactly what the system would do (or did do) for each email, including mock operations that simulate real actions. It serves as the definitive record of operational intent and is particularly valuable for validating pipeline behavior before switching to live mode.
Generation Conditions#
Generated when the pipeline runs in dry-run mode or read-only mode. Not generated in full live mode. Created at the start of the run.
Format#
Plain text with timestamped entries.
Structure#
Header#
================================================================================
DRY-RUN EXECUTION LOG
Started: {ISO 8601 timestamp}
================================================================================
Entry Types#
Each entry is timestamped with [HH:MM:SS] in Eastern Time.
Email Processing Start:
[HH:MM:SS] EMAIL PROCESSING: {email_id} from {sender_email}
[HH:MM:SS] Subject: {subject (truncated to 100 chars)}
Contact Lookup:
[HH:MM:SS] CONTACT LOOKUP: {email}
[HH:MM:SS] [MOCK] {System}: Found contact {contact_id}
[HH:MM:SS] [MOCK] {System}: Not found
LLM Classification:
[HH:MM:SS] LLM Classification: {category} (confidence: {confidence})
[HH:MM:SS] Extracted new email: {new_email}
[HH:MM:SS] Extracted alternate contact: {contact_info}
[HH:MM:SS] Extracted personal email: {personal_email}
Determination:
[HH:MM:SS] Determination: {determination} (confidence: {score})
Database Updates (mocked):
[HH:MM:SS] [MOCK] Would {operation} in {System} for {contact_id} ({key=value, ...})
Notifications (mocked):
[HH:MM:SS] [MOCK] Would send notification: {type}
[HH:MM:SS] To: {recipient}
[HH:MM:SS] Subject: {subject}
Action Execution:
[HH:MM:SS] ACTION EXECUTION: Determination={determination} for {email}
Email Completion:
[HH:MM:SS] Email processing {SUCCESS|FAILED} for {email}
Summary Section (appended at end of run)#
================================================================================
SUMMARY
================================================================================
Total Emails Processed: {count}
Determinations:
- {type}: {count}
Database Operations (would be performed):
- {System}: {count} {operation_type}, {count} {operation_type}
Notifications (would be sent):
- {type}: {count}
LLM Classification Calls: {count}
Execution Duration: {seconds} seconds
Completed: {ISO 8601 timestamp}
================================================================================
Summary Fields#
| Field | Description |
|---|---|
| Total Emails Processed | Number of emails that went through the full processing pipeline. |
| Determinations | Breakdown of determination types and their counts (e.g., inactive: 5, active: 2, unknown: 3). |
| Database Operations | Per-system breakdown of all database operations that would be performed (in live mode) or were mocked. Grouped by system (Cupola, Hodor, Salesforce, Multipub) with operation counts (e.g., lookups, update_status, add_contact). |
| Notifications | Count of each notification type that would be sent (e.g., alerts to Max/Client Services about active subscriptions). |
| LLM Classification Calls | Total number of LLM API calls made during classification. |
| Execution Duration | Total wall-clock time for the entire run in seconds. |
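The per-system counts in the summary can be cross-checked by scanning the mocked database-update entries. The sketch below matches the `[MOCK] Would {operation} in {System} for {contact_id}` entry format documented above; it assumes the operation and system names contain no spaces, and `count_mock_operations` is a hypothetical helper rather than part of the pipeline.

```python
import re
from collections import Counter

# Matches: [HH:MM:SS] [MOCK] Would {operation} in {System} for {contact_id} ...
MOCK_OP = re.compile(
    r"^\[\d{2}:\d{2}:\d{2}\]\s+\[MOCK\] Would (?P<operation>\S+) "
    r"in (?P<system>\S+) for (?P<contact_id>\S+)"
)


def count_mock_operations(lines) -> Counter:
    """Count mocked database operations keyed by (system, operation)."""
    counts = Counter()
    for line in lines:
        match = MOCK_OP.match(line.strip())
        if match:
            counts[(match["system"], match["operation"])] += 1
    return counts
```

Notification entries (`Would send notification: ...`) do not match this pattern because they lack the `in {System} for` portion, so they are not counted as database operations.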
28. batch_report.html#
Purpose#
A self-contained, visually rich HTML dashboard summarizing the entire batch run. Designed for browser viewing and sharing with stakeholders. Features interactive Plotly charts, KPI cards, per-email detail tables, and links to the output document files. This is the most polished and accessible output artifact, suitable for non-technical audiences.
Generation Conditions#
Generated when at least one email is processed.
Format#
Single HTML file with embedded CSS. Uses the Plotly JavaScript library via CDN (https://cdn.plot.ly/plotly-2.27.0.min.js) for interactive charts and Google Fonts (Outfit, IBM Plex Mono, IBM Plex Sans) for typography. Dark theme (slate/charcoal background with sky-blue and teal accents).
Sections#
Run Overview (KPI Cards)#
| Metric | Description |
|---|---|
| Mode | The run mode: DRY-RUN (all connections mocked), READ-ONLY (live reads, writes mocked), or LIVE. |
| Total Emails | Number of emails processed in this batch. |
| Duration | Total run time in seconds. |
| Action Success Rate | Percentage of successfully completed actions out of total actions attempted. |
| Successful | Count of emails that completed with success status. |
| Failed | Count of emails with failed status. |
| Skipped (no contact) | Count of emails where contact was not found in any system. |
| Skipped (unknown) | Count of emails with unknown determination (no action needed). |
| Errors | Count of emails that encountered unexpected errors. |
| Deferred (Multipub) | Count of emails where inactive marking was halted due to active Multipub subscriptions. |
| QA Corrections | Number of times the QA agent changed the initial LLM classification. |
| Multipub Validated | Number of emails that underwent Multipub subscription validation. |
| Multipub Deferred | Number of emails deferred due to active Multipub subscriptions (same as Deferred above). |
Output Document Counts#
| Metric | Description |
|---|---|
| Inactive People | Number of records in the inactive people output document. |
| Alternate Contacts | Number of records in the alternate contacts output document. |
| Inactive at New Org | Number of records in the inactive-at-new-org output document. |
Visual Analysis (Interactive Charts)#
| Chart | Type | Description |
|---|---|---|
| Determination Breakdown | Donut/pie chart | Distribution of determination types (INACTIVE, ACTIVE, REPLACEMENT, UNKNOWN, etc.) across all processed emails. |
| Outcome Status Distribution | Horizontal bar chart | Count of each processing status (Success, Failed, Skipped No Contact, Skipped Unknown, Error, Deferred Multipub). |
| LLM Category Breakdown | Vertical bar chart | Count of emails per LLM classification category (undeliverable, left company, retired, deceased, out of office, changed email, N/A). |
| Actions by System | Stacked bar chart | Count of succeeded vs. failed actions per backend system (Cupola, Hodor, Salesforce, Multipub). |
In-Depth Analysis (Per-Email Table)#
| Column | Description |
|---|---|
| # | Sequential row number. |
| Sender | Sender's email address (monospaced). |
| Subject | Email subject, truncated to 60 characters with ... if longer. |
| Determination | Determination type in uppercase. |
| Confidence | Confidence score as percentage. |
| Status | Processing status in title case (spaces replace underscores). |
| Actions | Summary of up to 5 actions in format [OK/FAIL] system: operation. Shows +N more if additional actions exist. Shows — if no actions. |
| Error | Error message text, or — if no error. |
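The cell formatting rules in the table above can be sketched as follows. This is a minimal illustration, not the report generator's actual code: the function names, the `(ok, system, operation)` tuple shape, and the `"; "` separator between actions are assumptions, and it assumes the 60-character limit applies to the subject text before the `...` suffix is appended.

```python
from typing import List, Tuple

SUBJECT_LIMIT = 60      # from the Subject row above
MAX_ACTIONS_SHOWN = 5   # from the Actions row above
EMPTY_CELL = "\u2014"   # the dash shown when there is nothing to display


def truncate_subject(subject: str, limit: int = SUBJECT_LIMIT) -> str:
    """Keep the first `limit` characters and append '...' if the subject is longer."""
    return subject if len(subject) <= limit else subject[:limit] + "..."


def format_status(status: str) -> str:
    """Render a status such as 'skipped_no_contact' as 'Skipped No Contact'."""
    return status.replace("_", " ").title()


def summarize_actions(actions: List[Tuple[bool, str, str]]) -> str:
    """Format up to five actions as '[OK/FAIL] system: operation'.

    Each action is a hypothetical (ok, system, operation) tuple.
    """
    if not actions:
        return EMPTY_CELL
    parts = [
        f"[{'OK' if ok else 'FAIL'}] {system}: {operation}"
        for ok, system, operation in actions[:MAX_ACTIONS_SHOWN]
    ]
    extra = len(actions) - MAX_ACTIONS_SHOWN
    if extra > 0:
        parts.append(f"+{extra} more")
    return "; ".join(parts)  # separator is an assumption
```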
Output Documents (Links)#
Provides download links (relative file paths) to the three output document pairs:
- Inactive People (CSV · JSON)
- Alternate Contacts (CSV · JSON)
- Inactive at New Org (CSV · JSON)
Note: Undeliverables are generated as separate files but are not linked from the HTML report.
29. batch_report.pptx#
Purpose#
A PowerPoint presentation summarizing the batch run for executive review or team meetings. Contains approximately 10 slides covering KPIs, determination breakdowns, outcome status, LLM category analysis, actions by system, confidence and quality metrics, per-email summary tables, and output document counts.
Generation Conditions#
Generated alongside batch_report.html when at least one email is processed.
Format#
PowerPoint .pptx file generated using the python-pptx library.
Slides#
| Slide | Content |
|---|---|
| Title Slide | Report title with generation date and run window. |
| Executive Summary KPIs | Total emails, duration, action success rate, key outcome counts. |
| Determination Breakdown | Chart and counts of each determination type. |
| Outcome Status | Distribution of processing statuses. |
| LLM Category Analysis | Breakdown of LLM classification categories. |
| Actions by System | Success/failure counts per backend system. |
| Confidence & Quality | QA correction rate, average confidence, Multipub validation stats. |
| Per-Email Summary | Table(s) listing each email with sender, subject, determination, status. |
| Output Documents | Counts and summaries for the three output document lists (inactive people, alternate contacts, inactive at new org). |
30. output_document_human_review.csv / .json#
Purpose#
Consolidated Human Review digest introduced by the active-only automation policy. Captures every row that the pipeline refused to act on automatically so IP4 / operations can triage manually. Written by OutputDocumentCollector.add_human_review. Actionable rows are included in notify_sai_action_items; metadata (counts + reason legend) is included in notify_venu_cupola_audit_files.
Generation Conditions#
Generated whenever OutputDocumentCollector.human_review is non-empty. Rows are added by ActionEngine from several handlers:
| Reason constant | When |
|---|---|
| HUMAN_REVIEW_REASON_ACTIVE_NEW_CONTACT | ACTIVE outcome but no CUPOLA row; no auto-add. |
| HUMAN_REVIEW_REASON_REACTIVATION_CANDIDATE | ACTIVE outcome but matched CUPOLA row is inactive; no auto-reactivate. |
| HUMAN_REVIEW_REASON_UPDATE_ON_INACTIVE | EMAIL_UPDATE / TITLE_UPDATE on inactive CUPOLA row (active-only gate blocked it). |
| HUMAN_REVIEW_REASON_OUT_OF_OFFICE | OUT_OF_OFFICE determination; tracked separately, no system writes. |
| Existing reasons (UNKNOWN, bounce triage, replacement parse fallback, etc.) | Already collected from previous phases. |
Format#
CSV with UTF-8 BOM encoding; JSON with 2-space indentation. All fields quoted.
Column Reference (CSV)#
Headers come from output_document_generator.py (generate_human_review_csv). Column titles use spaced words (e.g. Sender Email, Lookup Email).
| Column | Description |
|---|---|
| ID | Record identifier (8-char UUID slice). |
| Account Name | Inbox / account source. |
| Message ID | Original message id for traceability. |
| Sender Email | Sender of the auto-response. |
| Lookup Email | Email used after normalization for contact lookup. |
| Subject | Email subject. |
| Email Body | Full raw body of the source email (plain text or HTML as stored). |
| Reason | One of the HUMAN_REVIEW_REASON_* constants listed above. |
| Reason Detail | Human-readable explanation of why the pipeline deferred. |
| Determination | Determination label at the time of routing. |
| LLM Category | Normalized classifier category when available. |
| Confidence | LLM confidence when available. |
| Person Name / Org Name | When available. |
| CUPOLA OrgPerson IDs / HODOR ProsNums / Multipub Subsnum / Salesforce IDs | Resolved identifiers when known. |
| Suggested Action | Recommended next step for the reviewer. |
| Notes | Free-form pipeline notes. |
JSON Top-Level Fields#
| Field | Type | Description |
|---|---|---|
| list_name | string | "Human Review digest" |
| purpose | string | Describes the file as the consolidated human-review queue. |
| generated_at | string (ISO 8601) | Timestamp. |
| record_count | integer | Number of rows. |
| records | array | Full serialization of each review record. |
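The format rules for this file pair (UTF-8 BOM and fully quoted fields for the CSV, 2-space indentation and the top-level fields above for the JSON) can be sketched with the standard library. This is an illustrative sketch only: `write_human_review` is a hypothetical helper, not the signature of generate_human_review_csv, the `purpose` wording is a placeholder, and records are assumed to be flat dictionaries sharing the same keys.

```python
import csv
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def write_human_review(
    records: List[Dict[str, str]], run_dir: str
) -> Optional[Tuple[Path, Path]]:
    """Write the CSV/JSON pair; return None when there is nothing to write."""
    if not records:
        return None  # files are only generated when the list is non-empty

    out = Path(run_dir)
    csv_path = out / "output_document_human_review.csv"
    json_path = out / "output_document_human_review.json"

    # 'utf-8-sig' prepends the UTF-8 BOM; QUOTE_ALL quotes every field.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=list(records[0].keys()), quoting=csv.QUOTE_ALL
        )
        writer.writeheader()
        writer.writerows(records)

    payload = {
        "list_name": "Human Review digest",
        "purpose": "Consolidated human-review queue",  # placeholder wording
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "record_count": len(records),
        "records": records,
    }
    json_path.write_text(
        json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return csv_path, json_path
```

Spreadsheet tools that sniff the BOM will open the CSV as UTF-8; when reading it back in Python, opening with `encoding="utf-8-sig"` strips the BOM again.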
31. impact_report.txt / .json#
Purpose#
Per-run headline summary introduced by the active-only automation policy. Produced by utils/impact_report.py and attached inline to Notifier.notify_run_audit_for_ip4 (Sai-only run audit).
Generation Conditions#
Always written at the end of the run (after the CUPOLA audit logger finishes flushing). The counts are derived from the in-memory CupolaAuditLogger.entries list, so read-only and dry-run runs still emit the report (counts are zero when no writes occurred).
Format#
- impact_report.txt: plain text with three labelled counts, one per line.
- impact_report.json: structured object with the same counts plus a timestamp.
Fields#
| Field | Type | Description |
|---|---|---|
| emails_processed | integer | Total auto-response emails handled in the run. |
| records_deactivated | integer | CUPOLA rows flipped to inactive: status_change audit entries with requested_status=False and auto_applied=True. |
| records_added | integer | New CUPOLA rows inserted: contact_addition audit entries with a non-empty contact_id. Only ticks for REPLACEMENT when CUPOLA_AUTO_ADD_REPLACEMENTS=true. |
| generated_at | string (ISO 8601) | Timestamp the report was written (JSON only). |
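The counting rules above can be illustrated with a small sketch. It is not the code in utils/impact_report.py: the `AuditEntry` dataclass, its `entry_type` field name, and `summarize_impact` are hypothetical stand-ins for whatever shape the CupolaAuditLogger entries actually have. Only the filter conditions are taken from the field descriptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional, Union


@dataclass
class AuditEntry:
    """Hypothetical stand-in for one CupolaAuditLogger entry."""
    entry_type: str                          # e.g. "status_change", "contact_addition"
    requested_status: Optional[bool] = None
    auto_applied: bool = False
    contact_id: str = ""


def summarize_impact(
    entries: List[AuditEntry], emails_processed: int
) -> Dict[str, Union[int, str]]:
    """Derive the three headline counts from in-memory audit entries."""
    deactivated = sum(
        1
        for e in entries
        if e.entry_type == "status_change"
        and e.requested_status is False
        and e.auto_applied
    )
    added = sum(
        1 for e in entries if e.entry_type == "contact_addition" and e.contact_id
    )
    return {
        "emails_processed": emails_processed,
        "records_deactivated": deactivated,
        "records_added": added,
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }
```

With an empty entries list (for example a dry run with no writes) both record counts come out as zero, which matches the generation conditions described above.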
32. action_items_tracker.csv (cross-run)#
Purpose#
Central queue of one row per action notification email sent in a run (N01–N07). Appended once per run by append_action_items_for_run after all notifications complete. Each row includes ActionItemCount and a per-attachment breakdown in Summary (artifact line counts, not separate tracker rows). Default path: {REPORT_OUTPUT_DIR}/action_items_tracker.csv; override with ACTION_ITEMS_TRACKER_PATH. Post-run completion requests use catalog N12 via auto-responder-request-action-item-confirmation (not appended to this CSV); N12 bodies use collect_action_item_detail_rows for the same counts.
Generation Conditions#
Skipped when the run folder RunId already exists in the tracker (idempotent re-run/resend). New rows are appended with Completed=false for manual spreadsheet triage.
Columns#
| Column | Description |
|---|---|
| Completed | First column for spreadsheet triage. false on append; operators set true when the notification owner confirms work. |
| NotificationTo | Configured SMTP To recipient(s) for that catalog notification (comma-separated when multiple, e.g. N02 Angel + Yogesh). |
| RunId | Run folder name (e.g. run_2026-05-26_14-30-00). |
| RunTimestamp | Parsed from folder name when possible. |
| NotificationId | N01–N07 (N05 vs N06 follows Sai bundle logic). |
| ActionItemCount | Number of actionable lines in attached CSVs for that notification. |
| SourceFiles | Semicolon-separated list of run artifacts that contributed rows. |
| Summary | Per-file counts (e.g. output_document_alternate_contacts.csv: 67; …). |
| CompletedAt / Notes | Empty on append; manual follow-up. |
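The idempotent append described under Generation Conditions can be sketched as below. This is an assumption-laden illustration rather than the body of append_action_items_for_run: the helper names, the column order (taken from the order of the table above), and the plain UTF-8 encoding are all assumptions.

```python
import csv
from datetime import datetime
from pathlib import Path
from typing import Dict, List

# Column order assumed from the table above.
COLUMNS = [
    "Completed", "NotificationTo", "RunId", "RunTimestamp", "NotificationId",
    "ActionItemCount", "SourceFiles", "Summary", "CompletedAt", "Notes",
]


def parse_run_timestamp(run_id: str) -> str:
    """Parse 'run_YYYY-MM-DD_HH-MM-SS'; return '' when the name does not match."""
    try:
        return datetime.strptime(run_id, "run_%Y-%m-%d_%H-%M-%S").isoformat()
    except ValueError:
        return ""


def append_rows_once(tracker_path: str, run_id: str, rows: List[Dict[str, str]]) -> bool:
    """Append one row per notification unless run_id is already tracked."""
    path = Path(tracker_path)
    if path.exists():
        with open(path, newline="", encoding="utf-8") as fh:
            if any(r.get("RunId") == run_id for r in csv.DictReader(fh)):
                return False  # re-run or resend: leave the tracker untouched

    write_header = not path.exists() or path.stat().st_size == 0
    with open(path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        if write_header:
            writer.writeheader()
        for row in rows:
            full = {c: "" for c in COLUMNS}   # CompletedAt / Notes stay empty
            full.update(row)
            full.update(
                Completed="false",
                RunId=run_id,
                RunTimestamp=parse_run_timestamp(run_id),
            )
            writer.writerow(full)
    return True
```

Checking for the RunId before writing means a resend of the same run folder never duplicates rows, so operators' manual `Completed` edits on earlier rows are not shadowed by fresh `false` entries.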
Glossary of Systems#
| System | Full Name | Description |
|---|---|---|
| CUPOLA | CUPOLA Contact Management | Thompson's primary contact and organization management system. Stores person records, organization records, and org-person links. The system of record for contact active/inactive status. |
| HODOR | Hodor / dmorders_thompson | Thompson's prospect/subscriber database. Contains prospect numbers (ProsNum), email records, and subscription metadata. Contacts can be marked "No Longer with Firm" when inactive. |
| SFMC | Salesforce Marketing Cloud | Email marketing platform. The Auto Suppression List prevents marketing emails from being sent to inactive/invalid addresses. |
| Multipub | MultiPub Subscription Management | Publication subscription and order management system. Tracks active subscriptions, expired orders, and single-issue purchases. Used to validate whether an inactive person still has live subscription activity before marking them inactive. |
| Salesforce | Salesforce CRM | Customer relationship management system. Contains Lead and Contact records. Updated when contact status changes (if not related to Multipub). |
Glossary of Determination Types#
| Determination | Description |
|---|---|
| inactive | Person has permanently left the organization (left company, retired, or deceased). All contact records across systems should be marked inactive/suppressed. |
| active | Person is confirmed active at their organization. When no CUPOLA row exists or the matched row is inactive, the pipeline refuses to auto-add / auto-reactivate and routes to Human Review; mirror systems (Hodor, non-Multipub Salesforce) are still updated as before. |
| replacement | A replacement/alternate contact was identified. The original person is marked inactive and the replacement row is captured for IP4 review (auto-add disabled unless CUPOLA_AUTO_ADD_REPLACEMENTS=true). |
| title_update | Person's job title has changed. Gated on active CUPOLA row — when the gate fails the entire update is blocked and routed to Human Review. |
| email_update | Person's email address has changed. Gated on active CUPOLA row — when the gate fails the entire update is blocked and routed to Human Review. |
| out_of_office | Auto-reply is a temporary absence notification. Promoted to a first-class determination by the active-only automation policy; performs no system writes and emits a Human Review row with HUMAN_REVIEW_REASON_OUT_OF_OFFICE. |
| unknown | Email is not relevant (spam, unrelated content) or cannot be classified. No action is taken. |
Glossary of Processing Statuses#
| Status | Description |
|---|---|
| success | All planned actions completed successfully. |
| failed | One or more actions failed during execution. |
| skipped_no_contact | Contact was not found in any backend system — no actions could be taken. |
| skipped_unknown | Determination was unknown — no actions were needed. |
| error | An unexpected error occurred during processing (e.g., network failure, unhandled exception). |
| deferred_multipub | Inactive marking was halted because the person has active subscriptions in Multipub. Requires manual review. |
| pending | Processing has not yet completed. Should not appear in final reports. |
Maintaining this document#
Edit docs/DATA_DICTIONARY.html directly. Preview locally from the repo root:
python scripts/serve_data_dictionary.py
Then open http://127.0.0.1:8765/DATA_DICTIONARY.html in a browser.