
AutoResponderProcess — Output Files Data Dictionary#

See repo source for current behavior (ReportGenerator._CSV_COLUMNS / _CSV_COLUMNS_IP4, output_document.py).

This document provides a comprehensive reference for every output file produced by the AutoResponderProcess pipeline. Each file is described with its purpose, generation conditions, format, and a detailed explanation of every column, field, or structural element it contains.

All output files are written to a timestamped run directory:

processing_reports/run_{YYYY-MM-DD_HH-MM-SS}/
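
As an illustration of the naming pattern above, the following minimal sketch builds a run directory path from a start time and parses the start time back out of a directory name. The helper names `run_directory` and `run_started_at` are hypothetical and are not taken from the pipeline source.

```python
from datetime import datetime
from pathlib import Path

# strftime/strptime pattern matching run_{YYYY-MM-DD_HH-MM-SS}
RUN_DIR_FORMAT = "run_%Y-%m-%d_%H-%M-%S"


def run_directory(base: Path, started_at: datetime) -> Path:
    """Return the run directory path for a given start time (hypothetical helper)."""
    return base / started_at.strftime(RUN_DIR_FORMAT)


def run_started_at(run_dir: Path) -> datetime:
    """Recover the start time encoded in a run directory name."""
    return datetime.strptime(run_dir.name, RUN_DIR_FORMAT)


example_dir = run_directory(Path("processing_reports"), datetime(2026, 5, 3, 14, 30, 5))
print(example_dir.as_posix())
print(run_started_at(example_dir))
```

Because the timestamp components are zero-padded and ordered from year to second, sorting directory names alphabetically also sorts runs chronologically.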

Table of Contents#

  1. run.log
  2. stage_execution.log
  3. processing_report.log
  4. processing_report.json
  5. processing_report_master.csv / processing_report_ip4.csv
  6. category_summary_report.csv
  7. category_summary_report.json
  8. classifier_output/classifier_output.json
  9. classifier_output/classifier_output.csv
  10. output_document_inactive_people.csv
  11. output_document_inactive_people.json
  12. Marketing suppression deliverable ({BusinessUnit}_NoLongerThere_{date}.csv)
  13. output_document_alternate_contacts.csv
  14. output_document_alternate_contacts.json
  15. output_document_inactive_new_org.csv
  16. output_document_inactive_new_org.json
  17. output_document_undeliverables.csv
  18. output_document_undeliverables.json
  19. output_document_inactive_no_cupola_match.csv
  20. output_document_inactive_no_cupola_match.json
  21. cupola_audit_log.csv
  22. cupola_audit_log.json
  23. cupola_audit_log_rollback_plan.csv
  24. output_document_multipub_audit.csv
  25. output_document_multipub_audit.json
  26. output_document_email_update_requests.csv
  27. output_document_email_update_requests.json
  28. action_log.log
  29. batch_report.html
  30. batch_report.pptx
  31. output_document_human_review.csv / .json
  32. impact_report.txt / .json

1. run.log#

Purpose#

The primary runtime log file for the entire pipeline execution. Captures every log message emitted by any Python logger during the run at the DEBUG level and above. This is the most granular diagnostic artifact and is the first place to look when troubleshooting unexpected behavior, errors, or performance issues.

Generation Conditions#

Always generated. Created at the start of every run via setup_logging() in logging_config.py.

Format#

Plain text. Each line follows the enhanced logging format:

{timestamp} - [{correlation_id}] - {logger_name} - {level} - {message}

Field Descriptions#

| Field | Description |
| --- | --- |
| timestamp | The date and time the log entry was recorded, formatted as YYYY-MM-DD HH:MM:SS in US Eastern Time (America/New_York). All timestamps throughout the application are normalized to Eastern Time for consistency. |
| correlation_id | An 8-character UUID prefix that uniquely identifies a logical unit of work (typically one email being processed). This allows you to trace all log messages related to a single email across multiple modules and subsystems. Displays N/A when no correlation context is active (e.g., during initialization). |
| logger_name | The fully qualified Python module name that emitted the log entry (e.g., auto_responder.connectors.cupola_connector). This tells you exactly which component of the system generated the message. |
| level | The severity level of the log entry. In the file handler, all levels from DEBUG upward are captured. Possible values: DEBUG (detailed diagnostic information), INFO (general operational messages), WARNING (unexpected but recoverable situations, including slow-operation alerts for functions exceeding 1000 ms), ERROR (failures that prevented an operation from completing), CRITICAL (severe failures that may halt the entire run). |
| message | The free-form log message content. May include structured data such as email addresses, contact IDs, system names, operation results, timing information, and error tracebacks. For operations decorated with @log_performance, a [duration=X.XXms] suffix is appended when the function completes. |
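
The line format above can be split back into its fields with a regular expression. This is a minimal sketch, assuming the timestamp carries no fractional seconds and that logger names contain no spaces; the sample line and the `parse_line` helper are hypothetical.

```python
import re

# One group per field of: {timestamp} - [{correlation_id}] - {logger_name} - {level} - {message}
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    r" - \[(?P<correlation_id>[^\]]+)\]"
    r" - (?P<logger_name>\S+)"
    r" - (?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)"
    r" - (?P<message>.*)$"
)
# Optional suffix appended for @log_performance-decorated operations
DURATION_RE = re.compile(r"\[duration=(?P<ms>\d+(?:\.\d+)?)ms\]\s*$")


def parse_line(line):
    """Return a dict of fields, or None for continuation lines such as tracebacks."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    entry = match.groupdict()
    duration = DURATION_RE.search(entry["message"])
    entry["duration_ms"] = float(duration.group("ms")) if duration else None
    return entry


# Hypothetical sample line, not copied from a real run
sample = (
    "2026-05-03 14:30:05 - [a1b2c3d4] - auto_responder.connectors.cupola_connector"
    " - INFO - Lookup finished [duration=12.50ms]"
)
parsed = parse_line(sample)
print(parsed["correlation_id"], parsed["level"], parsed["duration_ms"])
```

Grouping parsed entries by `correlation_id` then yields the full trace for a single email.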

Notes#


2. stage_execution.log#

Purpose#

A structured, stage-by-stage execution log that tracks the pipeline's progression through its major processing phases. Unlike run.log, which captures every message, this file is organized into discrete stage sections with JSON-encoded data blocks, making it well suited to programmatic post-run analysis and pipeline health monitoring.

Generation Conditions#

Always generated. Created at run start by the StageLogger class. The final summary section is written when the pipeline completes (normal or early exit).

Format#

Plain text with embedded JSON blocks. The file is divided into:

  1. A header section
  2. One section per pipeline stage
  3. A final summary section

Structure#

====================================================================================================
AUTORESPONDER PROCESS - STAGE EXECUTION LOG
Run Started: {ISO 8601 timestamp}
====================================================================================================

Per-Stage Section#

Each stage that executes during the pipeline gets its own section:

====================================================================================================
STAGE: {stage_name}
Timestamp: {ISO 8601 timestamp}
----------------------------------------------------------------------------------------------------
STAGE_DATA (JSON):
{JSON object with stage metadata, timing, and data}

SUMMARY:
  Duration: {X.XX}ms ({X.XX}s)
  Status: {completed|failed|skipped}
  Emails Processed: {count}      (if applicable)
  Error: {error message}          (if applicable)
  Details: {JSON details}         (if applicable)
====================================================================================================

STAGE_DATA JSON Fields#

| Field | Type | Description |
| --- | --- | --- |
| stage_name | string | The internal name of the pipeline stage (e.g., STEP_1_EXTRACT_EMAILS, STEP_2_CONTACT_LOOKUP, STEP_3_CLASSIFY, STEP_4_DETERMINE, STEP_5_EXECUTE_ACTIONS, STEP_6_GENERATE_REPORTS). Identifies which phase of the processing pipeline this section documents. |
| start_time | string (ISO 8601) | The exact timestamp when this stage began execution, in Eastern Time. Used together with end_time to compute the stage's wall-clock duration. |
| end_time | string (ISO 8601) | The exact timestamp when this stage completed execution. |
| duration_ms | float | The elapsed wall-clock time for the stage in milliseconds. Computed as the difference between end_time and start_time. Useful for identifying performance bottlenecks, for example a slow LLM classification stage or a slow database lookup. |
| status | string | The outcome of the stage. completed means the stage finished without fatal errors. failed means the stage encountered an unrecoverable error. skipped means the stage was intentionally bypassed (e.g., no emails to process). |
| metadata | object | Additional key-value pairs provided when the stage was started. Content varies by stage and may include configuration parameters, input counts, or other contextual information. |
| data | object | Arbitrary structured data logged during stage execution via log_stage_data(). Each key represents a named data point; the value can be any JSON-serializable structure. Examples include email counts, lookup results, classification summaries, or action execution details. |
| errors | array | List of error objects recorded during the stage. Each error object contains: error (the error message string), error_type (the Python exception class name, e.g., ConnectionError, ValueError), context (additional key-value context about the error), and timestamp (when the error occurred). |
| warnings | array | List of warning objects recorded during the stage. Each warning object contains: warning (the warning message string), context (additional key-value context), and timestamp (when the warning was recorded). Warnings indicate non-fatal issues that may merit attention but did not prevent stage completion. |
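
For post-run analysis, the STAGE_DATA blocks can be pulled out of the log text and decoded. The sketch below assumes each JSON object starts at the first `{` after the `STAGE_DATA (JSON):` marker, whether it is written on one line or pretty-printed; the sample log text and the `extract_stage_data` helper are hypothetical.

```python
import json

MARKER = "STAGE_DATA (JSON):"


def extract_stage_data(log_text):
    """Decode every JSON object that follows a STAGE_DATA marker."""
    decoder = json.JSONDecoder()
    stages = []
    pos = 0
    while True:
        marker_at = log_text.find(MARKER, pos)
        if marker_at == -1:
            break
        brace_at = log_text.find("{", marker_at + len(MARKER))
        if brace_at == -1:
            break
        # raw_decode parses one JSON value and reports where it ended
        stage, end = decoder.raw_decode(log_text, brace_at)
        stages.append(stage)
        pos = end
    return stages


# Hypothetical excerpt containing a one-line and a pretty-printed block
sample_log = """\
STAGE: STEP_1_EXTRACT_EMAILS
STAGE_DATA (JSON):
{"stage_name": "STEP_1_EXTRACT_EMAILS", "status": "completed", "duration_ms": 120.5, "errors": []}

SUMMARY:
  Duration: 120.50ms (0.12s)
STAGE: STEP_3_CLASSIFY
STAGE_DATA (JSON):
{
  "stage_name": "STEP_3_CLASSIFY",
  "status": "failed",
  "duration_ms": 950.0,
  "errors": [{"error": "timeout", "error_type": "ConnectionError"}]
}
"""

stages = extract_stage_data(sample_log)
slowest = max(stages, key=lambda s: s["duration_ms"])
failed = [s["stage_name"] for s in stages if s["status"] == "failed"]
print(slowest["stage_name"], failed)
```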

Final Summary Section#

FINAL SUMMARY

Contains a SUMMARY_DATA (JSON) block and a HUMAN-READABLE SUMMARY.

| Field | Type | Description |
| --- | --- | --- |
| run_start_time | string (ISO 8601) | The timestamp when the entire pipeline run began. |
| run_end_time | string (ISO 8601) | The timestamp when the pipeline run completed. |
| total_duration_ms | float | Total wall-clock time for the entire run in milliseconds. |
| total_duration_seconds | float | Total wall-clock time in seconds (convenience field). |
| stages_completed | integer | The number of stages that were executed during the run. |
| statistics | object | Aggregated statistics across all stages. Contains: total_emails_extracted (number of emails pulled from the inbox), unique_emails (number of deduplicated emails), emails_processed (number of emails that went through the full pipeline), determinations (a dictionary mapping each determination type to its count), errors (array of all errors across all stages), warnings (array of all warnings across all stages). |
| stage_summaries | array | A compact array summarizing each stage. Each entry contains stage_name, status, and duration_ms. This provides a quick-glance view of which stages ran and how long each took. |

3. processing_report.log#

Purpose#

The primary human-readable processing report. Provides a comprehensive, formatted text summary of every email that was processed, including the contact lookup results, LLM classification, determination, Multipub validation, standard actions, executed actions, and final outcome. This is the main report for operational review of a batch run.

Generation Conditions#

Generated when at least one email is processed. Not generated if the pipeline finds zero emails in Step 1 (early exit).

Format#

Plain text, structured with fixed-width formatting and separator lines. The report has four major sections: Header, Per-Email Details, Summary, and Output Document Lists.

Sections#

Per-Email Block (repeated for each email)#

Each email gets a detailed block with these subsections:

Email Identification:

Contact Lookup:

LLM Classification (if classification was performed):

Determination:

Multipub Subscription Validation (if validation was performed):

Standard Actions: A numbered list describing what actions WOULD be performed in a live run for this determination type, regardless of the current run mode. This serves as documentation of the expected workflow. Actions reference specific systems (Cupola, Hodor, Multipub, Salesforce) and note which are mocked in the current run.

Actions Executed: A list of every action that was actually executed (or mocked) during this run. Each action shows:

The section header varies by mode: (ALL MOCKED - dry-run), (writes MOCKED - read-only mode), or no annotation in live mode.

Outcome:

Summary Section#

Aggregated counts across all emails in the batch:

Output Document Lists (appended if data exists)#

Detailed listings for three output document types. See the individual output document file descriptions below for field details.


Output replay (regenerated/)#

The output replay utility (auto-responder-replay-output / scripts/replay_output.py) re-runs the shared batch pipeline (pipeline/batch_processor.py) for emails extracted from an existing processing_reports/run_* folder. Regenerated files are written only under run_*/regenerated/; originals in the run root are never overwritten.

After replay, regenerated/replay_verification.json summarizes per-file comparison (match, diff, error, or skipped) against the source artifact. Volatile LLM fields (confidence, QA explanation, timestamps) are ignored by default.

v1 scope: output_document_*, processing_report_*, category_summary_report, classifier output, cupola audit, impact report, and batch report (full-run). Notification CSVs (Hodor import, Tarun undetermined, Multipub follow-up) are deferred to v1.1.


4. processing_report.json#

Purpose#

A JSON companion to the human-readable processing report (.log). Contains the same data in a machine-parseable format suitable for programmatic consumption, integration with dashboards, or post-run analysis scripts.

Generation Conditions#

Generated whenever processing_report.log is generated (when at least one email is processed).

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

| Field | Type | Description |
| --- | --- | --- |
| generated_at | string (ISO 8601) | The timestamp when this JSON file was generated. |
| run_start | string (ISO 8601) | The timestamp when the pipeline run began. |
| total_emails | integer | The total number of emails processed in this run. |
| records | array of objects | An array containing one object per processed email. Each object is a full serialization of the EmailProcessingRecord dataclass (see Record Fields below). |
| output_documents | object | Present only when the output document collector has data. Contains three keys: inactive_people, alternate_contacts, and inactive_new_org, each with purpose (string), record_count (integer), and records (array of objects). Undeliverables are not embedded here; when present they are written only to output_document_undeliverables.csv and output_document_undeliverables.json (see sections 17–18). |

Record Fields (each object in records array)#

Email Identification Fields#

| Field | Type | Description |
| --- | --- | --- |
| sender_email | string | The email address of the auto-response sender. This is the raw SenderEmail from the email record in the database. |
| sender_name | string or null | The display name of the sender, if available from the email headers. May be null if the email only contained an address without a display name. |
| subject | string | The full subject line of the auto-response email. |
| received_date | string | The date and time the email was received, as recorded in the source database. Format may vary based on the source system. |
| inbox_source | string | The inbox/account from which the email was fetched. Corresponds to the AccountName in the email database (e.g., energy@thompson.com, grants@thompson.com, resources@associationexecs.com). This determines which business line the email belongs to. |
| message_id | string | The unique identifier for the email record, typically the database primary key Id from the Hodor dmorders_thompson SQL table. Used as the primary key for tracking this email throughout the pipeline. |
| original_sender_email | string or null | The original sender email before any normalization or cleanup. Present when the pipeline modifies the sender email during processing (e.g., stripping display names, handling forwarded emails). Null if no modification was needed. |
| body | string | The full body text of the auto-response email. Contains the raw text content that was analyzed by the LLM classifier to determine the person's status, extract replacement contacts, new email addresses, etc. |

Contact Lookup Fields#

| Field | Type | Description |
| --- | --- | --- |
| lookup_email | string or null | The email address that was actually used for contact lookup across backend systems. This may differ from sender_email when the auto-response body references a different email address (the source_email). If null, no lookup was performed. |
| contact_found | boolean | true if the contact was found in at least one backend system (Cupola, Hodor, Multipub, or Salesforce). false if the email address was not found in any system. This is the primary indicator of whether downstream actions can be taken. |
| contact_systems | array of strings | List of backend systems where the contact was found. Possible values in the array: cupola (contact management system), hodor (Thompson's dmorders database), multipub (subscription/publication management), salesforce (CRM). An empty array means the contact was not found anywhere. |
| mock_contact_systems | array of strings | Subset of contact_systems that were operating in mock/simulated mode during this run. In dry-run mode, all systems are mocked. In read-only mode, write operations are mocked but reads are live. In live mode, this array is empty. Useful for distinguishing real vs. simulated lookup results. |
| cupola_org_id | string or null | The CUPOLA organization_id for the preferred org-person link. This is the organization identifier in the Cupola contact management system. Null if the contact was not found in Cupola or Cupola was not queried. |
| cupola_org_person_id | string or null | The CUPOLA org_person_id: the unique identifier for the link between a person and an organization in Cupola. This is the record that gets marked active/inactive when processing status changes. Null if not found in Cupola. |
| cupola_person_id | string or null | The CUPOLA person_id: the unique identifier for the person entity in Cupola, independent of their organizational affiliation. A person may have multiple org_person links but only one person_id. Null if not found in Cupola. |
| hodor_pros_num | string or null | The HODOR ProsNum (prospect number): the unique contact identifier in Thompson's Hodor/dmorders database system. This is used to update contact status in Hodor (e.g., marking as "No Longer with Firm"). Null if not found in Hodor. |
| org_name | string or null | The organization/company name associated with the contact. May be sourced from Cupola, Hodor, or other backend systems (see org_name_source). Null if no organization name was found. |
| person_name | string or null | The full name of the contact person. May be sourced from Cupola, Hodor, or other backend systems (see person_name_source). Null if no person name was found. |
| lookup_sources_available | string | Comma-separated list of all backend connector names that were available and queried during the contact lookup phase, regardless of whether they returned results. Represents the scope of the search. Example: cupola, hodor, multipub, salesforce. |
| person_name_source | string or null | The specific backend system that provided the person_name value (e.g., cupola, hodor). Null if no person name was found. Useful for provenance tracking when multiple systems have conflicting data. |
| org_name_source | string or null | The specific backend system that provided the org_name value. Null if no org name was found. |
| sources_used_fields | string | A semicolon-separated summary of which backend system provided each specific data field. Format: field_name:source_system; field_name:source_system. Example: person_name:cupola; org_name:hodor; cupola_org_id:cupola; hodor_pros_num:hodor. This provides full provenance for every piece of contact data. |
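
The sources_used_fields string can be turned into a lookup table for provenance checks. This is a minimal sketch based only on the format and example given above; the `parse_sources_used` helper is hypothetical.

```python
def parse_sources_used(value):
    """Parse 'field:source; field:source' into a {field: source} dict."""
    provenance = {}
    for pair in value.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty input or a trailing separator
        field, _, source = pair.partition(":")
        provenance[field.strip()] = source.strip()
    return provenance


provenance = parse_sources_used(
    "person_name:cupola; org_name:hodor; cupola_org_id:cupola; hodor_pros_num:hodor"
)
print(provenance["org_name"])
```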

Determination Fields#

| Field | Type | Description |
| --- | --- | --- |
| determination | string | The final determination type assigned to this email after LLM classification, QA review, and contact lookup. Possible values: inactive (person has left the company, retired, or is deceased; mark contact inactive across all systems), active (person is confirmed active; ensure records are current), replacement (a replacement contact was identified; mark original inactive and add the replacement), title_update (person's job title has changed; update title across systems), email_update (person's email address has changed; update email across systems), unknown (email is not relevant, is spam, or cannot be classified; no action taken). Empty string if determination has not been made. |
| confidence | float | A confidence score between 0.0 and 1.0 representing how confident the system is in the determination. Higher values indicate greater certainty. A confidence of 0.0 typically indicates no determination was made. Displayed as a percentage in human-readable reports (e.g., 0.95 → 95%). |
| llm_category | string or null | The final LLM classification category after QA review. This is the category used to drive the determination logic. Possible values: undeliverable (bounce-back or invalid email address), left company (person departed the organization), retired (person retired), deceased (person is deceased), out of office (temporary absence; auto-reply), changed email (person's email address has changed). Null if classification was not performed. |
| initial_llm_category | string or null | The category assigned by the first-pass classification LLM agent, before QA review. When QA does not change the category, this matches llm_category. When QA corrects the classification, this preserves the original (incorrect) category for audit purposes. Null if classification was not performed. |
| person_status | string or null | The employment/organizational status of the person as extracted by the LLM from the email body. Examples: left_company, retired, deceased, active, on_leave. This is a more granular status than the llm_category and is used as input to the determination logic. Null if not extracted. |
| email_status | string or null | The status of the email address itself as determined by the LLM. Examples: valid, invalid, bounced, changed. Used to distinguish between "person is gone" vs. "email address is bad." Null if not extracted. |
| qa_correction_applied | boolean | true if the QA agent reviewed the initial classification and changed the category. false if the QA agent confirmed the original classification or if QA review was not performed. When true, initial_llm_category and llm_category will differ. |
| qa_explanation | string or null | The QA agent's textual explanation for why it changed or confirmed the initial classification. Provides transparency into the QA review decision. Null if QA review was not performed. |
| replacement_info | array of objects | List of replacement contacts identified from the auto-response email. Each object contains: replacement_name (string or null: the name of the replacement person), replacement_email (string or null: the email address of the replacement person), replacement_title (string or null: the job title of the replacement person). An empty array means no replacement was identified. Multiple entries indicate multiple replacements were mentioned. |
| sender_new_email | string or null | A new email address for the sender, extracted from the auto-response body. Relevant for email_update determinations where the person's email has changed. Also used in replacement scenarios where the departing person provides their new personal/forwarding email. Null if no new email was identified. |
| retired_personal_email | string or null | A personal/private email address provided by someone who has retired or left their organization. Distinct from sender_new_email in that this is typically a non-work email (e.g., Gmail, Yahoo) shared for personal contact purposes rather than as an official forwarding address. Null if none was provided. |
| is_long_term_leave | boolean | true if the LLM determined the person is on an extended/long-term leave of absence (e.g., maternity leave, sabbatical, medical leave) rather than having permanently departed the organization. This affects the determination: long-term leave contacts are flagged for review rather than immediately marked inactive. false for all other cases. |
| source_email | string or null | The email address extracted from the auto-response body that was used as the basis for contact lookup. This may differ from sender_email, for example when a mail server's bounce message references the intended recipient's address, which is the address we actually need to look up. Null if no alternate source email was extracted. |
| notes | string or null | Free-form reasoning or notes from the LLM explaining the basis for its classification and any additional context it identified in the email body. Null if no reasoning was provided. |
| standard_actions_description | string or null | A human-readable, multi-line description of the standard actions that WOULD be performed for this determination type in a live run, based on the contact systems found and the determination type. This is generated from the determination reference documentation and serves as an expected-behavior checklist regardless of the current run mode. Null if no determination was made. |
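
The relationship between initial_llm_category, llm_category, and qa_correction_applied lends itself to a quick audit. The sketch below lists the records QA corrected and flags any record where the boolean disagrees with the two category values; the sample records and the `audit_qa` helper are hypothetical.

```python
def audit_qa(records):
    """Return (corrections, inconsistencies) for a list of record dicts."""
    corrections, inconsistencies = [], []
    for record in records:
        initial = record.get("initial_llm_category")
        final = record.get("llm_category")
        applied = bool(record.get("qa_correction_applied"))
        if applied:
            corrections.append((record["message_id"], initial, final))
        # Per the field description, the flag is true exactly when the categories differ
        if applied != (initial != final):
            inconsistencies.append(record["message_id"])
    return corrections, inconsistencies


# Hypothetical records using category values listed in the table above
sample_records = [
    {"message_id": "101", "initial_llm_category": "out of office",
     "llm_category": "left company", "qa_correction_applied": True},
    {"message_id": "102", "initial_llm_category": "retired",
     "llm_category": "retired", "qa_correction_applied": False},
    {"message_id": "103", "initial_llm_category": "undeliverable",
     "llm_category": "undeliverable", "qa_correction_applied": True},
]
corrections, inconsistencies = audit_qa(sample_records)
print(corrections)
print(inconsistencies)
```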

Multipub Validation Fields#

| Field | Type | Description |
| --- | --- | --- |
| multipub_validation_performed | boolean | true if Multipub subscription validation was executed for this email. Validation is performed for INACTIVE and REPLACEMENT determinations to check whether the person has active subscriptions before marking them inactive. false if validation was skipped (e.g., for ACTIVE or UNKNOWN determinations). |
| multipub_subsnum | string or null | The Multipub subscriber number (SubsNum) for this contact, if found. This is the unique identifier for a subscriber in the Multipub subscription management system. Null if the contact was not found in Multipub. |
| multipub_match_method | string or null | The method by which the contact was matched to a Multipub subscriber record. Possible values include matching by email address, by name, or by other criteria. Null if no match was found. |
| multipub_has_active_subscription | boolean | true if the contact has at least one currently active subscription in Multipub. When true for an INACTIVE determination, the inactive marking is HALTED (deferred) because the person still has live subscription activity that needs to be addressed by the sales team. |
| multipub_has_recently_expired | boolean | true if the contact has subscriptions that expired within the last 12 months. These are flagged for the sales team's awareness but do not halt inactive processing. |
| multipub_has_recent_single_issue | boolean | true if the contact has recent single-issue (one-time) purchases. These are flagged for the sales team's awareness but do not halt inactive processing. |
| multipub_active_order_count | integer | The number of currently active subscription orders. Zero if no active subscriptions exist. |
| multipub_expired_order_count | integer | The number of subscriptions that expired within the last 12 months. |
| multipub_single_issue_order_count | integer | The number of recent single-issue purchase orders. |
| multipub_flagged_for_review | boolean | true if this record was flagged for manual review due to subscription-related concerns (active subscriptions, recently expired, or single-issue orders found for an inactive person). |
| multipub_review_reason | string or null | The specific reason the record was flagged for review. Examples: Active subscriptions found for inactive contact, Recently expired subscriptions require sales follow-up. Null if not flagged. |
| multipub_deferred | boolean | true if the inactive marking was HALTED because active subscriptions were found in Multipub. This is the most critical flag: it means the system deliberately stopped processing this email to prevent marking someone inactive who still has live subscriptions. These records must be manually reviewed and resolved. |
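
Since deferred records must be resolved manually, a reviewer may want them separated from records that are only flagged for sales awareness. A minimal sketch of that split, using hypothetical sample records and a hypothetical `multipub_review_queue` helper:

```python
def multipub_review_queue(records):
    """Split message IDs into deferred (halted) and flagged-only groups."""
    deferred, flagged_only = [], []
    for record in records:
        if record.get("multipub_deferred"):
            deferred.append(record["message_id"])
        elif record.get("multipub_flagged_for_review"):
            flagged_only.append(record["message_id"])
    return deferred, flagged_only


# Hypothetical records; only the fields used by the helper are shown
sample_records = [
    {"message_id": "101", "multipub_deferred": True,
     "multipub_flagged_for_review": True, "multipub_active_order_count": 2},
    {"message_id": "102", "multipub_deferred": False,
     "multipub_flagged_for_review": True, "multipub_expired_order_count": 1},
    {"message_id": "103", "multipub_deferred": False,
     "multipub_flagged_for_review": False},
]
deferred_ids, flagged_ids = multipub_review_queue(sample_records)
print(deferred_ids, flagged_ids)
```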

Raw LLM Output Fields#

| Field | Type | Description |
| --- | --- | --- |
| raw_classification_result | object or null | The complete, unmodified JSON output from the first-pass LLM classification agent. Contains the raw category, confidence, sender_new_email, alternate_contact, retired_personal_email, is_long_term_leave, reasoning, and any other fields the LLM produced. Null if classification was not performed. Preserved for audit and debugging purposes. |
| raw_qa_result | object or null | The complete, unmodified JSON output from the QA review LLM agent. Contains final_category, final_sender_new_email, final_alternate_contact, final_retired_personal_email, is_long_term_leave, qa_correction_applied, qa_explanation, and any other fields. Null if QA review was not performed. Preserved for audit and debugging. |

Actions Fields#

| Field | Type | Description |
| --- | --- | --- |
| actions | array of objects | List of all actions executed (or mocked) for this email. Each action object contains: system (string: the backend system, e.g., cupola, hodor, salesforce, multipub), operation (string: the operation performed, e.g., mark_inactive, add_contact, update_email, check_subscriptions), success (boolean: whether the action completed successfully), detail (string: additional detail text about what was done, may be empty). An empty array means no actions were executed. |
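
To see where actions are failing across a batch, the action objects can be tallied by system and operation. This is a sketch under the field layout described above; the sample records and the `failed_actions` helper are hypothetical.

```python
from collections import Counter


def failed_actions(records):
    """Count unsuccessful actions, keyed by (system, operation)."""
    failures = Counter()
    for record in records:
        for action in record.get("actions", []):
            if not action["success"]:
                failures[(action["system"], action["operation"])] += 1
    return failures


# Hypothetical records with system and operation names taken from the examples above
sample_records = [
    {"message_id": "101", "actions": [
        {"system": "cupola", "operation": "mark_inactive", "success": True, "detail": ""},
        {"system": "hodor", "operation": "mark_inactive", "success": False, "detail": "timeout"},
    ]},
    {"message_id": "102", "actions": [
        {"system": "hodor", "operation": "mark_inactive", "success": False, "detail": ""},
    ]},
    {"message_id": "103", "actions": []},
]
failures = failed_actions(sample_records)
print(failures.most_common())
```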

Outcome Fields#

| Field | Type | Description |
| --- | --- | --- |
| status | string | The final processing outcome status. Possible values: success (all actions completed), failed (one or more actions failed), skipped_no_contact (contact not found; no actions taken), skipped_unknown (determination was unknown; no actions needed), error (unexpected error occurred), deferred_multipub (halted due to active Multipub subscriptions), pending (should not appear in final output). |
| skip_reason | string or null | A human-readable explanation for why processing was skipped or deferred. Null when the email was fully processed. Examples: Contact not found in any system, Determination is unknown — no actions required, Deferred: active Multipub subscriptions. |
| error_message | string or null | The error message text when status is error or failed. Contains the exception message or a description of what went wrong. Null when no error occurred. |
| duration_ms | float | The wall-clock processing time for this individual email in milliseconds. Measures the time from when this email started processing to when it completed. Useful for identifying slow emails that may be caused by slow LLM responses, slow database lookups, or complex action execution. |
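
Putting the top-level and outcome fields together, a post-run script might tally statuses, list the emails that need attention, and find the slowest one. The sketch below parses an inline, hypothetical report fragment rather than a real file, and treats `total_emails == len(records)` as a sanity check implied by the field descriptions rather than a documented guarantee.

```python
import json
from collections import Counter

# Hypothetical, heavily trimmed processing_report.json content
report_text = """
{
  "generated_at": "2026-05-03T14:35:00-04:00",
  "run_start": "2026-05-03T14:30:05-04:00",
  "total_emails": 3,
  "records": [
    {"message_id": "101", "status": "success", "determination": "inactive", "duration_ms": 812.4},
    {"message_id": "102", "status": "deferred_multipub", "determination": "inactive", "duration_ms": 1530.0},
    {"message_id": "103", "status": "skipped_unknown", "determination": "unknown", "duration_ms": 95.2}
  ]
}
"""

report = json.loads(report_text)
records = report["records"]

count_matches = report["total_emails"] == len(records)
status_counts = Counter(record["status"] for record in records)
# Statuses that typically call for a human to look at the record
needs_attention = [
    record["message_id"]
    for record in records
    if record["status"] in {"failed", "error", "deferred_multipub"}
]
slowest = max(records, key=lambda record: record["duration_ms"])

print(count_matches, dict(status_counts), needs_attention, slowest["message_id"])
```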

5. processing_report_master.csv and processing_report_ip4.csv#

Each run emits two CSV companions from ReportGenerator.write_report: processing_report_master.csv (full column ledger) and processing_report_ip4.csv (IP4-facing subset, fixed column order — 2026-05-03).

processing_report_master.csv#

Purpose#

Spreadsheet-compatible export with one row per processed email and the complete flattened column set (ReportGenerator._CSV_COLUMNS). Use this file for internal analysis, Client Services run reports, and audit.

Generation Conditions#

Generated whenever processing_report.log is generated.

Format#

CSV with UTF-8 BOM encoding (utf-8-sig for Excel compatibility). All fields are quoted (QUOTE_ALL). Newlines within field values are replaced with spaces to prevent row splitting.
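
The format described above corresponds to standard options of Python's csv module. The following sketch writes two rows to an in-memory buffer with a UTF-8 BOM and QUOTE_ALL, then reads them back; it illustrates the format only and is not the ReportGenerator code. The `flatten` helper and the sample values are hypothetical.

```python
import csv
import io


def flatten(value):
    """Replace newlines inside a field value with spaces."""
    return str(value).replace("\r\n", " ").replace("\n", " ").replace("\r", " ")


# Write: utf-8-sig emits the BOM that Excel uses to detect UTF-8
buffer = io.BytesIO()
text = io.TextIOWrapper(buffer, encoding="utf-8-sig", newline="")
writer = csv.writer(text, quoting=csv.QUOTE_ALL)
writer.writerow(["Email ID", "Subject"])
writer.writerow(["123", flatten("Out of office\nBack Monday")])
text.flush()
raw = buffer.getvalue()

# Read: decoding with utf-8-sig strips the BOM again
rows = list(csv.reader(io.StringIO(raw.decode("utf-8-sig"), newline="")))
print(raw[:3])
print(rows)
```

Reading the file with plain utf-8 instead of utf-8-sig would leave the BOM character attached to the first column header, which is a common cause of "column not found" errors in downstream scripts.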

Column Reference#

#Column HeaderSource FieldDescription
1Email IDmessage_idUnique identifier for the email record (database primary key from Hodor).
2Sender Emailsender_emailThe email address of the auto-response sender.
3Original Sender Emailoriginal_sender_emailThe sender email before normalization, if it was modified. Empty if unchanged.
4Lookup Emaillookup_emailThe email address actually used for contact lookup across backend systems. May differ from sender email.
5Sender Namesender_nameDisplay name of the sender from email headers. Empty if not available.
6SubjectsubjectFull subject line of the auto-response email. Newlines replaced with spaces.
7Received Datereceived_dateDate and time the email was received.
8Inbox Sourceinbox_sourceThe inbox/account (AccountName) the email was fetched from. Determines business line.
9BodybodyFull body text of the email. Newlines replaced with spaces.
10Contact Foundcontact_foundYes if contact was found in at least one backend system, No otherwise.
11Sourceslookup_sources_availableComma-separated list of all backend connectors queried during contact lookup.
12Sources Usedsources_used_fieldsSemicolon-separated provenance map showing which system provided each data field (e.g., person_name:cupola; org_name:hodor).
13Contact Systems (Live)computedComma-separated list of systems where the contact was found using LIVE (non-mocked) connections. Empty if all lookups were mocked or contact not found.
14Contact Systems (Mock)computedComma-separated list of systems where the contact was found using MOCKED connections. Empty in live mode.
15HODOR ProsNumhodor_pros_numThe Hodor prospect number for this contact. Empty if not found in Hodor.
16CUPOLA Org IDcupola_org_idThe Cupola organization ID. Empty if not found in Cupola.
17CUPOLA Org Person IDcupola_org_person_idThe Cupola org-person link ID. Empty if not found.
18CUPOLA Person IDcupola_person_idThe Cupola person entity ID. Empty if not found.
19Multipub Subsnummultipub_subsnumThe Multipub subscriber number. Empty if not found in Multipub.
20Initial LLM Categoryinitial_llm_categoryCategory from the first-pass LLM classification, before QA review. Empty if not classified.
21Final LLM Categoryllm_categoryFinal category after QA review. Empty if not classified.
22QA Correction Appliedqa_correction_appliedYes if QA agent changed the initial classification, No otherwise.
23QA Explanationqa_explanationQA agent's reasoning for its decision. Empty if QA was not performed.
24Person Statusperson_statusPerson's employment/org status from LLM (e.g., left_company, retired). Empty if not extracted.
25Email Statusemail_statusStatus of the email address from LLM (e.g., valid, bounced). Empty if not extracted.
26DeterminationdeterminationFinal determination type: inactive, active, replacement, title_update, email_update, unknown. Empty if not determined.
27ConfidenceconfidenceConfidence score formatted as a percentage (e.g., 95%). 0% if not determined.
28Source Emailsource_emailEmail address extracted from auto-response body used for lookup. Empty if same as sender.
29New Emailsender_new_emailNew email address identified for the person. Empty if none found.
30Replacement NamecomputedSemicolon-separated list of replacement contact names (from replacement_info). Empty if no replacements.
31Replacement EmailcomputedSemicolon-separated list of replacement contact email addresses. Empty if no replacements.
32Replacement TitlecomputedSemicolon-separated list of replacement contact job titles. Empty if no replacements.
33Retired Personal Emailretired_personal_emailPersonal email provided by departed/retired person. Empty if none provided.
34Long-term Leaveis_long_term_leaveYes if person is on long-term leave, No otherwise.
35ReasoningnotesLLM reasoning/notes for the determination. Empty if none provided.
36Multipub Validatedmultipub_validation_performedYes if Multipub validation was performed, No otherwise.
37Multipub Subscribermultipub_subsnumMultipub subscriber number (same as column 19). Empty if not found.
38Multipub Match Methodmultipub_match_methodHow the contact was matched in Multipub (e.g., by email, by name). Empty if not matched.
39Multipub Active Subsmultipub_has_active_subscriptionYes if active subscriptions exist, No otherwise.
40Multipub Active Order Countmultipub_active_order_countNumber of active subscription orders. 0 if none.
41Multipub Recently Expiredmultipub_has_recently_expiredYes if subscriptions expired within 12 months, No otherwise.
42Multipub Expired Order Countmultipub_expired_order_countNumber of recently expired orders. 0 if none.
43Multipub Single-Issuemultipub_has_recent_single_issueYes if recent single-issue purchases exist, No otherwise.
44Multipub Single-Issue Order Countmultipub_single_issue_order_countNumber of single-issue orders. 0 if none.
45Multipub Flagged for Reviewmultipub_flagged_for_reviewYes if record was flagged for manual review, No otherwise.
46Multipub Review Reasonmultipub_review_reasonReason the record was flagged. Empty if not flagged.
47Multipub Deferredmultipub_deferredYes if inactive marking was halted due to active subscriptions, No otherwise.
48CUPOLA Actions SummarycomputedSemicolon-separated summary of all Cupola-specific actions. Format: [OK/FAIL] system: operation - detail. Empty if no Cupola actions.
49Actions SummarycomputedSemicolon-separated summary of all non-Cupola actions (Hodor, Salesforce, etc.). Format: [OK/FAIL] system: operation - detail. Empty if no non-Cupola actions.
50StatusstatusFinal processing status: success, failed, skipped_no_contact, skipped_unknown, error, deferred_multipub, pending.
51Skip Reasonskip_reasonReason for skipping. Empty if not skipped.
52Error Messageerror_messageError text if status is error/failed. Empty if no error.
53Duration (ms)duration_msProcessing duration in milliseconds, formatted as an integer.
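
As an illustration of how the Sources Used value (column 12) can be consumed downstream, the following minimal sketch parses the semicolon-separated provenance map into a dictionary. The function name parse_sources_used is hypothetical and not part of the pipeline; the input format is assumed to match the example given in the column description.

```python
def parse_sources_used(value: str) -> dict[str, str]:
    """Parse 'field:system; field:system' into {field: system}.

    Hypothetical helper for report consumers; not a pipeline function.
    """
    provenance: dict[str, str] = {}
    for pair in value.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty cells and trailing separators
        field, _, system = pair.partition(":")
        provenance[field.strip()] = system.strip()
    return provenance


print(parse_sources_used("person_name:cupola; org_name:hodor"))
```

An empty cell yields an empty dictionary, which matches the documented behavior of leaving the column empty when nothing was resolved.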

processing_report_ip4.csv#

Purpose#

Filtered export for Sai Teja / IP4: only rows that need manual Cupola follow-up under the agreed LLM categories, with a fixed 23-column layout so templates and macros do not drift (ReportGenerator._CSV_COLUMNS_IP4).

Generation Conditions#

Written together with the master CSV whenever processing_report.log is generated.

Row filter#

Includes only emails whose Final LLM Category (or, if that is empty, Initial LLM Category) normalizes to one of: Out of Office, Retired, Deceased, Left Company, Changed Email (ReportGenerator._IP4_ACTIONABLE_CATEGORIES). All other categories are excluded from this file; they still appear in the master CSV and in processing_report.json.
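
The row filter can be sketched roughly as follows. This is not the code behind ReportGenerator._IP4_ACTIONABLE_CATEGORIES; the helper names and the normalization rule (trim, lowercase, underscores treated as spaces) are assumptions made for illustration.

```python
# Category names taken from the row-filter description above.
IP4_ACTIONABLE = {
    "out of office",
    "retired",
    "deceased",
    "left company",
    "changed email",
}


def normalize_category(label) -> str:
    # ASSUMED normalization: trim, lowercase, underscores become spaces.
    return " ".join((label or "").replace("_", " ").lower().split())


def is_ip4_row(record: dict) -> bool:
    # Prefer the final category; fall back to the initial one when empty.
    category = record.get("llm_category") or record.get("initial_llm_category")
    return normalize_category(category) in IP4_ACTIONABLE


rows = [
    {"llm_category": "Left Company"},
    {"llm_category": "", "initial_llm_category": "out_of_office"},
    {"llm_category": "Undeliverable"},
]
print([is_ip4_row(r) for r in rows])
```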

Format#

Same as the master CSV: UTF-8 BOM, QUOTE_ALL, newline sanitization.
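
A minimal sketch of what these three format properties mean in practice, using only the Python standard library. The function names are hypothetical, and replacing newlines with a single space is an assumption based on the Body column description in the master CSV.

```python
import csv


def sanitize(value) -> str:
    # Replace embedded line breaks with spaces so each record stays on one row.
    text = "" if value is None else str(value)
    return text.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")


def write_report_csv(path, header, rows) -> None:
    # "utf-8-sig" writes the UTF-8 byte order mark that Excel looks for;
    # QUOTE_ALL wraps every field in double quotes.
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        writer = csv.writer(handle, quoting=csv.QUOTE_ALL)
        writer.writerow(header)
        for row in rows:
            writer.writerow([sanitize(cell) for cell in row])
```

Readers should open these files with encoding="utf-8-sig" as well, so that the BOM is not read as part of the first column header.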

Column Reference (fixed order — 23 columns)#

| # | Column Header | Source Field / derivation | Description |
|---|---|---|---|
| 1 | Email ID | message_id | Same as master §5 column 1. |
| 2 | Inbox Source | inbox_source | Same as master §5 column 8. |
| 3 | Original Sender Email | original_sender_email | Same as master §5 column 3. |
| 4 | Sender Email | sender_email | Same as master §5 column 2. |
| 5 | Lookup Email | lookup_email | Same as master §5 column 4. |
| 6 | Source Email | source_email | Same as master §5 column 28. |
| 7 | Sender Name | sender_name | Same as master §5 column 5. |
| 8 | Subject | subject | Same as master §5 column 6. |
| 9 | Body | body | Same as master §5 column 9. |
| 10 | Initial LLM Category | initial_llm_category | Same as master §5 column 20. |
| 11 | Final LLM Category | llm_category | Same as master §5 column 21. |
| 12 | Determination | determination | Same as master §5 column 26. |
| 13 | Person Status | person_status | Same as master §5 column 24. |
| 14 | Email Status | email_status | Same as master §5 column 25. |
| 15 | CUPOLA Org ID | cupola_org_id | Same as master §5 column 16. |
| 16 | CUPOLA Org Person ID | cupola_org_person_id | Same as master §5 column 17. |
| 17 | CUPOLA Person ID | cupola_person_id | Same as master §5 column 18. |
| 18 | New Email | sender_new_email | Same as master §5 column 29. |
| 19 | Replacement Name | computed from replacement_info | Same as master §5 column 30. |
| 20 | Replacement Email | computed from replacement_info | Same as master §5 column 31. |
| 21 | Replacement Title | computed from replacement_info | Same as master §5 column 32. |
| 22 | Retired Personal Email | retired_personal_email | Same as master §5 column 33. |
| 23 | CUPOLA Actions Summary | computed from actions (Cupola only) | Same as master §5 column 48. |

6. category_summary_report.csv#

Purpose#

A consolidated summary that groups all processed emails into five main business categories. This report collapses the granular LLM categories into broader groups for high-level analysis and reporting to stakeholders who need to understand the distribution of auto-response types without granular detail.

Generation Conditions#

Generated when at least one email is processed. Not generated if the records list is empty.

Format#

CSV with UTF-8 BOM encoding. All fields are quoted. Rows are ordered by category in a fixed sequence: Undeliverable, Left Company / Retired / Deceased, Out of Office, Changed Email, Other.

Category Mapping#

| Main Category | Mapped From LLM Categories |
|---|---|
| Undeliverable | undeliverable |
| Left Company / Retired / Deceased | left company, retired, deceased |
| Out of Office | out of office |
| Changed Email | changed email |
| Other | Any category not matching the above, or null/empty categories |
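
The mapping above can be expressed as a small lookup with an Other fallback. This is a sketch only: the names are hypothetical, and the case-insensitive, whitespace-trimmed comparison is an assumption about how LLM labels are matched.

```python
# Fixed row order used by the CSV, as described under Format.
MAIN_CATEGORY_ORDER = [
    "Undeliverable",
    "Left Company / Retired / Deceased",
    "Out of Office",
    "Changed Email",
    "Other",
]

_LLM_TO_MAIN = {
    "undeliverable": "Undeliverable",
    "left company": "Left Company / Retired / Deceased",
    "retired": "Left Company / Retired / Deceased",
    "deceased": "Left Company / Retired / Deceased",
    "out of office": "Out of Office",
    "changed email": "Changed Email",
}


def to_main_category(llm_category) -> str:
    # Null, empty, and unrecognized labels all fall through to "Other".
    key = (llm_category or "").strip().lower()
    return _LLM_TO_MAIN.get(key, "Other")


labels = ["Retired", None, "out of office", "spam"]
mapped = sorted(
    (to_main_category(label) for label in labels),
    key=MAIN_CATEGORY_ORDER.index,
)
print(mapped)
```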

Column Reference#

| # | Column Header | Description |
|---|---|---|
| 1 | Category | The main business category this email was mapped to (one of the five categories above). |
| 2 | Email ID | Unique identifier for the email record (same as message_id). |
| 3 | Sender Email | The sender's email address. |
| 4 | Lookup Email | The email address used for contact lookup. Empty if same as sender or not available. |
| 5 | Contact Found | Yes if contact was found in any backend system, No otherwise. |
| 6 | Contact Systems | Comma-separated list of systems where the contact was found. |
| 7 | Determination | The final determination type (inactive, active, replacement, etc.). Empty if not determined. |
| 8 | Status | Processing outcome status (success, failed, skipped_no_contact, etc.). |
| 9 | Org Name | Organization name associated with the contact. Empty if not found. |
| 10 | Person Name | Person's name. Falls back to sender name if person name is not available. |
| 11 | CUPOLA Org ID | Cupola organization ID. Empty if not in Cupola. |
| 12 | CUPOLA Org Person ID | Cupola org-person link ID. Empty if not in Cupola. |
| 13 | CUPOLA Person ID | Cupola person ID. Empty if not in Cupola. |
| 14 | HODOR ProsNum | Hodor prospect number. Empty if not in Hodor. |
| 15 | Multipub SubsNum | Multipub subscriber number. Empty if not in Multipub. |

7. category_summary_report.json#

Purpose#

JSON companion to the category summary CSV. Provides the same grouped data in a machine-readable format with records organized under their respective category keys.

Generation Conditions#

Generated alongside category_summary_report.csv.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

| Field | Type | Description |
|---|---|---|
| generated_at | string (ISO 8601) | Timestamp when this file was generated. |
| run_start | string (ISO 8601) | Timestamp when the pipeline run began. |
| total_emails | integer | Total number of emails in this report. |
| categories | object | An object where each key is a main category name and the value is an object containing record_count (integer) and records (array of objects). Each record object has the same fields as the CSV columns listed above, using snake_case keys: category, email_id, sender_email, lookup_email, contact_found, contact_systems, determination, status, org_name, person_name, cupola_org_id, cupola_org_person_id, cupola_person_id, hodor_pros_num, multipub_subsnum. |
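
To make the nesting concrete, the sketch below assembles an object with the documented top-level fields from a flat list of records. The function name is hypothetical, the sample records carry only two of the fifteen keys for brevity, and the use of a UTC timestamp for generated_at is an assumption.

```python
import json
from datetime import datetime, timezone


def build_category_summary(records, run_start):
    # Group records under their main category, counting as we go.
    categories = {}
    for record in records:
        bucket = categories.setdefault(
            record["category"], {"record_count": 0, "records": []}
        )
        bucket["records"].append(record)
        bucket["record_count"] += 1
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),  # assumed UTC
        "run_start": run_start,
        "total_emails": len(records),
        "categories": categories,
    }


sample = [
    {"category": "Out of Office", "email_id": "m1"},
    {"category": "Other", "email_id": "m2"},
    {"category": "Out of Office", "email_id": "m3"},
]
summary = build_category_summary(sample, "2025-01-01T00:00:00")
print(json.dumps(summary, indent=2))
```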

8. classifier_output/classifier_output.json#

Purpose#

The raw, unprocessed output from the LLM classification and QA agents for every email that went through classification. This file preserves the full agent responses before any post-processing, mapping, or interpretation by the pipeline. It serves as the primary audit trail for LLM decision-making and is essential for debugging classification issues, evaluating LLM accuracy, and tuning prompts.

Generation Conditions#

Generated only when at least one email went through LLM classification (i.e., at least one record has raw_classification_result or raw_qa_result populated). Created in a classifier_output/ subdirectory within the run folder.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

| Field | Type | Description |
|---|---|---|
| generated_at | string (ISO 8601) | Timestamp when this file was generated. |
| run_start | string (ISO 8601) | Timestamp when the pipeline run began. |
| total_classified_emails | integer | Number of emails that were classified by the LLM in this run. |
| records | array of objects | One object per classified email (see below). |

Record Fields#

| Field | Type | Description |
|---|---|---|
| email_id | string | Unique identifier for the email. |
| sender_email | string | Sender's email address. |
| sender_name | string or null | Sender's display name. |
| subject | string | Email subject line. |
| inbox_source | string | Inbox/account the email came from. |
| classification_agent_output | object or null | The complete raw JSON response from the first-pass classification LLM agent. Structure depends on the LLM prompt and may include: category, confidence, sender_new_email, alternate_contact, retired_personal_email, is_long_term_leave, reasoning, person_status, email_status, and any additional fields the LLM returns. Null if classification was not performed. |
| qa_agent_output | object or null | The complete raw JSON response from the QA review LLM agent. Structure depends on the QA prompt and may include: final_category, final_sender_new_email, final_alternate_contact, final_retired_personal_email, is_long_term_leave, qa_correction_applied, qa_explanation, and any additional fields. Null if QA review was not performed. |

9. classifier_output/classifier_output.csv#

Purpose#

A tabular/spreadsheet-friendly view of the LLM classification and QA outputs. Flattens the raw agent responses into discrete columns for side-by-side comparison of initial classification vs. QA review results, and includes the Determination the pipeline derived from the QA-final category (see column 6 below).

Note on categories: The classification and QA agents are expected to assign a single label from the nine LLM categories. If the model returns a compound string (for example comma-separated labels), the pipeline normalizes it to one canonical category using a fixed severity priority before mapping to actions, and the CSV reflects that normalized value in Initial Category and Final Category.
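
The normalization of compound labels might look like the sketch below. The actual severity priority is not stated in this document, so the ordering in ASSUMED_PRIORITY is purely hypothetical, as are the function name and the comma-splitting rule.

```python
# HYPOTHETICAL severity order, most severe first. The real order is defined
# in the pipeline source and may differ.
ASSUMED_PRIORITY = [
    "deceased",
    "retired",
    "left company",
    "changed email",
    "undeliverable",
    "out of office",
]


def normalize_compound(raw) -> str:
    # Split a compound label such as "Out of Office, Left Company".
    parts = [p.strip().lower() for p in (raw or "").split(",") if p.strip()]
    ranked = [p for p in parts if p in ASSUMED_PRIORITY]
    if not ranked:
        # Unknown labels pass through unchanged (first one wins).
        return parts[0] if parts else ""
    # Keep the label that appears earliest in the priority list.
    return min(ranked, key=ASSUMED_PRIORITY.index)


print(normalize_compound("Out of Office, Left Company"))
```

Under the assumed ordering, the example resolves to left company because it ranks above out of office.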

Generation Conditions#

Generated alongside classifier_output.json.

Format#

CSV with UTF-8 BOM encoding. All fields quoted.

Column Reference#

| # | Column Header | Description |
|---|---|---|
| 1 | Email ID | Unique identifier for the email. |
| 2 | Sender Email | Sender's email address. |
| 3 | Sender Name | Sender's display name. Empty if not available. |
| 4 | Subject | Email subject line (newlines replaced with spaces). |
| 5 | Inbox Source | Inbox/account the email was fetched from. |
| 6 | Determination | The pipeline's mapped action type for this email: one of unknown, inactive, active, replacement, email_update, title_update (same meaning as elsewhere in processing reports). Derived from the QA-final LLM category and business rules in category_mapper.map_category_to_determination, not a separate LLM field. Empty if not populated on the processing record. |
| 7 | Initial Category | Category assigned by the first-pass classification agent (from raw_classification_result.category), after normalization to a single canonical label when needed. |
| 8 | Confidence | Confidence level from the classification agent (from raw_classification_result.confidence). |
| 9 | Sender New Email (Classification) | New email address extracted by the classification agent. Empty if none found. |
| 10 | Alternate Contact (Classification) | Alternate/replacement contact info extracted by classification agent. May be a structured string. Empty if none found. |
| 11 | Retired Personal Email (Classification) | Personal email extracted by classification agent. Empty if none found. |
| 12 | Is Long Term Leave (Classification) | Yes if classification agent identified long-term leave, No otherwise. |
| 13 | Reasoning | The classification agent's reasoning text explaining its categorization. |
| 14 | Final Category | Category after QA review (from raw_qa_result.final_category), after normalization to a single canonical label when needed. |
| 15 | Final Sender New Email | New email after QA review correction. Empty if not changed or not applicable. |
| 16 | Final Alternate Contact | Alternate contact after QA correction. Empty if not changed. |
| 17 | Final Retired Personal Email | Personal email after QA correction. Empty if not changed. |
| 18 | Is Long Term Leave (QA) | Yes if QA agent confirmed long-term leave, No otherwise. |
| 19 | QA Correction Applied | Yes if QA changed the classification, No if it confirmed the original. |
| 20 | QA Explanation | QA agent's explanation of its review decision. |

10. output_document_inactive_people.csv#

Purpose#

A business deliverable listing all people determined to be INACTIVE (left company, retired, deceased) along with the specific actions taken or planned across each backend system. This document is used by operations teams to verify that inactive contacts have been properly removed or suppressed across CUPOLA, HODOR, SFMC, and Multipub. It also provides the sales team with active subscription information so they can follow up on transferring subscriptions. This file is not attached to the N04 notification email; the marketing team instead receives the slimmer SFMC import file(s) described in Marketing suppression deliverable (*_NoLongerThere_*.csv), sent via notify_marketing_suppression.

Generation Conditions#

Generated only when at least one inactive person record exists in the output document collector.

Format#

CSV with UTF-8 BOM encoding. All fields quoted.

Column Reference#

| # | Column Header | Description |
|---|---|---|
| 1 | Id | Record identifier. Typically the email's database primary key or a generated unique ID for tracking this inactive person record through downstream workflows. |
| 2 | AccountName | The inbox/account source (business line email) the auto-response was received at. Examples: energy@thompson.com, grants@thompson.com, resources@associationexecs.com. Determines which business line is affected and which SFMC suppression list to use. |
| 3 | Org Name | The organization/company name the person was associated with. Sourced from Cupola or Hodor contact records. Empty if not available. |
| 4 | Person Name | The full name of the inactive person. Sourced from Cupola or Hodor contact records. Empty if not available. |
| 5 | Email | The email address of the inactive person (the "Auto Response Received From" address). This is the email that triggered the auto-response and is the address being marked inactive across systems. |
| 6 | Status with Org | The person's status relative to their organization as determined by the LLM (e.g., left_company, retired, deceased). Provides context for why the person is being marked inactive. Empty if not determined. |
| 7 | CUPOLA Org ID | The Cupola organization ID for the preferred org-person link. Used by operations to verify the correct organization record in Cupola. Empty if not in Cupola. |
| 8 | CUPOLA Person ID | The Cupola person entity ID. Used by operations to locate the person record in Cupola. Empty if not in Cupola. |
| 9 | CUPOLA Org Person IDs | Comma-separated list of all Cupola org_person_id values that were marked inactive for this email address. A person may have multiple org-person links (e.g., they are a contact at multiple organizations). All linked records are marked inactive. Empty if not in Cupola. |
| 10 | HODOR ProsNums | Comma-separated list of all Hodor prospect numbers (ProsNum) that were marked as "No Longer with Firm" for this email address. A person may have multiple prospect records in Hodor. Empty if not in Hodor. |
| 11 | Multipub Subsnum | The Multipub subscriber number, if the contact was found in the Multipub subscription system. Empty if not found. |
| 12 | Salesforce IDs | Comma-separated list of Salesforce Lead or Contact record IDs associated with this email, if the contact was found in Salesforce. Empty if not in Salesforce. |
| 13 | HODOR Status | The HODOR status action that was taken. Typically No Longer with Firm for inactive contacts. Empty if no Hodor action was taken. |
| 14 | SFMC Suppression Added | Yes if the email address was added to the SFMC (Salesforce Marketing Cloud) Auto Suppression List for the corresponding business line. No if the suppression was not added (e.g., if SFMC operations were mocked or failed). |
| 15 | Multipub Active Subscriptions | A summary of active subscriptions found in Multipub for this person. Contains serialized order details (up to 3 entries) for the sales team to follow up on. These are subscriptions that need to be transferred or cancelled since the person is no longer active. Empty if no active subscriptions. |
| 16 | Multipub Recent Orders | A summary of recently expired or single-issue orders from Multipub (within the past 12 months). Contains serialized order details (up to 3 entries) for sales team awareness. Empty if no recent orders. |

11. output_document_inactive_people.json#

Purpose#

JSON companion to the inactive people CSV. Contains the same data in structured format for programmatic consumption.

Generation Conditions#

Generated alongside the CSV when inactive person records exist.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

| Field | Type | Description |
|---|---|---|
| list_name | string | Always "List of Inactive People". |
| purpose | string | Always "Remove / follow up these emails from across our systems (CUPOLA, HODOR, SFMC, MultiPub)". |
| generated_at | string (ISO 8601) | Timestamp when this file was generated. |
| record_count | integer | Number of inactive person records in this file. |
| records | array of objects | Each object is a full serialization of the InactivePersonRecord Pydantic model. All fields from the CSV are present using snake_case naming. Nested lists and objects (such as multipub_active_subscriptions and multipub_recent_orders) are fully expanded as arrays of objects rather than serialized strings. |

Marketing suppression deliverable ({BusinessUnit}_NoLongerThere_{YYYY-MM-DD}.csv)#

Purpose#

SFMC-ready suppression import file(s) for the marketing team (notification catalog N04). This is the production suppression path. The files are derived from the same InactivePersonRecord rows as output_document_inactive_people.csv but contain only the three import columns listed in the Column Reference, with no Cupola/Hodor/Multipub detail. Live SFMC REST upsert during processing is not in production; see MARKETING_SUPPRESSION.html.

Generation Conditions#

Written in the same pass as output_document_inactive_people.csv when at least one inactive person record exists. Implemented in marketing_suppression_deliverable.write_marketing_suppression_deliverables. One file per business-unit label (sorted by label); emails are deduped case-insensitively within each file.
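
The grouping and de-duplication described above could be sketched as follows. This is not the body of write_marketing_suppression_deliverables; the function name, the (account_name, email) input shape, and the choice to keep the first-seen spelling of each address are assumptions.

```python
def group_emails_by_unit(records, resolve_label):
    """Group emails per business-unit label, deduplicating case-insensitively.

    records: iterable of (account_name, email) pairs (assumed shape).
    resolve_label: callable mapping an account name to a business-unit label.
    """
    grouped = {}
    seen = {}
    for account_name, email in records:
        label = resolve_label(account_name)
        key = email.strip().casefold()
        if not key or key in seen.setdefault(label, set()):
            continue  # skip blanks and case-insensitive duplicates
        seen[label].add(key)
        grouped.setdefault(label, []).append(email.strip())
    # One output file per label, processed in sorted label order.
    return dict(sorted(grouped.items()))


result = group_emails_by_unit(
    [("a", "X@y.com"), ("b", "x@Y.com"), ("a", "z@y.com")],
    resolve_label=lambda _account: "Marketing",
)
print(result)
```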

Format#

CSV with UTF-8 BOM encoding. All fields quoted. Filename pattern: {BusinessUnit}_NoLongerThere_{YYYY-MM-DD}.csv, where YYYY-MM-DD is parsed from the run folder name (run_{date}_{time}), or is today's date if the folder name does not match that pattern. Currently, resolve_business_unit_label maps every inbox to Marketing.
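
A minimal sketch of the filename rule, assuming the run folder follows the run_{YYYY-MM-DD_HH-MM-SS} convention stated at the top of this document. The function name, the regular expression, and the today parameter (added so the fallback can be tested) are hypothetical.

```python
import re
from datetime import date
from pathlib import Path

# Matches folder names such as run_2025-03-04_10-15-00 (assumed pattern).
_RUN_DIR = re.compile(r"^run_(\d{4}-\d{2}-\d{2})_\d{2}-\d{2}-\d{2}$")


def suppression_filename(run_dir, business_unit, today=None) -> str:
    match = _RUN_DIR.match(Path(run_dir).name)
    if match:
        run_date = match.group(1)
    else:
        # Fall back to the current date when the folder name does not match.
        run_date = (today or date.today()).isoformat()
    return f"{business_unit}_NoLongerThere_{run_date}.csv"


print(suppression_filename("processing_reports/run_2025-03-04_10-15-00", "Marketing"))
```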

Column Reference#

| # | Column Header | Description |
|---|---|---|
| 1 | Email Address | Inactive person email to suppress in SFMC. |
| 2 | Status | Always Unsubscribed (fixed import value). |
| 3 | Date Added | ISO date (YYYY-MM-DD) matching the run-date token in the filename. |

Notification#

Notifier.notify_marketing_suppression discovers files with the glob *_NoLongerThere_*.csv and attaches only those files; output_document_inactive_people.csv is not attached. See NOTIFICATIONS_CATALOG (N04) and the detailed guide MARKETING_SUPPRESSION.html.


12. output_document_alternate_contacts.csv#

Purpose#

A business deliverable consolidating all replacement/alternate contacts identified from auto-response emails. When an inactive person's auto-response mentions a replacement (e.g., "Please contact Jane Doe at jane@company.com instead"), the replacement's information is captured here with planned actions for adding or updating them across CUPOLA, HODOR, and Multipub.

Generation Conditions#

Generated only when at least one alternate contact record exists in the output document collector.

Format#

CSV with UTF-8 encoding, minimal quoting.

Column Reference#

| # | Column Header | Description |
|---|---|---|
| 1 | Id | Record identifier for tracking this alternate contact through downstream workflows. |
| 2 | AccountName | The inbox/account source (business line email) the original auto-response was received at. Determines which HODOR library the alternate contact will be imported into. |
| 3 | Email Received From | The email address of the original (now inactive) person whose auto-response mentioned this alternate contact. This links the alternate contact back to the inactive person they are replacing. |
| 4 | Subject | Source auto-response subject (traceability). |
| 5 | Email Body | Full raw body of the source auto-response email (plain text or HTML as stored). |
| 6 | Message ID | Source message identifier for traceability. |
| 7 | Org ID | The Cupola organization ID (organization_id) for the organization the alternate contact is being added to. This is typically the same organization as the original inactive person. Empty if not in Cupola. |
| 8 | Org Name | The organization/company name for the original sender, resolved once via Contact.resolve_organization_for_deliverable (CUPOLA preferred row, then Hodor firm, then first org hint). Always matches Firm inside HODOR Import Data. Comments may include Org source: (cupola_preferred, hodor_firm, hint, replacement_cupola). Empty if not available. |
| 9 | Alternate Person Name | The full name of the replacement/alternate contact person, as extracted by the LLM from the auto-response body. For HODOR import, this is split into Fname (first name) and Lname (last name). Empty if not provided. |
| 10 | Alternate Person Title | The job title of the alternate contact (e.g., "Director of Marketing", "VP Sales"). Maps to the HODOR Titl field. Empty if not provided. |
| 11 | Alternate Person Email | The email address of the alternate contact. Maps to the HODOR Email field. This is the primary identifier used to check if the person already exists in CUPOLA. Empty if not provided. |
| 12 | Alternate Person Phone | The phone number of the alternate contact. Maps to the HODOR Phone field. Empty if not provided. |
| 13 | Alternate Person Ext | The phone extension of the alternate contact. Maps to the HODOR pext field. Empty if not provided. |
| 14 | Org Person ID | The Cupola org_person_id for the alternate contact, if they already exist in Cupola. Used when the action is update rather than add. Empty if the contact is new to Cupola. |
| 15 | Person ID | The Cupola person_id for the alternate contact, if they already exist in Cupola. Empty if the contact is new. |
| 16 | HODOR ProsNum | The Hodor prospect number for the alternate contact, if they already exist in Hodor. Empty if new to Hodor. |
| 17 | Comments | Free-form comments or context about this alternate contact, typically derived from the auto-response text. May include the original person's name, the nature of the handoff, or other contextual information. Empty if none. |
| 18 | CUPOLA Action | The planned action for this alternate contact in Cupola. Values: add (pipeline will call add_contact because check_contact_exists found no row for this email), update (at least one Cupola row exists for this email; typically same mailbox/org-person handling). Empty if no Cupola action is planned. Note: The underlying add_contact implementation still enforces email/org rules: if a row already exists for the target org it returns that org_person_id; if the email exists only under other orgs it reuses person_id and inserts a new org-person link. See docs/connections/cupola.html. |
| 19 | HODOR Library | The HODOR library code that this alternate contact will be imported into. Determined by the AccountName (inbox source). Mapping: energy@thompson.com → ENGY, grants@thompson.com → GRDM, resources@associationexecs.com → ASSN, resources@associationtrends.com → ASSN, resources@thealmanacofamericanpolitics.com → GR. Empty if library cannot be determined. |
| 20 | HODOR Import Data | JSON-serialized object containing the fields needed for the HODOR import template: Fname (first name), Lname (last name), Titl (title), Firm (organization name), Email (email address), Phone (phone number), pext (phone extension). Empty if no HODOR import is planned. |
| 21 | Multipub Sales Request | Yes if this alternate contact was provided to the sales team for Multipub follow-up (typically when the original inactive person had active subscriptions that need to be transferred). No otherwise. |
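
The HODOR Library and HODOR Import Data columns can be illustrated with the sketch below. The inbox-to-library pairs come from the column 19 description; the function names, the case-insensitive lookup, and the rule of splitting the name on the first space are assumptions, not the pipeline's confirmed behavior.

```python
import json

# Inbox-to-library mapping as listed for column 19.
HODOR_LIBRARY_BY_INBOX = {
    "energy@thompson.com": "ENGY",
    "grants@thompson.com": "GRDM",
    "resources@associationexecs.com": "ASSN",
    "resources@associationtrends.com": "ASSN",
    "resources@thealmanacofamericanpolitics.com": "GR",
}


def hodor_library(account_name) -> str:
    # Empty string when the library cannot be determined.
    return HODOR_LIBRARY_BY_INBOX.get((account_name or "").strip().lower(), "")


def build_hodor_import(name, title, firm, email, phone="", ext="") -> str:
    # ASSUMPTION: the first token is the first name, the rest is the last name.
    first, _, last = (name or "").strip().partition(" ")
    return json.dumps(
        {
            "Fname": first,
            "Lname": last.strip(),
            "Titl": title,
            "Firm": firm,
            "Email": email,
            "Phone": phone,
            "pext": ext,
        }
    )


print(hodor_library("energy@thompson.com"))
print(build_hodor_import("Jane Doe", "VP Sales", "Example Corp", "jane@company.com"))
```

When reading the CSV, the HODOR Import Data cell needs json.loads to recover the object; the JSON companion file already stores it as a nested object.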

13. output_document_alternate_contacts.json#

Purpose#

JSON companion to the alternate contacts CSV. Contains the same data in structured format.

Generation Conditions#

Generated alongside the CSV when alternate contact records exist.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

| Field | Type | Description |
|---|---|---|
| list_name | string | Always "List of Alternate Contacts". |
| purpose | string | Always "Consolidate list of all provided Alternate Contacts and add / update across all systems". |
| generated_at | string (ISO 8601) | Timestamp when this file was generated. |
| record_count | integer | Number of alternate contact records. |
| records | array of objects | Full serialization of AlternateContactRecord Pydantic models. All fields match the CSV columns using snake_case naming. The hodor_import_data field is a proper JSON object (not a serialized string). |

14. output_document_inactive_new_org.csv#

Purpose#

A business deliverable tracking inactive people who have moved to a new organization. When an auto-response indicates someone has left for a different company (e.g., "I have moved to XYZ Corp"), this document captures the new organization details and records the planned actions for potentially adding or updating them in CUPOLA and HODOR at their new organization.

Generation Conditions#

Generated only when at least one inactive-at-new-org record exists in the output document collector.

Format#

CSV with UTF-8 encoding, minimal quoting.

Column Reference#

| # | Column Header | Description |
|---|---|---|
| 1 | Id | Record identifier for tracking this record through downstream workflows. |
| 2 | Account Name | The inbox/account source (business line email) the auto-response was received at. |
| 3 | Email Received From | The email address of the person who sent the auto-response (the person who moved to a new organization). |
| 4 | Person Name | The name of the person who has moved to a new organization. Empty if not available. |
| 5 | New Org ID | The identifier for the new organization (e.g., a Cupola organization ID if the new org already exists in Cupola, or a newly assigned ID). Empty if the new organization has not been identified in any system. |
| 6 | New Org Name | The name of the new organization the person has moved to, as extracted from the auto-response body by the LLM. Empty if not provided. |
| 7 | New Org Title | The person's job title at their new organization. Empty if not provided. |
| 8 | New Org Email | The person's email address at their new organization (e.g., person@newcompany.com). Empty if not provided. |
| 9 | New Org Phone | The person's phone number at their new organization. Empty if not provided. |
| 10 | Org Person ID | The Cupola org_person_id for this person, if they already exist in Cupola. Used for updating existing records. Empty if not in Cupola. |
| 11 | Person ID | The Cupola person_id for this person, if they already exist in Cupola. Empty if not in Cupola. |
| 12 | HODOR ProsNum | The Hodor prospect number for this person, if they exist in Hodor. Empty if not in Hodor. |
| 13 | Comments | Free-form comments or context about the person's move, derived from the auto-response text. May include original organization name, reason for move, or other details. Empty if none. |
| 14 | CUPOLA Action | The planned Cupola action for this record. Values: add (person/org will be added to Cupola), update (existing record will be updated with new org info), skip (record will not be modified in Cupola), ignore (organization is not AI-appropriate and will not be added). Empty if no Cupola action planned. |
| 15 | CUPOLA Org Exists | Yes if the new organization already exists in Cupola. No if the organization is not yet in Cupola. This determines whether the person can be directly added to the existing org or if the org needs to be created first. |
| 16 | CUPOLA AI Appropriate | Yes if the new organization has been determined to be "AI appropriate", meaning it is in an industry or category that warrants inclusion in Thompson's contact management systems. No if the organization is outside the target market and should be ignored. This check is performed when the organization does not already exist in Cupola. |
| 17 | HODOR Library Assignment | The HODOR library that this person should be assigned to at their new organization. Since the person has changed organizations, they may no longer be in the same industry as before, so library assignment may differ from the original. Currently marked as TBD in many cases pending manual review. Empty if not determined. |
| 18 | Multipub Sales Request | Yes if the person's new contact information was provided to the sales team for Multipub follow-up (e.g., to transfer subscriptions to their new organization). No otherwise. |

15. output_document_inactive_new_org.json#

Purpose#

JSON companion to the inactive-at-new-org CSV. Contains the same data in structured format.

Generation Conditions#

Generated alongside the CSV when inactive-at-new-org records exist.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

| Field | Type | Description |
|---|---|---|
| list_name | string | Always "List of Inactive People at New Organization". |
| purpose | string | Always "Track where inactive people went and determine if they should be included in our systems". |
| generated_at | string (ISO 8601) | Timestamp when this file was generated. |
| record_count | integer | Number of records. |
| records | array of objects | Full serialization of InactiveNewOrgRecord Pydantic models. All fields match the CSV columns using snake_case naming. |

16. output_document_undeliverables.csv#

Purpose#

A business deliverable listing all emails classified as undeliverable — bounce-backs, invalid email addresses, and mail delivery failures. These records represent email addresses that are no longer valid and need to be removed or suppressed across backend systems to maintain data hygiene.

Generation Conditions#

Generated only when at least one undeliverable record exists in the output document collector.

Format#

CSV with UTF-8 BOM encoding. All fields quoted.

Column Reference#

| # | Column Header | Description |
|---|---|---|
| 1 | Id | Record identifier for tracking this undeliverable record. |
| 2 | AccountName | The inbox/account source (business line email) the bounce-back was received at. |
| 3 | Sender Email | The sender email address from the bounce notification. This is typically the mail server or postmaster address, not the intended recipient. |
| 4 | Lookup Email | The email address that was looked up in backend systems. This is the address that actually bounced: the intended recipient whose email is no longer valid. |
| 5 | Org Name | The organization name associated with the undeliverable email, if the contact was found in any backend system. Empty if not found. |
| 6 | Person Name | The name of the person associated with the undeliverable email, if found. Empty if not found. |
| 7 | Subject | The subject line of the bounce-back email. Often contains the original subject or a delivery failure message. |
| 8 | CUPOLA Org ID | Cupola organization ID for the undeliverable contact, if found. Empty if not in Cupola. |
| 9 | CUPOLA Person ID | Cupola person ID for the undeliverable contact, if found. Empty if not in Cupola. |
| 10 | CUPOLA Org Person IDs | Comma-separated list of Cupola org-person IDs associated with this undeliverable email. Empty if not in Cupola. |
| 11 | HODOR ProsNums | Comma-separated list of Hodor prospect numbers for this contact. Empty if not in Hodor. |
| 12 | Multipub Subsnum | Multipub subscriber number, if found. Empty if not in Multipub. |
| 13 | Multipub Sales Request | Yes when catalog N02 sales follow-up was queued because Multipub validation showed active, recently expired, or recent single-issue activity. No otherwise. Backend writes remain blocked for bounce-pending undeliverables. |
| 14 | Multipub Active Subscriptions | Serialized active Multipub orders when validation ran (same shape as inactive-people deliverable). Empty if none. |
| 15 | Multipub Recent Orders | Recently expired or single-issue orders within the validation window. Empty if none. |
| 16 | Status | The processing status for this undeliverable record (e.g. bounce_pending_rule, skipped_no_contact). |
| 17 | Skip Reason | Reason if the undeliverable was not fully processed. Empty if processed normally. |
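
A consumer of this file has to account for the BOM and for the comma-separated ID columns. The following is a minimal sketch, assuming the column headers listed above; `split_ids`, `read_undeliverables`, and the sample row are hypothetical and not part of the pipeline.

```python
import csv
import io

def split_ids(cell: str) -> list[str]:
    """Split a comma-separated ID cell into a list; an empty cell yields []."""
    return [part.strip() for part in cell.split(",") if part.strip()]

def read_undeliverables(stream) -> list[dict]:
    """Parse rows and expand the multi-value ID columns into lists."""
    rows = []
    for row in csv.DictReader(stream):
        row["CUPOLA Org Person IDs"] = split_ids(row.get("CUPOLA Org Person IDs", ""))
        row["HODOR ProsNums"] = split_ids(row.get("HODOR ProsNums", ""))
        rows.append(row)
    return rows

# Hypothetical sample content. A real file would be opened with
# open(path, newline="", encoding="utf-8-sig") so the BOM is stripped.
sample = (
    '"Id","Lookup Email","CUPOLA Org Person IDs","HODOR ProsNums","Status"\r\n'
    '"a1b2c3d4","jane@example.com","101, 102","","bounce_pending_rule"\r\n'
)
parsed = read_undeliverables(io.StringIO(sample))
```

Opening the file with `utf-8-sig` matters because a plain `utf-8` read would leave the BOM attached to the first header name.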

17. output_document_undeliverables.json#

Purpose#

JSON companion to the undeliverables CSV. Contains the same data in structured format.

Generation Conditions#

Generated alongside the CSV when undeliverable records exist.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

| Field | Type | Description |
|---|---|---|
| list_name | string | Always "List of Undeliverables". |
| purpose | string | Always "Bounce-backs and invalid email addresses for follow-up and removal from systems". |
| generated_at | string (ISO 8601) | Timestamp when this file was generated. |
| record_count | integer | Number of undeliverable records. |
| records | array of objects | Full serialization of UndeliverableRecord Pydantic models with snake_case field names. |
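
The JSON companions in this document share the same five top-level fields, so a downstream consumer can check that envelope before reading the records. A minimal sketch follows; `load_output_document` is a hypothetical helper, and the check that record_count equals the length of records is an assumption based on the field descriptions above.

```python
import json
from datetime import datetime

REQUIRED_KEYS = {"list_name", "purpose", "generated_at", "record_count", "records"}

def load_output_document(text: str) -> dict:
    """Parse an output_document_*.json payload and validate its envelope."""
    doc = json.loads(text)
    missing = REQUIRED_KEYS - doc.keys()
    if missing:
        raise ValueError(f"missing top-level fields: {sorted(missing)}")
    # Assumption: record_count is expected to equal len(records).
    if doc["record_count"] != len(doc["records"]):
        raise ValueError("record_count does not match len(records)")
    # Raises ValueError if generated_at is not an ISO 8601 timestamp.
    datetime.fromisoformat(doc["generated_at"])
    return doc

# Hypothetical sample payload for illustration only.
sample_json = json.dumps(
    {
        "list_name": "List of Undeliverables",
        "purpose": "Bounce-backs and invalid email addresses for follow-up and removal from systems",
        "generated_at": "2026-05-26T14:30:00-04:00",
        "record_count": 1,
        "records": [{"lookup_email": "jane@example.com"}],
    },
    indent=2,
)
doc = load_output_document(sample_json)
```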

18. output_document_inactive_no_cupola_match.csv#

Purpose#

Handoff list for every CUPOLA-undetermined case: inactive (or inactive-stage) determinations with no Cupola match, ACTIVE determinations with no Cupola row (no-auto-add policy), and ACTIVE determinations whose matched Cupola row is inactive (reactivation candidates). Used by IP4 / operations for manual Cupola research or record creation. Delivered via notify_sai_action_items (catalog N05/N06), sent To: NOTIFICATION_EMAIL_SAI with the global Cc recipients (Max and Vish).

Generation Conditions#

Generated when at least one InactiveNoCupolaMatchRecord was collected during the run (OutputDocumentCollector.inactive_no_cupola_match).

Format#

CSV with UTF-8 encoding. Column headers follow the same human-readable style as other output_document_* CSVs.

Column Reference#

Headers match output_document_generator.py (generate_inactive_no_cupola_match_csv).

| # | Column Header | Description |
|---|---|---|
| 1 | Id | Record identifier |
| 2 | AccountName | Inbox / account source |
| 3 | Email Received From | Sender of the auto-response |
| 4 | Subject | Email subject |
| 5 | Person Name | Person name if inferred or from lookup |
| 6 | Org Name | Organization name if available |
| 7 | Determination | Pipeline determination label |
| 8 | Status with Org | Person/org status string when set (person_status on the model) |
| 9 | Multipub Deferred | Yes / No: inactive path deferred by active Multipub subscription gate |
| 10 | Multipub Review Reason | Multipub validation text when present |
| 11 | HODOR ProsNums | Comma-separated Hodor prospect numbers if found without Cupola |
| 12 | Multipub Subsnum | Subscriber number if found |
| 13 | Salesforce IDs | Comma-separated Salesforce Lead/Contact identifiers if found |
| 14 | Message ID | Original message id for traceability |

19. output_document_inactive_no_cupola_match.json#

Purpose#

JSON companion to the IP4 no-Cupola handoff CSV.

Generation Conditions#

Generated alongside the CSV when inactive-no-Cupola-match records exist.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

| Field | Type | Description |
|---|---|---|
| list_name | string | "List of Inactive People with No Cupola Match" |
| purpose | string | "IP4 handoff list for CUPOLA-undetermined cases — inactive with no Cupola match, active with no Cupola row, and reactivation candidates (inactive Cupola row on an ACTIVE determination)" |
| generated_at | string (ISO 8601) | When the file was written |
| record_count | integer | Number of records |
| records | array of objects | InactiveNoCupolaMatchRecord fields in snake_case |

20. cupola_audit_log.csv#

Purpose#

A dedicated audit trail for all changes made (or planned) in the CUPOLA contact management system during a run. This file documents every status change (marking contacts active/inactive) and every new contact addition, providing a complete record for compliance, rollback, and operational review purposes.

Generation Conditions#

Generated on every run. When no Cupola actions were logged, CupolaAuditLogger.write_audit_log() writes a header-only CSV and a JSON file with an empty entries array.

Format#

CSV with UTF-8 BOM encoding. All fields quoted.

Column Reference#

| # | Column Header | Description |
|---|---|---|
| 1 | Timestamp | The exact date and time (ISO 8601, Eastern Time) when this CUPOLA action was recorded. |
| 2 | Action Type | The type of CUPOLA operation. Values: status_change (an existing contact's active/inactive status was changed), contact_addition (a new contact was added to CUPOLA). |
| 3 | Contact ID | The CUPOLA org_person_id for the affected contact. For status changes, this is the existing contact ID. For contact additions, this is the newly assigned ID (if available) or empty if the addition was mocked. |
| 4 | Email | The email address of the contact being modified or added. |
| 5 | Name | The name of the contact. Empty if not available. |
| 6 | Org Name | The organization name associated with the contact. Empty if not available. |
| 7 | Requested Status | For status_change entries: Yes if the contact was being set to ACTIVE, No if being set to INACTIVE. Empty for contact_addition entries. |
| 8 | Previous Status | The link_org_person.status value captured immediately before the UPDATE via SQL OUTPUT deleted.status. 1 = active, 0 = inactive. Empty for contact additions, recommendation-only rows, or read-only mode where the value cannot be observed. |
| 9 | Auto Applied | Yes when the change was actually executed against CUPOLA via cupola.update_contact_status_with_audit (i.e. CUPOLA_AUTOMATIC_UPDATES=true). No when the audit row records a recommendation only (sent to Venu). |
| 10 | Update Succeeded | Yes / No when Auto Applied=Yes to record whether the SQL UPDATE returned success. Empty for recommendation-only rows. |
| 11 | Reason | The reason for the status change (e.g., Person left company per auto-response, Inactive determination from LLM classification). Empty for contact additions. |
| 12 | Determination | The pipeline determination that triggered this action (e.g., inactive, active, replacement). |
| 13 | Email Source | The source email address from the auto-response that initiated this action. This is the original auto-response sender, linking the audit entry back to the triggering email. Empty for contact additions. |
| 14 | Title | The job title of the contact. Only populated for contact_addition entries where a title was available. Empty for status changes. |

21. cupola_audit_log.json#

Purpose#

JSON companion to the CUPOLA audit CSV. Contains the same data in structured format for programmatic consumption.

Generation Conditions#

Generated alongside the CSV on every run (empty entries when nothing was logged).

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

| Field | Type | Description |
|---|---|---|
| generated_at | string (ISO 8601) | Timestamp when this file was generated. |
| entry_count | integer | Total number of audit log entries. |
| entries | array of objects | Each object represents one CUPOLA action. Fields match the CSV columns using snake_case keys: timestamp, action_type, contact_id, email, name, org_name, requested_status (boolean for status changes), previous_status (integer 0/1 or null), auto_applied (boolean), update_succeeded (boolean or null), reason, determination, email_source, title. Note: in JSON, boolean fields are true/false/null rather than the Yes/No/empty string used in CSV. |
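
The difference between the CSV and JSON renderings can be bridged with a small conversion step. This is a minimal sketch, assuming the Yes/No/empty convention and the column names described above; `yes_no_to_bool`, `status_to_int`, and `csv_row_to_entry` are hypothetical helpers, and only a subset of the fields is shown.

```python
from typing import Optional

def yes_no_to_bool(cell: str) -> Optional[bool]:
    """Map the CSV rendering ("Yes" / "No" / "") to True / False / None."""
    value = cell.strip().lower()
    if value == "yes":
        return True
    if value == "no":
        return False
    if value == "":
        return None
    raise ValueError(f"unexpected Yes/No cell: {cell!r}")

def status_to_int(cell: str) -> Optional[int]:
    """Map a Previous Status cell ("1", "0", or empty) to 1, 0, or None."""
    value = cell.strip()
    return int(value) if value else None

def csv_row_to_entry(row: dict) -> dict:
    """Convert one audit CSV row into JSON-style typed values (subset of fields)."""
    return {
        "action_type": row["Action Type"],
        "contact_id": row["Contact ID"],
        "requested_status": yes_no_to_bool(row["Requested Status"]),
        "previous_status": status_to_int(row["Previous Status"]),
        "auto_applied": yes_no_to_bool(row["Auto Applied"]),
        "update_succeeded": yes_no_to_bool(row["Update Succeeded"]),
    }

# Hypothetical recommendation-only row: nothing was written to CUPOLA.
entry = csv_row_to_entry(
    {
        "Action Type": "status_change",
        "Contact ID": "12345",
        "Requested Status": "No",
        "Previous Status": "",
        "Auto Applied": "No",
        "Update Succeeded": "",
    }
)
```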

22. cupola_audit_log_rollback_plan.csv#

Purpose#

A revertible record of every CUPOLA status_change that was actually executed (Auto Applied=Yes) during the run. Generated by CupolaAuditLogger._write_rollback_plan (src/auto_responder/utils/cupola_audit_logger.py) so an operator can roll the batch back with simple SQL if a problem is discovered after the fact.

Generation Conditions#

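Generated when at least one audit entry has auto_applied=True and update_succeeded is not False. Not written when no entry meets that condition, for example when every change was recommendation-only or every attempted update failed.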

Format#

CSV with UTF-8 BOM encoding. All fields quoted (csv.DictWriter with QUOTE_ALL).
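
The format described here (BOM plus quote-all) can be reproduced with the standard library alone. The sketch below is illustrative, not the pipeline's writer: `write_quoted_csv` is a hypothetical helper and the field list is a subset of the real columns.

```python
import csv
import io

# Subset of the rollback-plan columns, used only for illustration.
FIELDNAMES = ["Timestamp", "Contact ID", "Applied Status", "Previous Status"]

def write_quoted_csv(rows, fieldnames) -> bytes:
    """Serialize rows with every field quoted, then encode with a UTF-8 BOM."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(rows)
    # "utf-8-sig" prepends the byte order mark that spreadsheet tools look for.
    return buffer.getvalue().encode("utf-8-sig")

payload = write_quoted_csv(
    [
        {
            "Timestamp": "2026-05-26T14:30:00-04:00",
            "Contact ID": "12345",
            "Applied Status": 0,
            "Previous Status": 1,
        }
    ],
    FIELDNAMES,
)
```

With `QUOTE_ALL`, numeric values such as the status integers are quoted as well, so readers should convert them back explicitly.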

Column Reference#

| # | Column Header | Description |
|---|---|---|
| 1 | Timestamp | When the original update was logged (ISO 8601, Eastern Time). |
| 2 | Contact ID | CUPOLA org_person_id that was updated. |
| 3 | Email | Contact email at the time of update. |
| 4 | Name | Contact name when known. |
| 5 | Org Name | Organization name when known. |
| 6 | Applied Status | The integer status that was written by the run. 1 if the contact was set ACTIVE, 0 if INACTIVE. |
| 7 | Previous Status | The integer status captured immediately before the UPDATE, sourced from OUTPUT deleted.status. The literal string MISSING appears when the previous value could not be observed (read-only wrapper, mock connector, etc.). |
| 8 | Rollback SQL | A single ready-to-run statement that inverts the change, e.g. UPDATE link_org_person SET status = 1 WHERE org_person_id = '<id>';. When Previous Status is MISSING, this column contains a -- MANUAL: previous status unknown comment instead. |
| 9 | Reason | Same reason text recorded in cupola_audit_log.csv. |
| 10 | Determination | Pipeline determination that drove the action (e.g. inactive, active). |
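
The selection rule and the Rollback SQL column described above can be sketched as follows. This is an illustration under stated assumptions, not the code in CupolaAuditLogger._write_rollback_plan: `is_rollback_candidate` and `build_rollback_sql` are hypothetical names, and the entry dictionaries use the snake_case keys from cupola_audit_log.json.

```python
from typing import Optional

def is_rollback_candidate(entry: dict) -> bool:
    """True for executed status changes whose update is not known to have failed."""
    return (
        entry.get("action_type") == "status_change"
        and entry.get("auto_applied") is True
        and entry.get("update_succeeded") is not False
    )

def build_rollback_sql(contact_id: str, previous_status: Optional[int]) -> str:
    """Return the inverting statement, or a manual marker when the old value is unknown."""
    if previous_status not in (0, 1):
        return "-- MANUAL: previous status unknown"
    # Quote-doubling keeps the displayed text well-formed; anything executed
    # programmatically should use a parameterized query instead.
    safe_id = contact_id.replace("'", "''")
    return (
        f"UPDATE link_org_person SET status = {previous_status} "
        f"WHERE org_person_id = '{safe_id}';"
    )

rollback_known = build_rollback_sql("12345", 1)
rollback_unknown = build_rollback_sql("12345", None)
```

Restoring the captured previous value, rather than flipping the applied one, keeps the statement correct even when the update was a no-op.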

Operational notes#


23. output_document_multipub_audit.csv#

Purpose#

Per-row Multipub validation audit for every INACTIVE determination that was checked against the Multipub subscription gate. Written to the run directory for engineers; not emailed (Tarun receives notify_tarun_undetermined_sender_review only). After review, Tarun may post files back through POST /multipub/upload (Yes triggers notify_multipub_subscriber_followup_from_upload to Angel/Yogesh).

Generation Conditions#

Generated when at least one MultipubAuditRecord was collected during the run (OutputDocumentCollector.multipub_audit). Both deferred and non-deferred inactive paths produce a row when Multipub validation runs.

Format#

CSV with UTF-8 BOM encoding. Booleans rendered as Yes / No (via _sanitize_for_csv). All fields quoted.

Column Reference#

Headers come from OutputDocumentGenerator.generate_multipub_audit_csv (src/auto_responder/utils/output_document_generator.py).

| # | Column Header | Description |
|---|---|---|
| 1 | Id | Record identifier (8-char UUID slice). |
| 2 | AccountName | Inbox / account source the auto-response landed in. |
| 3 | Email Received From | Email address used for the Multipub lookup (post relay normalization). |
| 4 | Person Name | Person name resolved by the contact lookup; empty if not known. |
| 5 | Org Name | Organization name resolved by the contact lookup; empty if not known. |
| 6 | Determination | Determination label (e.g. inactive). |
| 7 | Multipub Subsnum | Matched Multipub subscriber number; empty when no Multipub record was found. |
| 8 | Has Active Subscription | Yes when MultipubValidationResult.has_active_subscription is true. |
| 9 | Active Order Count | Number of currently-active subscription orders returned by Multipub. |
| 10 | Has Recently Expired | Yes when at least one recently-expired subscription was found. |
| 11 | Recently Expired Order Count | Number of recently-expired orders returned. |
| 12 | Has Recent Single-Issue | Yes when at least one recent single-issue purchase was found. |
| 13 | Recent Single-Issue Order Count | Number of recent single-issue orders returned. |
| 14 | Flagged for Review | Yes when the validation gate flagged the row (typically equals Has Active Subscription OR a review-worthy non-active subscription). |
| 15 | Inactive Action Deferred | Yes when the inactive workflow was held back because of an active Multipub subscription. No for clean inactive rows that proceeded. |
| 16 | Review Reason | Free-text reason from MultipubValidationResult.review_reason. Empty when not flagged. |
| 17 | Summary | Single-line summary string from MultipubValidationResult.get_summary(). |
| 18 | Message ID | Original message ID for traceability. |
18Message IDOriginal message ID for traceability.

24. output_document_multipub_audit.json#

Purpose#

JSON companion to the Multipub audit CSV — same data, structured for programmatic consumption.

Generation Conditions#

Generated alongside the CSV whenever Multipub audit records exist.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

| Field | Type | Description |
|---|---|---|
| list_name | string | Always "Multipub Audit (Tarun handoff)". |
| purpose | string | Describes the deliverable as a per-row Multipub validation audit for INACTIVE determinations. |
| generated_at | string (ISO 8601) | Timestamp when the file was written. |
| record_count | integer | Number of audit rows. |
| records | array of objects | Full serialization of MultipubAuditRecord Pydantic models with snake_case keys (booleans, not Yes/No). |

25. output_document_email_update_requests.csv#

Purpose#

Per-row deliverable for the Changed Email category. Written to the run directory; not bundled into marketing emails (N04 attaches only *_NoLongerThere_*.csv suppression imports from inactive people). Replaces the historical "12Feb-10Mar Email Update Requests" manual export.

Generation Conditions#

Generated by ReportGenerator.write_email_update_requests_deliverable when at least one processed email maps to the Changed Email main category. When zero rows qualify, the file is skipped and a single INFO log line is emitted.

Format#

CSV with UTF-8 BOM encoding. All fields quoted.

Column Reference#

| # | Column Header | Description |
|---|---|---|
| 1 | Email ID | Source message_id of the auto-response. |
| 2 | Sender Email | Original sender address (post relay normalization). |
| 3 | Lookup Email | Address actually used for backend lookup (signature / NDR target / sender, in that priority order). |
| 4 | Contact Found | Yes / No: whether any backend system returned a contact. |
| 5 | Contact Systems | Comma-separated list of systems that matched (e.g. Cupola, Hodor). |
| 6 | Determination | Pipeline determination label (typically email_update). |
| 7 | Status | Per-email processing status (success, skipped_*, etc.). |
| 8 | Org Name | Organization resolved by lookup; empty if not found. |
| 9 | Person Name | Resolved person name (falls back to sender name when needed). |
| 10 | Sender New Email | The new email address extracted from the auto-response body, if surfaced by the classifier. |
| 11 | CUPOLA Org ID | Cupola Org ID when matched. |
| 12 | CUPOLA Org Person ID | Cupola org-person link ID when matched. |
| 13 | CUPOLA Person ID | Cupola person ID when matched. |
| 14 | HODOR ProsNum | Hodor pros-num when matched. |
| 15 | Multipub SubsNum | Multipub subscriber number when matched. |
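
The Lookup Email column follows a fixed priority order (signature, then NDR target, then sender). A minimal sketch of that selection, assuming each candidate is either a string or missing; `choose_lookup_email` and its parameter names are hypothetical and do not reflect the pipeline's actual function.

```python
from typing import Optional

def choose_lookup_email(
    signature_email: Optional[str],
    ndr_target: Optional[str],
    sender_email: Optional[str],
) -> Optional[str]:
    """Return the first non-blank candidate in priority order, or None."""
    for candidate in (signature_email, ndr_target, sender_email):
        if candidate and candidate.strip():
            return candidate.strip()
    return None

# The signature address wins when present; otherwise the next source is used.
from_signature = choose_lookup_email("new@example.com", None, "old@example.com")
from_ndr = choose_lookup_email(None, "bounced@example.com", "postmaster@example.com")
from_sender = choose_lookup_email("", "  ", "old@example.com")
```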

26. output_document_email_update_requests.json#

Purpose#

JSON companion to the email-update-requests CSV.

Generation Conditions#

Generated alongside the CSV when Changed Email rows exist.

Format#

JSON object with UTF-8 encoding, 2-space indentation.

Top-Level Fields#

| Field | Type | Description |
|---|---|---|
| list_name | string | Always "Email update requests (Changed Email)". |
| purpose | string | Always "Address corrections for SFMC / marketing systems". |
| generated_at | string (ISO 8601) | Timestamp when the file was written. |
| record_count | integer | Number of records. |
| records | array of objects | Mirror of the CSV columns using snake_case keys (email_id, sender_email, …). |

27. action_log.log#

Purpose#

A verbose execution log that tracks every individual operation (database lookups, updates, notifications, LLM calls) in detail, primarily used during dry-run and read-only modes. This file shows exactly what the system would do (or did do) for each email, including mock operations that simulate real actions. It serves as the definitive record of operational intent and is particularly valuable for validating pipeline behavior before switching to live mode.

Generation Conditions#

Generated when the pipeline runs in dry-run mode or read-only mode. Not generated in full live mode. Created at the start of the run.

Format#

Plain text with timestamped entries.

Structure#

================================================================================
DRY-RUN EXECUTION LOG
Started: {ISO 8601 timestamp}
================================================================================

Entry Types#

Each entry is timestamped with [HH:MM:SS] in Eastern Time.

Email Processing Start:

[HH:MM:SS] EMAIL PROCESSING: {email_id} from {sender_email}
[HH:MM:SS]   Subject: {subject (truncated to 100 chars)}

Contact Lookup:

[HH:MM:SS] CONTACT LOOKUP: {email}
[HH:MM:SS]   [MOCK] {System}: Found contact {contact_id}
[HH:MM:SS]   [MOCK] {System}: Not found

LLM Classification:

[HH:MM:SS]   LLM Classification: {category} (confidence: {confidence})
[HH:MM:SS]     Extracted new email: {new_email}
[HH:MM:SS]     Extracted alternate contact: {contact_info}
[HH:MM:SS]     Extracted personal email: {personal_email}

Determination:

[HH:MM:SS]   Determination: {determination} (confidence: {score})

Database Updates (mocked):

[HH:MM:SS]   [MOCK] Would {operation} in {System} for {contact_id} ({key=value, ...})

Notifications (mocked):

[HH:MM:SS]   [MOCK] Would send notification: {type}
[HH:MM:SS]     To: {recipient}
[HH:MM:SS]     Subject: {subject}

Action Execution:

[HH:MM:SS] ACTION EXECUTION: Determination={determination} for {email}

Email Completion:

[HH:MM:SS]   Email processing {SUCCESS|FAILED} for {email}
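
Because the entry formats above are regular, mocked database updates can be tallied per system with a single pattern. The sketch below assumes the "Would {operation} in {System} for {contact_id}" layout shown above; `MOCK_UPDATE`, `count_mock_updates`, and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Matches "[HH:MM:SS]   [MOCK] Would {operation} in {System} for {contact_id} ..."
MOCK_UPDATE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\]\s+\[MOCK\] Would (?P<operation>.+?) "
    r"in (?P<system>\w+) for (?P<contact_id>\S+)"
)

def count_mock_updates(lines) -> Counter:
    """Count mocked database update lines per backend system."""
    counts = Counter()
    for line in lines:
        match = MOCK_UPDATE.match(line)
        if match:
            counts[match.group("system")] += 1
    return counts

# Hypothetical log lines in the documented layout.
sample_lines = [
    "[14:30:01]   [MOCK] Would update_status in Cupola for 12345 (status=0)",
    "[14:30:02]   [MOCK] Would send notification: inactive_alert",
    "[14:30:03]   [MOCK] Cupola: Found contact 12345",
    "[14:30:04]   [MOCK] Would mark_inactive in Hodor for P-778 (reason=left)",
]
update_counts = count_mock_updates(sample_lines)
```

Notification and lookup lines are skipped because they lack the "in {System} for" segment.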

Summary Section (appended at end of run)#

================================================================================
SUMMARY
================================================================================
Total Emails Processed: {count}

Determinations:
  - {type}: {count}

Database Operations (would be performed):
  - {System}: {count} {operation_type}, {count} {operation_type}

Notifications (would be sent):
  - {type}: {count}

LLM Classification Calls: {count}
Execution Duration: {seconds} seconds
Completed: {ISO 8601 timestamp}
================================================================================

Summary Fields#

| Field | Description |
|---|---|
| Total Emails Processed | Number of emails that went through the full processing pipeline. |
| Determinations | Breakdown of determination types and their counts (e.g., inactive: 5, active: 2, unknown: 3). |
| Database Operations | Per-system breakdown of all database operations that would be performed (in live mode) or were mocked. Grouped by system (Cupola, Hodor, Salesforce, Multipub) with operation counts (e.g., lookups, update_status, add_contact). |
| Notifications | Count of each notification type that would be sent (e.g., alerts to Max/Client Services about active subscriptions). |
| LLM Classification Calls | Total number of LLM API calls made during classification. |
| Execution Duration | Total wall-clock time for the entire run in seconds. |

28. batch_report.html#

Purpose#

A self-contained, visually rich HTML dashboard summarizing the entire batch run. Designed for browser viewing and sharing with stakeholders. Features interactive Plotly charts, KPI cards, per-email detail tables, and links to the output document files. This is the most polished and accessible output artifact, suitable for non-technical audiences.

Generation Conditions#

Generated when at least one email is processed.

Format#

Single HTML file with embedded CSS. Uses the Plotly JavaScript library via CDN (https://cdn.plot.ly/plotly-2.27.0.min.js) for interactive charts and Google Fonts (Outfit, IBM Plex Mono, IBM Plex Sans) for typography. Dark theme (slate/charcoal background with sky-blue and teal accents).

Sections#

Run Overview (KPI Cards)#

| Metric | Description |
|---|---|
| Mode | The run mode: DRY-RUN (all connections mocked), READ-ONLY (live reads, writes mocked), or LIVE. |
| Total Emails | Number of emails processed in this batch. |
| Duration | Total run time in seconds. |
| Action Success Rate | Percentage of successfully completed actions out of total actions attempted. |
| Successful | Count of emails that completed with success status. |
| Failed | Count of emails with failed status. |
| Skipped (no contact) | Count of emails where contact was not found in any system. |
| Skipped (unknown) | Count of emails with unknown determination (no action needed). |
| Errors | Count of emails that encountered unexpected errors. |
| Deferred (Multipub) | Count of emails where inactive marking was halted due to active Multipub subscriptions. |
| QA Corrections | Number of times the QA agent changed the initial LLM classification. |
| Multipub Validated | Number of emails that underwent Multipub subscription validation. |
| Multipub Deferred | Number of emails deferred due to active Multipub subscriptions (same as Deferred above). |

Output Document Counts#

| Metric | Description |
|---|---|
| Inactive People | Number of records in the inactive people output document. |
| Alternate Contacts | Number of records in the alternate contacts output document. |
| Inactive at New Org | Number of records in the inactive-at-new-org output document. |

Visual Analysis (Interactive Charts)#

| Chart | Type | Description |
|---|---|---|
| Determination Breakdown | Donut/pie chart | Distribution of determination types (INACTIVE, ACTIVE, REPLACEMENT, UNKNOWN, etc.) across all processed emails. |
| Outcome Status Distribution | Horizontal bar chart | Count of each processing status (Success, Failed, Skipped No Contact, Skipped Unknown, Error, Deferred Multipub). |
| LLM Category Breakdown | Vertical bar chart | Count of emails per LLM classification category (undeliverable, left company, retired, deceased, out of office, changed email, N/A). |
| Actions by System | Stacked bar chart | Count of succeeded vs. failed actions per backend system (Cupola, Hodor, Salesforce, Multipub). |

In-Depth Analysis (Per-Email Table)#

| Column | Description |
|---|---|
| # | Sequential row number. |
| Sender | Sender's email address (monospaced). |
| Subject | Email subject, truncated to 60 characters with ... if longer. |
| Determination | Determination type in uppercase. |
| Confidence | Confidence score as percentage. |
| Status | Processing status in title case (spaces replace underscores). |
| Actions | Summary of up to 5 actions in format [OK/FAIL] system: operation. Shows +N more if additional actions exist. Shows a placeholder if there are no actions. |
| Error | Error message text, or a placeholder if there is no error. |
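
The Subject and Actions formatting rules can be sketched as below. Several details are assumptions rather than documented behavior: whether the 60-character limit includes the trailing "...", the ", " separator between actions, and the empty string returned when there are no actions. `truncate_subject` and `summarize_actions` are hypothetical names.

```python
def truncate_subject(subject: str, limit: int = 60) -> str:
    """Keep the first `limit` characters and append "..." when the subject is longer."""
    return subject if len(subject) <= limit else subject[:limit] + "..."

def summarize_actions(actions: list, max_shown: int = 5) -> str:
    """Render up to `max_shown` actions as "[OK/FAIL] system: operation" plus "+N more"."""
    shown = [
        f"[{'OK' if ok else 'FAIL'}] {system}: {operation}"
        for ok, system, operation in actions[:max_shown]
    ]
    extra = len(actions) - max_shown
    if extra > 0:
        shown.append(f"+{extra} more")
    # Assumption: the separator and the empty-result placeholder are illustrative only.
    return ", ".join(shown)

# Hypothetical action tuples: (succeeded, system, operation).
six_actions = [(True, "Cupola", "update_status")] * 5 + [(False, "Hodor", "mark_inactive")]
actions_cell = summarize_actions(six_actions)
subject_cell = truncate_subject("x" * 61)
```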

The report also provides download links (relative file paths) to the three output document pairs (CSV and JSON): inactive people, alternate contacts, and inactive at new org.

Note: Undeliverables are generated as separate files but are not linked from the HTML report.


29. batch_report.pptx#

Purpose#

A PowerPoint presentation summarizing the batch run for executive review or team meetings. Contains approximately 10 slides covering KPIs, determination breakdowns, outcome status, LLM category analysis, actions by system, confidence and quality metrics, per-email summary tables, and output document counts.

Generation Conditions#

Generated alongside batch_report.html when at least one email is processed.

Format#

PowerPoint .pptx file generated using the python-pptx library.

Slides#

| Slide | Content |
|---|---|
| Title Slide | Report title with generation date and run window. |
| Executive Summary KPIs | Total emails, duration, action success rate, key outcome counts. |
| Determination Breakdown | Chart and counts of each determination type. |
| Outcome Status | Distribution of processing statuses. |
| LLM Category Analysis | Breakdown of LLM classification categories. |
| Actions by System | Success/failure counts per backend system. |
| Confidence & Quality | QA correction rate, average confidence, Multipub validation stats. |
| Per-Email Summary | Table(s) listing each email with sender, subject, determination, status. |
| Output Documents | Counts and summaries for the three output document lists (inactive people, alternate contacts, inactive at new org). |

30. output_document_human_review.csv / .json#

Purpose#

Consolidated Human Review digest introduced by the active-only automation policy. Captures every row that the pipeline refused to act on automatically so IP4 / operations can triage manually. Written by OutputDocumentCollector.add_human_review. Actionable rows ride in notify_sai_action_items; metadata (counts + reason legend) is included in notify_venu_cupola_audit_files.

Generation Conditions#

Generated whenever OutputDocumentCollector.human_review is non-empty. Rows are added by ActionEngine from several handlers:

| reason constant | When |
|---|---|
| HUMAN_REVIEW_REASON_ACTIVE_NEW_CONTACT | ACTIVE outcome but no CUPOLA row; no auto-add. |
| HUMAN_REVIEW_REASON_REACTIVATION_CANDIDATE | ACTIVE outcome but matched CUPOLA row is inactive; no auto-reactivate. |
| HUMAN_REVIEW_REASON_UPDATE_ON_INACTIVE | EMAIL_UPDATE / TITLE_UPDATE on inactive CUPOLA row (active-only gate blocked it). |
| HUMAN_REVIEW_REASON_OUT_OF_OFFICE | OUT_OF_OFFICE determination; tracked separately, no system writes. |
| Existing reasons (UNKNOWN, bounce triage, replacement parse fallback, etc.) | Already collected from previous phases. |

Format#

CSV with UTF-8 BOM encoding, all fields quoted; JSON with 2-space indentation.

Column Reference (CSV)#

Headers come from output_document_generator.py (generate_human_review_csv). Column titles use spaced words (e.g. Sender Email, Lookup Email).

| Column | Description |
|---|---|
| ID | Record identifier (8-char UUID slice). |
| Account Name | Inbox / account source. |
| Message ID | Original message id for traceability. |
| Sender Email | Sender of the auto-response. |
| Lookup Email | Email used after normalization for contact lookup. |
| Subject | Email subject. |
| Email Body | Full raw body of the source email (plain text or HTML as stored). |
| Reason | One of the HUMAN_REVIEW_REASON_* constants listed above. |
| Reason Detail | Human-readable explanation of why the pipeline deferred. |
| Determination | Determination label at the time of routing. |
| LLM Category | Normalized classifier category when available. |
| Confidence | LLM confidence when available. |
| Person Name / Org Name | When available. |
| CUPOLA OrgPerson IDs / HODOR ProsNums / Multipub Subsnum / Salesforce IDs | Resolved identifiers when known. |
| Suggested Action | Recommended next step for the reviewer. |
| Notes | Free-form pipeline notes. |

JSON Top-Level Fields#

| Field | Type | Description |
|---|---|---|
| list_name | string | "Human Review digest" |
| purpose | string | Describes the file as the consolidated human-review queue. |
| generated_at | string (ISO 8601) | Timestamp. |
| record_count | integer | Number of rows. |
| records | array | Full serialization of each review record. |

31. impact_report.txt / .json#

Purpose#

Per-run headline summary introduced by the active-only automation policy. Produced by utils/impact_report.py and attached inline to Notifier.notify_run_audit_for_ip4 (Sai-only run audit).

Generation Conditions#

Always written at end of run (after the CUPOLA audit logger finishes flushing). The counts are derived from the in-memory CupolaAuditLogger.entries list, so read-only mode and dry-run runs still emit the report (counts are zero when no writes occurred).

Format#

Fields#

| Field | Type | Description |
|---|---|---|
| emails_processed | integer | Total auto-response emails handled in the run. |
| records_deactivated | integer | CUPOLA rows flipped to inactive: status_change audit entries with requested_status=False and auto_applied=True. |
| records_added | integer | New CUPOLA rows inserted: contact_addition audit entries with a non-empty contact_id. Only ticks for REPLACEMENT when CUPOLA_AUTO_ADD_REPLACEMENTS=true. |
| generated_at | string (ISO 8601) | Timestamp the report was written (JSON only). |
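
The two derived counts follow directly from the audit entries. A minimal sketch, assuming entries shaped like the cupola_audit_log.json objects; `build_impact_counts` is a hypothetical name and this is not the code in utils/impact_report.py.

```python
def build_impact_counts(entries: list, emails_processed: int) -> dict:
    """Derive the headline counts from in-memory audit entries."""
    deactivated = sum(
        1
        for e in entries
        if e.get("action_type") == "status_change"
        and e.get("requested_status") is False
        and e.get("auto_applied") is True
    )
    added = sum(
        1
        for e in entries
        if e.get("action_type") == "contact_addition" and e.get("contact_id")
    )
    return {
        "emails_processed": emails_processed,
        "records_deactivated": deactivated,
        "records_added": added,
    }

# Hypothetical entries: one executed deactivation, one recommendation-only row,
# one real addition, and one mocked addition with no assigned ID.
sample_entries = [
    {"action_type": "status_change", "requested_status": False, "auto_applied": True},
    {"action_type": "status_change", "requested_status": False, "auto_applied": False},
    {"action_type": "contact_addition", "contact_id": "98765"},
    {"action_type": "contact_addition", "contact_id": ""},
]
impact = build_impact_counts(sample_entries, emails_processed=10)
```

Recommendation-only rows and mocked additions do not count, which is why dry-run and read-only runs report zeros.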

32. action_items_tracker.csv (cross-run)#

Purpose#

Central queue of one row per action notification email sent in a run (N01–N07). Appended once per run by append_action_items_for_run after all notifications complete. Each row includes ActionItemCount and a per-attachment breakdown in Summary (artifact line counts, not separate tracker rows). Default path: {REPORT_OUTPUT_DIR}/action_items_tracker.csv; override with ACTION_ITEMS_TRACKER_PATH. Post-run completion requests use catalog N12 via auto-responder-request-action-item-confirmation (not appended to this CSV); N12 bodies use collect_action_item_detail_rows for the same counts.

Generation Conditions#

Skipped when the run folder RunId already exists in the tracker (idempotent re-run/resend). New rows are appended with Completed=false for manual spreadsheet triage.

Columns#

| Column | Description |
|---|---|
| Completed | First column for spreadsheet triage. false on append; operators set true when the notification owner confirms work. |
| NotificationTo | Configured SMTP To recipient(s) for that catalog notification (comma-separated when multiple, e.g. N02 Angel + Yogesh). |
| RunId | Run folder name (e.g. run_2026-05-26_14-30-00). |
| RunTimestamp | Parsed from folder name when possible. |
| NotificationId | N01–N07 (N05 vs N06 follows Sai bundle logic). |
| ActionItemCount | Number of actionable lines in attached CSVs for that notification. |
| SourceFiles | Semicolon-separated list of run artifacts that contributed rows. |
| Summary | Per-file counts (e.g. output_document_alternate_contacts.csv: 67; …). |
| CompletedAt / Notes | Empty on append; manual follow-up. |
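
The idempotency check and the RunTimestamp parsing can be sketched as follows. Both helpers are hypothetical illustrations of the behavior described above, not the body of append_action_items_for_run; the folder-name pattern is taken from the run directory format run_{YYYY-MM-DD_HH-MM-SS}.

```python
import csv
import io
from datetime import datetime
from typing import Optional

def parse_run_timestamp(run_id: str) -> Optional[datetime]:
    """Parse the run folder name into a datetime, or None when it does not match."""
    try:
        return datetime.strptime(run_id, "run_%Y-%m-%d_%H-%M-%S")
    except ValueError:
        return None

def run_already_tracked(tracker_text: str, run_id: str) -> bool:
    """True when any existing tracker row carries this RunId (append is skipped)."""
    reader = csv.DictReader(io.StringIO(tracker_text))
    return any(row.get("RunId") == run_id for row in reader)

# Hypothetical tracker content with a single previously appended row.
tracker_text = (
    "Completed,NotificationTo,RunId,NotificationId,ActionItemCount\r\n"
    "false,ops@example.com,run_2026-05-26_14-30-00,N05,67\r\n"
)
seen = run_already_tracked(tracker_text, "run_2026-05-26_14-30-00")
unseen = run_already_tracked(tracker_text, "run_2026-05-27_09-00-00")
parsed_ts = parse_run_timestamp("run_2026-05-26_14-30-00")
```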

Glossary of Systems#

| System | Full Name | Description |
|---|---|---|
| CUPOLA | CUPOLA Contact Management | Thompson's primary contact and organization management system. Stores person records, organization records, and org-person links. The system of record for contact active/inactive status. |
| HODOR | Hodor / dmorders_thompson | Thompson's prospect/subscriber database. Contains prospect numbers (ProsNum), email records, and subscription metadata. Contacts can be marked "No Longer with Firm" when inactive. |
| SFMC | Salesforce Marketing Cloud | Email marketing platform. The Auto Suppression List prevents marketing emails from being sent to inactive/invalid addresses. |
| Multipub | MultiPub Subscription Management | Publication subscription and order management system. Tracks active subscriptions, expired orders, and single-issue purchases. Used to validate whether an inactive person still has live subscription activity before marking them inactive. |
| Salesforce | Salesforce CRM | Customer relationship management system. Contains Lead and Contact records. Updated when contact status changes (if not related to Multipub). |

Glossary of Determination Types#

| Determination | Description |
|---|---|
| inactive | Person has permanently left the organization (left company, retired, or deceased). All contact records across systems should be marked inactive/suppressed. |
| active | Person is confirmed active at their organization. When no CUPOLA row exists or the matched row is inactive, the pipeline refuses to auto-add / auto-reactivate and routes to Human Review; mirror systems (Hodor, non-Multipub Salesforce) are still updated as before. |
| replacement | A replacement/alternate contact was identified. The original person is marked inactive and the replacement row is captured for IP4 review (auto-add disabled unless CUPOLA_AUTO_ADD_REPLACEMENTS=true). |
| title_update | Person's job title has changed. Gated on an active CUPOLA row: when the gate fails, the entire update is blocked and routed to Human Review. |
| email_update | Person's email address has changed. Gated on an active CUPOLA row: when the gate fails, the entire update is blocked and routed to Human Review. |
| out_of_office | Auto-reply is a temporary absence notification. Promoted to a first-class determination by the active-only automation policy; performs no system writes and emits a Human Review row with HUMAN_REVIEW_REASON_OUT_OF_OFFICE. |
| unknown | Email is not relevant (spam, unrelated content) or cannot be classified. No action is taken. |

Glossary of Processing Statuses#

| Status | Description |
|---|---|
| success | All planned actions completed successfully. |
| failed | One or more actions failed during execution. |
| skipped_no_contact | Contact was not found in any backend system; no actions could be taken. |
| skipped_unknown | Determination was unknown; no actions were needed. |
| error | An unexpected error occurred during processing (e.g., network failure, unhandled exception). |
| deferred_multipub | Inactive marking was halted because the person has active subscriptions in Multipub. Requires manual review. |
| pending | Processing has not yet completed. Should not appear in final reports. |

Maintaining this document#

Edit docs/DATA_DICTIONARY.html directly. Preview locally from the repo root:

bash
python scripts/serve_data_dictionary.py

Then open http://127.0.0.1:8765/DATA_DICTIONARY.html in a browser.