AutoResponder Determination System - Complete Reference#

This document is the single, authoritative reference for the AutoResponderProcess determination system. It covers the end-to-end flow from email extraction through LLM classification to action execution, including all business rules, system update logic, Multipub validation, and implementation details.

Table of Contents#

Executive Summary
Architecture Overview
Complete Process Flow
LLM Classification Pipeline
Category-to-Determination Mapping
Determination Types — Detailed Reference
Multipub Validation (Tarun's Process)
System Update Rules
Run Outputs (Spreadsheets & Logs)
Notifications
Special Cases & Edge Cases
Configuration
Dry-Run Mode
Project Structure
Integration History
Error Handling
Performance Considerations
Testing
Implementation Status & Enhancements
Known Limitations & Risks
Team Responsibilities

Executive Summary#

The AutoResponderProcess is a production-ready system that processes auto-response emails from a central MSSQL repository. It uses an LLM-based classification pipeline (OpenAI or Ollama) to determine the status of each contact, then executes appropriate actions across four database systems: Cupola, Hodor, Multipub, and Salesforce.

Key Principles:

Multipub and Salesforce (when the contact is related to Multipub) are treated as read-only for status and email updates — the marketing team handles those changes.
Only Cupola and Hodor are updated directly (unless HODOR_AUTOMATIC_UPDATES=false, in which case Hodor changes are emailed as a manual-action list instead of SQL UPDATEs).
Before marking contacts inactive, the system runs Tarun's Multipub validation process to cross-check against active/recent subscriptions.
The system supports a comprehensive dry-run mode for safe testing.

Determination Types (6):

Determination	Primary Action
UNKNOWN	Skip processing
INACTIVE	Mark inactive, check subscriptions, notify
REPLACEMENT	Mark original inactive, add replacement to Cupola, notify
EMAIL_UPDATE	Changed Email (no alt) — update email/title in Cupola/Hodor/SF (skip Multipub/SF-if-Multipub)
TITLE_UPDATE	Same handler as `EMAIL_UPDATE`; not emitted by `map_category_to_determination()` (reserved for tests / future use)
ACTIVE	Ensure active status in Cupola/Hodor (skip Multipub/SF-if-Multipub)

Architecture Overview#

Original Email Classifier#

Location: Auto_responder_email/unified_email_classifier.py (reference only — functionality fully integrated into main application)

The original program was a FastAPI-based service with a /classify/csv/ endpoint. Its LLM classification and QA agent logic has been extracted and integrated into the main application's determination services.

Current Application#

Location: src/auto_responder/

Workflow: Extract → Deduplicate → LLM determination (Step 4a) → resolve lookup email → contact lookup (Step 4c; stub in dry-run) → execute actions (Step 4d) → notify

(main.py runs classification before contact lookup so UNKNOWN emails skip database lookups.)

Key Features:

Extracts emails from central MSSQL repository
Classifies each email with the LLM before contact lookup (so UNKNOWN skips DB queries)
Looks up contacts across Cupola, Hodor, Multipub, and Salesforce (with graceful degradation)
LLM-based determination with two-agent QA system (Classification Agent + QA Agent)
Action engine that routes determinations to appropriate handlers
Notification system for subscriptions, replacements, and Multipub validation flags
Comprehensive dry-run mode

Application Determination Types (6)#

ACTIVE — Person is active at organization
INACTIVE — Person is no longer at organization
REPLACEMENT — Person has been replaced by someone else
EMAIL_UPDATE — Changed Email (no alternate) — only determination the mapper produces for “changed contact info” today; _handle_title_update()
TITLE_UPDATE — Same handler as EMAIL_UPDATE in ActionEngine; the live LLM path does not set this value (mapper never returns it). Use for fixtures/tests or future wiring.
UNKNOWN — Cannot determine action type

Complete Process Flow#

PlantUML Diagram#

Source file: diagrams/business_flow_technical.puml. Pre-rendered exports sit alongside the source: business_flow_technical.svg and business_flow_technical.pdf. Re-render any of the .puml files with python docs/diagrams/_render_business_flow.py [stem] or with any PlantUML-compatible tool. The technical diagram calls out class / method entry points (EmailProcessor.deduplicate_emails(), AutoResponseDetermination.determine(), ContactLookup.lookup_with_enrichment(), ActionEngine.execute() and its _handle_* branches), the run artifacts written at each step, and the notify_* dispatches used for routing. The business-level, non-engineering view lives in diagrams/business_flow_simplified.puml — that is the file to share externally. The two diagrams are kept in sync (business_flow_simplified.puml / business_flow_technical.puml).

Processing Steps#

Step	Description	Key Classes / notes
Step 1	Extract emails from central MSSQL repository	`EmailRepository`
Step 2	Deduplicate emails by Message ID	`EmailProcessor`
Step 3	Initialize services (with graceful degradation)	`ContactLookup`, `AutoResponseDetermination`, `ActionEngine`, `Notifier`
Step 4a	LLM determination (classification, QA, map to `Determination`)	`AutoResponseDetermination.determine()`
—	If determination is UNKNOWN, skip (no contact lookup, no actions)	`main.py`
Step 4c	Contact lookup using resolved `lookup_email` (dry-run: stub `Contact`, no DB lookup)	`ContactLookup.lookup_with_enrichment()`
Step 4d	Execute actions based on determination	`ActionEngine.execute()`

Data Flow Summary#

Email Repository (MSSQL - CBI_EmailExtraction.dbo.Emails)
    ↓
Step 1: Email Extraction (most recent N emails)
    ↓
Step 2: Deduplication (by Message ID)
    ↓
Step 3: Service Initialization
    ├─ ContactLookup (Cupola, Hodor, Multipub, Salesforce)
    │  (graceful degradation — continues if some connectors fail)
    ├─ AutoResponseDetermination (LLM: OpenAI or Ollama)
    ├─ ActionEngine (with all DB connectors + Notifier)
    └─ [Dry-Run: mock Cupola/Multipub/Salesforce + real HodorDatabase; optional mock email repo; ActionLogger + StageLogger]
    ↓
Step 4: Process Each Email
    ↓
┌───────────────────────────────────────────────────┐
│ Step 4a: DETERMINATION (LLM pipeline)               │
│                                                     │
│ 1. Classification Agent → Category + Contact Info │
│ 2. QA Agent → Verification + Corrections            │
│ 3. Email Validation → Clean addresses               │
│ 4. Normalize compound categories → single label     │
│ 5. Category Mapper → Determination enum             │
│ 6. Identify auto-response source email             │
│ 7. Post-Processing → replacement email ≠ sender   │
└───────────────────────────────────────────────────┘
    ↓
    UNKNOWN? → skip (no lookup, no actions)
    ↓
┌───────────────────────────────────────────────────┐
│ Step 4c: Contact Lookup                            │
│ Search: Cupola → Hodor → Multipub → Salesforce     │
│ (graceful degradation; dry-run: stub Contact)      │
│ Returns: Unified Contact with records from all     │
└───────────────────────────────────────────────────┘
    ↓
Step 4d: Action Engine Execution
    ├─→ UNKNOWN → Skip
    │
    ├─→ INACTIVE → ┌──────────────────────────────────────────┐
    │              │ MULTIPUB VALIDATION (Tarun's Process)     │
    │              │ 1. Match by email/user_id or name+org    │
    │              │ 2. Check active subscriptions            │
    │              │ 3. Check recently expired (12 mo)        │
    │              │ 4. Check single-issue purchases (12 mo)  │
    │              │ 5. Flag for review if subscriptions found│
    │              └──────────────────────────────────────────┘
    │              → Mark inactive (Cupola, Hodor only)
    │              → Skip Multipub & Salesforce-if-Multipub
    │              → Check subscriptions, notify if found
    │
    ├─→ REPLACEMENT → Mark original inactive (same logic)
    │                → Add replacement to Cupola
    │                → Notify Sai Teja
    │
    ├─→ EMAIL_UPDATE / TITLE_UPDATE → Update email/title in Cupola/Hodor/SF
    │                → Skip Multipub & Salesforce-if-Multipub
    │
    └─→ ACTIVE → Ensure active (Cupola/Hodor)
               → Add to Cupola if not present
               → Skip Multipub & Salesforce-if-Multipub
    ↓
Notifications (if needed)
    ├─→ Multipub validation flag notification → Client Services
    ├─→ Person left with subscription notification → Max / Client Services
    └─→ New replacement notification → Sai Teja
    ↓
Next Email / Summary Report

LLM Classification Pipeline#

Location in Code: src/auto_responder/main.py → Step 4a
Service: AutoResponseDetermination.determine()

Pipeline Steps#

Classification Agent (LLM)
- Analyzes email subject and body (truncated to 12,000 chars)
- Extracts contact information (sender_new_email, alternate_contact, retired_personal_email)
- Assigns category based on priority order
- Returns initial classification result with confidence level
QA Agent (LLM)
- Verifies classification agent's output
- Checks category correctness and priority order
- Validates contact information extraction
- Applies corrections if needed
- Returns QA-verified result
Email Address Validation
- Validates sender_new_email and retired_personal_email using regex
- Cleans and strips whitespace
Normalize LLM category (normalize_llm_category)
- Models sometimes return compound strings (e.g. "Out of Office, Changed Email").
- The pipeline collapses these to one canonical label using pattern priority in category_mapper.py (highest business severity first: Undeliverable → Deceased → Retired → Left Company → Changed Email → Out of Office → …).
- That single label drives map_category_to_determination and the person/email status columns.
Category Mapping
- Maps normalized LLM category to Determination enum
- Parses alternate_contact into ReplacementInfo
- If alternate_contact parsing fails for Retired/Deceased/Left Company → falls back to INACTIVE
- Creates DeterminationResult (does not set updated_title or TITLE_UPDATE today — see EMAIL_UPDATE / TITLE_UPDATE under Determination Types — Detailed Reference)
Source email identification (AutoResponseDetermination Step 5 in code)
- identify_auto_response_source_email() sets source_email on the result; main.py prefers it over the raw sender when resolving lookup_email.
- After Step 1b relay normalization (normalize_relay_email, optional multi-label split when the body is available), reconcile_email_with_body() corrects mangled person addresses (e.g. colleen.bivona.domail@maricopa.edu → signature colleen.bivona@domail.maricopa.edu) at ingest, during source identification (Step 3b before the cleaned-sender fallback), and again when resolving lookup_email.
Post-Processing (alternate contacts vs sender)
- Removes replacement entries whose replacement_email equals the sender
- If no replacement entries remain and determination was REPLACEMENT:
  - Final category Retired, Deceased, or Left Company → change determination to INACTIVE
  - Otherwise (e.g. Changed Email with unusable alt) → change determination to UNKNOWN

Contact Information Fields#

Field	Source	Usage	Updates Systems?
`sender_new_email`	LLM extraction	Update email in systems	Yes
`alternate_contact`	LLM extraction	Create replacement contact	Yes (Cupola)
`retired_personal_email`	LLM extraction	Store in notes only	No
`is_long_term_leave`	LLM flag	Log for review	No
`new_org_details` (`new_org_*` keys)	LLM QA JSON (nested object or flat keys); mapper fallback via `_enrich_new_org_details`	`output_document_inactive_new_org` when person moved	No (IP4 research list)
`updated_title`	Not populated by LLM pipeline	Would drive `update_contact_title()` if set	Yes (if ever wired)

QA checklist (move / retire categories): Prefer a structured new_org_details object in QA JSON (new_org_name, new_org_email, etc.). When only final_sender_new_email is present, category_mapper._enrich_new_org_details still populates new_org_email for Left Company, Retired, Deceased, and Changed Email so output_document_inactive_new_org is not empty.

Category-to-Determination Mapping#

LLM Category	Person Status	Email Status	Determination	Special Handling
Not Relevant	Inconclusive	Unknown	UNKNOWN	Skip
Flagged as Spam	Inconclusive	Unknown	UNKNOWN	Skip
Undeliverable	Inconclusive	Invalid	INACTIVE	Mark email invalid
Retired (no alt)	Inactive	Invalid	INACTIVE	Check subscriptions
Retired (with alt)	Inactive	Invalid	REPLACEMENT	Mark original inactive + add replacement
Out of Office	Active	Valid	OUT_OF_OFFICE	First-class determination. Routed to Human Review digest (`HUMAN_REVIEW_REASON_OUT_OF_OFFICE`); no CUPOLA / Hodor / Multipub / SF writes — dedicated OOO extraction process will pick these up.
Deceased (no alt)	Inactive	Invalid	INACTIVE	Check subscriptions
Deceased (with alt)	Inactive	Invalid	REPLACEMENT	Mark original inactive + add replacement
Left Company (no alt)	Inactive	Invalid	INACTIVE	Check subscriptions
Left Company (with alt)	Inactive	Invalid	REPLACEMENT	Mark original inactive + add replacement
Changed Email (no alt)	Active	Invalid	EMAIL_UPDATE	Update email/title in Cupola/Hodor/SF (skip Multipub/SF-if-Multipub)
Changed Email (with alt)	Active	Invalid	REPLACEMENT	Add replacement
Unknown	Inconclusive	Unknown	UNKNOWN	Skip

Compound categories: If the LLM returns multiple labels in one string, normalize_llm_category picks a single row using the severity order above — not necessarily the first phrase in the string.

Fallback: If alternate_contact is present for Retired/Deceased/Left Company but parsing fails, the determination falls back to INACTIVE.

Determination Types — Detailed Reference#

1. UNKNOWN#

Source Categories: Not Relevant, Flagged as Spam, Unknown

Business Logic: These emails are not actionable auto-responses. No database updates or notifications are required.

Process:

┌─────────────────────────────────┐
│ Skip Processing                 │
│ - Log warning                   │
│ - No database updates           │
│ - Continue to next email        │
└─────────────────────────────────┘

Spreadsheet landing (every UNKNOWN row):

Processing Report (master) — processing_report_master.csv/.json — every email lands here with status=skipped_unknown.
Human Review — output_document_human_review.csv/.json — a person picks the real outcome later. See Run Outputs.

Code Location: action_engine.py → execute() → early return for UNKNOWN; main.py also calls OutputDocumentCollector.add_human_review() before the skip so UNKNOWN never "disappears" from the per-run ledger.

2. INACTIVE#

Source Categories: Undeliverable, Retired (without alt), Deceased (without alt), Left Company (without alt)

Business Rules:

Run Multipub validation before marking inactive (Tarun's process)
Mark contact inactive in Cupola and Hodor
Skip Multipub (subscription data only, marketing handles contact changes)
Skip Salesforce if contact is related to a Multipub subscriber
Check for active subscriptions across all systems
Notify Max/Client Services if active subscription exists
Store retired_personal_email in notes (reference only, not updated in any system)

Process:

┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 1: MULTIPUB VALIDATION (Tarun's Cross-Check Process)                   │
│    Before marking inactive, validate against Multipub:                      │
│                                                                             │
│    a. Match contact by:                                                     │
│       • Email address / User ID (PRIMARY)                                   │
│       • First Name + Last Name + Organization (SECONDARY - with caution)    │
│                                                                             │
│    b. Check subscription scenarios:                                         │
│       • Active subscriptions (multi-issue, eiss > today)                    │
│       • Subscriptions expired within last 12 months                         │
│       • Single-issue purchases within last 12 months                        │
│                                                                             │
│    c. If relevant subscriptions found:                                      │
│       → Flag for review                                                     │
│       → notify_multipub_subscriber_followup_batch() to Angel + Yogesh       │
│       → Proceed with marking inactive (after notification)                  │
│                                                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│ STEP 2: Update IPv4 Team Systems (Cupola and Hodor)                         │
│    ├─ Cupola: update_contact_status(False) + check_active_subscription()    │
│    └─ Hodor: update_contact_status(False) + check_active_subscription()     │
│                                                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│ STEP 3: ACE Team Systems (Read-Only Approach)                               │
│    ├─ Multipub: **SKIP status update**                                      │
│    │  (Multipub is subscription data only;                                  │
│    │   marketing team handles contact changes)                              │
│    │  Still check_active_subscription() for consistency                     │
│    │                                                                        │
│    └─ Salesforce:                                                           │
│       ├─ If contact IS in Multipub → **SKIP** Contact update               │
│       │  (Contact is related to Multipub subscriber)                        │
│       └─ If contact NOT in Multipub:                                        │
│          → update_contact_status(False)                                     │
│          → check_active_subscription()                                      │
│                                                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│ STEP 4: Notifications                                                       │
│    └─ If Active Subscription Found in any system:                           │
│       → notify_multipub_subscriber_followup_batch (catalog N02) │
│       → To: Max / Client Services                                           │
│                                                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│ STEP 5: Store Additional Information                                        │
│    └─ Store retired_personal_email in notes (if provided, for reference)    │
│       (NOT updated in any system — logged only)                             │
└─────────────────────────────────────────────────────────────────────────────┘

Code Location: action_engine.py → _handle_inactive() and _validate_multipub_before_inactive()

3. REPLACEMENT#

Source Categories:

Changed Email (with alternate_contact present)
Retired (with alternate_contact present)
Deceased (with alternate_contact present)
Left Company (with alternate_contact present)

Any INACTIVE category (Retired, Deceased, Left Company) with an alternate_contact present is automatically mapped to REPLACEMENT. If the alternate_contact string cannot be parsed, it falls back to INACTIVE.

Business Rules:

Mark original person inactive (full INACTIVE flow including Multipub validation)
Extract replacement contact info (name, email, title) from alternate_contact
Add replacement person to Cupola only (IPv4 team system)
Notify Sai Teja about the new replacement

Process:

┌─────────────────────────────────────────────────────┐
│ 1. Validate replacement_info                        │
│    └─ If None → Log warning, return False           │
│                                                      │
│ 2. Mark Original Person INACTIVE                    │
│    └─ Calls _handle_inactive() for original         │
│       (Includes full Multipub validation process)   │
│                                                      │
│ 3. Use replacement_info from DeterminationResult    │
│    └─ Contains: name, email, title                  │
│       Parsed from alternate_contact string           │
│       Format: "Name: John, Email: john@abc.com"      │
│                                                      │
│ 4. Add Replacement to Cupola                        │
│    └─ check_contact_exists → add_contact() or      │
│       reuse existing org_person_id (see connections/cupola.html)  │
│       (IPv4 team system only)                       │
│                                                      │
│ 5. Notify Sai Teja                                  │
│    └─ notify_sai_action_items() (catalog N05/N06)       │
│       About: new replacement person                 │
└─────────────────────────────────────────────────────┘

Code Location: action_engine.py → _handle_replacement()

4. EMAIL_UPDATE / TITLE_UPDATE#

Source Categories: Changed Email (without alternate_contact, and not normalized to another label) maps to EMAIL_UPDATE in category_mapper.py.
TITLE_UPDATE: The enum and ActionEngine branch exist and share _handle_title_update() with EMAIL_UPDATE, but map_category_to_determination() never returns TITLE_UPDATE, and create_determination_result() does not set updated_title. Production runs therefore only reach this handler as EMAIL_UPDATE (with sender_new_email from the LLM). Title-only updates via updated_title would require future LLM/schema wiring or manual construction of DeterminationResult.

Business Rules:

Update email in Cupola, Hodor, and Salesforce (with the same Multipub / Salesforce-if-Multipub skip rules as below)
Skip Multipub (marketing handles contact info changes)
Skip Salesforce if contact is related to a Multipub subscriber
Log email mapping (old → new)
When updated_title is present, call update_contact_title() per system with the same skip rules as email

Process:

┌─────────────────────────────────────────────────────────────┐
│ 1. Check for sender_new_email                               │
│                                                              │
│ 2. If sender_new_email Present:                              │
│    └─ Update Email in systems (with skip logic):            │
│       ├─ Multipub → **SKIP**                                │
│       │  (Marketing team handles contact info changes)      │
│       ├─ Salesforce (if in Multipub) → **SKIP**             │
│       │  (Contact is related to Multipub subscriber)        │
│       ├─ Cupola → update_contact_email()                    │
│       ├─ Hodor → update_contact_email()                     │
│       └─ Salesforce (if NOT in Multipub)                    │
│          → update_contact_email()                           │
│                                                              │
│ 3. Log Email Mapping                                        │
│    └─ old_email → new_email                                  │
│                                                              │
│ 4. If updated_title Present:                                │
│    └─ Same skip rules as email; then update_contact_title()  │
│       on Cupola / Hodor / Salesforce where applicable       │
└─────────────────────────────────────────────────────────────┘

Code Location: action_engine.py → _handle_title_update()

5. ACTIVE#

Source categories: Left without a clear OOO signal — e.g. emails that still resolve to "active person, no change needed" after QA. OOO is now tracked separately as OUT_OF_OFFICE (see §6).

Business Rules (Active-only policy):

Never auto-add new CUPOLA rows. If the lookup finds no CUPOLA record, the row is written to output_document_inactive_no_cupola_match.* and a Human Review row is added with reason HUMAN_REVIEW_REASON_ACTIVE_NEW_CONTACT — IP4 verifies + creates a new entry manually.
Never auto-reactivate inactive CUPOLA rows. Inactive rows for an ACTIVE outcome are routed to Human Review with reason HUMAN_REVIEW_REASON_REACTIVATION_CANDIDATE; IP4 verifies and creates a new CUPOLA entry (do not reactivate the old one).
Already-active CUPOLA rows: no change required (logged as "Already active").
Skip Multipub (subscription data only).
Skip Salesforce if contact is related to a Multipub subscriber.

Process:

┌─────────────────────────────────────────────────────────────┐
│ 1. Lookup CUPOLA rows for contact                           │
│    ├─ No rows                → add_inactive_no_cupola_match │
│    │                            + Human Review              │
│    │                            (ACTIVE_NEW_CONTACT)        │
│    └─ Rows found → for each row:                            │
│        ├─ active  → "Already active" (no-op)                │
│        └─ inactive → Human Review                           │
│                     (REACTIVATION_CANDIDATE)                │
│                                                              │
│ 2. All CUPOLA state changes require human verification;     │
│    no update_contact_status / add_contact calls fire on     │
│    this path.                                               │
└─────────────────────────────────────────────────────────────┘

Code Location: action_engine.py → _handle_active()

6. OUT_OF_OFFICE#

Source Categories: Out of Office, Out of Office (Long-term Leave: is_long_term_leave=True)

Business Rules: OOO replies signal nothing actionable for CUPOLA / Hodor / Multipub / Salesforce on the current run. They are surfaced for the planned dedicated out-of-office extraction process.

Process:

┌─────────────────────────────────────────────────────────────┐
│ 1. Add row to Human Review digest                           │
│    reason = HUMAN_REVIEW_REASON_OUT_OF_OFFICE               │
│                                                              │
│ 2. NO writes to any system. NO status updates.              │
└─────────────────────────────────────────────────────────────┘

Code Location: action_engine.py → _handle_out_of_office()

Multipub Validation (Tarun's Process)#

Purpose#

Ensure data accuracy across systems by verifying that inactive contacts from the Autoresponse process do not have active or recently relevant subscriptions in Multipub before marking them as inactive.

Applies to: INACTIVE determination (and the INACTIVE step within REPLACEMENT). Runs before marking inactive.

Matching Criteria#

Priority	Method	Description
PRIMARY	Email / User ID	Direct match on email address or user ID
SECONDARY	Name + Org	First Name + Last Name + Organization (use with caution, only if exactly one match found)

Subscription Scenarios Checked#

Scenario	Criteria	SQL Details
Active Subscriptions	Multi-issue orders with future end date	`o.stat IN ('01','02','03','04','05','30') AND o.eiss > GETDATE() AND p.SM='M'`
Recently Expired	Expired within last 12 months	`o.eiss BETWEEN DATEADD(YEAR,-1,GETDATE()) AND GETDATE() AND p.SM='M'`
Single-Issue Purchases	Purchased within last 12 months	`o.odte > DATEADD(YEAR,-1,GETDATE()) AND p.SM='S'`

Active order queries include both direct orders and site license orders (via pub.site, pub.SITEORD, pub.SITEUSER joins, excluding IP-authenticated site licenses).

Important: The focus is on subscribers (contacts), not order-level criteria. Factors such as paid/unpaid status, comp, trial, or site license orders are NOT used to exclude contacts from the review.

Validation Result Actions#

No subscriptions found → Proceed with inactive marking
Relevant subscriptions found → Flag for review, notify Client Services, then proceed with inactive marking (with audit trail)

Validation Flow#

Contact marked as INACTIVE by LLM
    ↓
_validate_multipub_before_inactive()
    ↓
┌─────────────────────────────────────────────────────────┐
│ 1. Find subscriber in Multipub                          │
│    a. Try email match                                   │
│    b. Try user_id match                                 │
│    c. Try name+org match (if single result only)        │
│                                                         │
│ 2. Check subscription scenarios:                        │
│    • _get_active_orders()                               │
│    • _get_recently_expired_orders()                     │
│    • _get_recent_single_issue_orders()                  │
│                                                         │
│ 3. If any relevant subscriptions found:                 │
│    • Set should_flag_for_review = True                  │
│    • Build review_reason summary                        │
│                                                         │
│ 4. Return MultipubValidationResult                      │
└─────────────────────────────────────────────────────────┘
    ↓
If flagged for review:
    → notify_multipub_subscriber_followup_batch()
    → Log detailed findings
    → Continue with inactive marking (with notification)
    ↓
Proceed with _handle_inactive()

SQL Queries#

Active Orders (Multi-issue + Site License):

sql

-- Direct orders
SELECT o.onum, o.subsnum, o.prdcde, p.prdtyp, p.gprdtyp,
       o.stat, o.odte, o.eiss, o.pamt, o.trm, ...
FROM pub.ord o
INNER JOIN pub.prd p ON p.prdcde = o.prdcde
WHERE o.subsnum = ?
  AND o.stat IN ('01','02','03','04','05','30')
  AND o.eiss > GETDATE()
  AND p.SM = 'M'

-- Site license orders (additional complexity)
-- Includes JOIN to pub.site, pub.SITEORD, pub.SITEUSER
-- Excludes IP-based site licenses

Expired Orders (Last 12 Months):

sql

SELECT o.onum, o.subsnum, ...
FROM pub.ord o
INNER JOIN pub.prd p ON p.prdcde = o.prdcde
INNER JOIN pub.sc sc ON sc.prdcde = o.prdcde AND sc.eiss = o.eiss
WHERE o.subsnum = ?
  AND o.stat NOT IN ('01','02','03','04','05','30')
  AND o.eiss BETWEEN DATEADD(YEAR,-1,GETDATE()) AND GETDATE()
  AND p.SM = 'M'

Single-Issue Orders (Last 12 Months):

sql

SELECT o.onum, o.subsnum, ...
FROM pub.ord o
INNER JOIN pub.prd p ON p.prdcde = o.prdcde
WHERE o.subsnum = ?
  AND o.odte > DATEADD(YEAR,-1,GETDATE())
  AND p.SM = 'S'

Notification Format#

When a contact is flagged for review, the notification includes:

Subject: [REVIEW REQUIRED] Multipub Validation Flag: John Doe

Contact: John Doe (john.doe@example.com)
Multipub Subscriber: 82025008
Match Method: email

=== SUBSCRIPTION FINDINGS ===

[ACTIVE SUBSCRIPTIONS] (2 found):
  - Order 12345: Product Name ABC (expires 2026-12-31)
  - Order 12346: Product Name XYZ (expires 2027-06-30) [Site License]

[RECENTLY EXPIRED] (1 found):
  - Order 12340: Product Name DEF (expired 2025-08-15)

[RECENT SINGLE-ISSUE PURCHASES] (3 found):
  - Order 12350: Single Issue Product (purchased 2025-10-01)
  ...

=== ACTION REQUIRED ===
Please review the above subscription details and verify that marking
this contact as inactive is appropriate.

Review Reason: 2 active subscription(s); 1 recently expired; 3 recent single-issue

Code Location: multipub.py → validate_for_inactive(), action_engine.py → _validate_multipub_before_inactive()

System Update Rules#

System	INACTIVE	REPLACEMENT	EMAIL_UPDATE / TITLE_UPDATE	ACTIVE
Cupola	✅ Mark inactive	✅ Mark original inactive + Add replacement	✅ Update email/title	✅ Mark active (or add new)
Hodor	✅ Mark inactive (Prospect Import row plus AutoRenewal SQL by email in live mode) — or email list only if `HODOR_AUTOMATIC_UPDATES=false`	✅ Mark original inactive (via _handle_inactive) — or email list	✅ Update email/title — or email list	✅ Mark active if inactive — or email list
Multipub	❌ SKIP (subscription data only)	❌ SKIP (via _handle_inactive)	❌ SKIP (marketing handles)	❌ SKIP (subscription data only)
Salesforce	⚠️ Only if NOT in Multipub	⚠️ Only if NOT in Multipub (via _handle_inactive)	⚠️ Only if NOT in Multipub	⚠️ Only if NOT in Multipub

Reasoning: Multipub is a subscription management system — the marketing team is responsible for updating contact information there. Salesforce Contacts linked to Multipub subscribers are managed through the Multipub→Salesforce sync, so direct updates would create conflicts.

Database Methods#

Cupola (IPv4 Team)

update_contact_status(unique_id, is_active)
update_contact_email(unique_id, new_email)
check_active_subscription(unique_id)
add_contact(email, name, title, organization=..., organization_id=...) — resolves target org, then same-org idempotent return, cross-org person_id reuse + new link_org_person, or new person+link; see docs/connections/cupola.html

Hodor (IPv4 Team)

update_contact_status(unique_id, is_active) — unused for INACTIVE determinations; INACTIVE flows instead combine Prospect Import rows with the AutoRenewal email-based update below.
update_contact_email(unique_id, new_email) — skipped when HODOR_AUTOMATIC_UPDATES=false (pending actions emailed to Venu via N01, NOTIFICATION_EMAIL_VENU)
update_contact_title(unique_id, new_title) — same
mark_inactive_by_email_for_autorenewal(email) — live-mode INACTIVE helper that executes IsNoLongerWithFirm = 1, LastUpdatedDate = getdate(), UpdatedReason = 'Updated based on AutoRenewal Program' for the sender email when HODOR_AUTOMATIC_UPDATES=true
check_active_subscription(unique_id)

Multipub (ACE Team)

update_contact_status(unique_id, is_active)
update_contact_email(unique_id, new_email)
check_active_subscription(unique_id) — Simple check
validate_for_inactive(...) — Full validation (Tarun's process)
lookup_by_email(email) — Primary lookup
lookup_by_user_id(user_id) — Primary lookup
lookup_by_name_and_org(first, last, org) — Secondary lookup (with caution)

Salesforce (ACE Team)

update_contact_status(unique_id, is_active)
update_contact_email(unique_id, new_email)
check_active_subscription(unique_id)

Run Outputs (Spreadsheets & Logs)#

Every run writes its artifacts to a fresh processing_reports/run_{timestamp}/ folder. The invariant is simple: every email gets a row on the Processing Report (master); uncertain rows also get a row on Human Review; automated writes produce an audit bundle for Venu.

Per-email ledgers (always written)#

File	Rows	Purpose
`processing_report_master.csv` / `.json`	Every email	The run's ledger of record. One row per email with its final determination, outcome, systems touched, and errors.
`processing_report_ip4.csv`	IP4-actionable categories only	Filtered copy for Sai's team (Out of Office, Retired, Deceased, Left Company, Changed Email). Uses the IP4 column subset and order (see `ReportGenerator._CSV_COLUMNS_IP4`); the master CSV remains the full ledger.
`processing_report.log`	Every email	Human-readable per-email log.
`category_summary_report.csv` / `.json`	Every email	Coarse category view (Undeliverable / Left Company / OOO / Changed Email / Other) for reporting.
`classifier_output/classifier_output.csv` / `.json`	Every classified email	Raw LLM + QA payloads.
`batch_report.html` / `batch_report.pptx`	Run summary	Dashboard + deck.

Outcome-specific spreadsheets (only when rows exist)#

File	Populated when	Consumer
`output_document_human_review.csv` / `.json`	UNKNOWN, no-match-actionable, bounce awaiting the Max + Vish rule, or replacement parse fallback	Sai action items + Venu audit metadata (`notify_sai_action_items` / `notify_venu_cupola_audit_files`)
`output_document_inactive_people.csv` / `.json`	INACTIVE actioned (including cross-verification rows where the person was already inactive)	Internal ops traceability; Sai action items when applicable. Marketing receives separate `_NoLongerThere_.csv` via N04 (not this file).
`{BusinessUnit}_NoLongerThere_{YYYY-MM-DD}.csv`	Same population as inactive people (deduped emails per business-unit label)	Marketing team (`notify_marketing_suppression`, N04) — SFMC suppression import
`output_document_alternate_contacts.csv` / `.json`	REPLACEMENT or ACTIVE-with-replacement	Sai Teja (`notify_sai_action_items` (catalog N05/N06) — `output_document_alternate_contacts.csv`)
`output_document_inactive_new_org.csv` / `.json`	EMAIL_UPDATE where the person also moved organisations	Sai Teja (`notify_sai_action_items` SECTION 3)
`output_document_inactive_no_cupola_match.csv` / `.json`	INACTIVE-style outcome with no CUPOLA match	Sai Teja (`notify_sai_action_items` SECTION 2)
`output_document_domain_matches.csv` / `.json`	Domain fallback found CUPOLA org context but multiple candidate people	Sai Teja / IP4 manual selection queue
`output_document_shared_email_reviews.csv` / `.json`	Exact email maps to multiple CUPOLA people (shared/designation mailbox)	Sai Teja / IP4 manual selection queue
`output_document_undeliverables.csv` / `.json`	Bounce / Undeliverable	Run directory; Client Services `notify_multipub_subscriber_followup_batch` (N02) when Multipub evidence applies — not emailed on N04
`output_document_multipub_audit.csv` / `.json`	Every INACTIVE row that hit the Multipub gate	Run directory only (not bulk-emailed); Tarun gets 1.2 / upload Yes-path
`output_document_email_update_requests.csv` / `.json`	Changed-Email rows	Run directory only (not emailed to marketing)

Venu audit bundle (always written; attached to Venu's end-of-run email)#

File	Purpose
`cupola_audit_log.csv` / `.json`	Every CUPOLA status change + contact addition recorded this run. Always written, even when empty, so Venu gets a receipt every run.
`cupola_audit_log_rollback_plan.csv`	SQL to revert each auto-applied CUPOLA status change. Only written when auto-apply actually ran.
`hodor_prospect_import.csv`	Written when `HODOR_AUTOMATIC_UPDATES=false`; emailed via `notify_hodor_prospect_import` (N01 To Venu).

Which outputs each outcome lands on#

Outcome	Master	IP4 copy	Human Review	Outcome-specific file(s)	Venu audit bundle
UNKNOWN	✅	⬜	✅	—	✅ (no writes happened)
No match (actionable, non-inactive)	✅	⬜	✅	—	✅
No match (inactive-style)	✅	✅	⬜	Inactive — no CUPOLA match	✅
ACTIVE	✅	✅	⬜	(long-leave → Client Services only)	✅
EMAIL_UPDATE / TITLE_UPDATE	✅	✅	⬜	Email Update Requests; Inactive at New Org (if moved)	✅
INACTIVE	✅	✅	⬜	Inactive People; Multipub Audit (when gated)	✅
REPLACEMENT	✅	✅	☑ parse fallback only	Inactive People + Alternate Contacts (± Multipub Audit)	✅
Bounce	✅	⬜	✅ until Max + Vish rule lands	Undeliverables	✅

Notifications#

Notification	Trigger	Recipient	Data / Attachments
`notify_multipub_subscriber_followup_batch()`	INACTIVE / left-org with determined sender + Multipub evidence (active / 12mo sub / 12mo purchase)	Angel + Yogesh (Matt Cc)	`multipub_subscriber_followup.csv`
`notify_tarun_undetermined_sender_review()`	Sender email could not be determined for Multipub routing	Tarun	`tarun_undetermined_senders.csv`
`notify_multipub_subscriber_followup_from_upload()`	Tarun upload API `outcome=yes`	Angel + Yogesh (Matt Cc; Tarun Cc)	Uploaded CSV
`notify_hodor_prospect_import()`	`HODOR_AUTOMATIC_UPDATES=false` and queued prospect-import rows	Hodor (Cc Sai)	`hodor_prospect_import.csv`
`notify_marketing_suppression()`	End of run when `_NoLongerThere_.csv` deliverable(s) exist	Marketing team (N04)	`{BusinessUnit}_NoLongerThere_{YYYY-MM-DD}.csv` (SFMC import columns)
`notify_sai_action_items()`	Replacements, no-Cupola-match, inactive-new-org, actionable HR rows	Sai	Section CSVs + `output_document_human_review_action_items.csv`
`notify_venu_cupola_audit_files()`	End of every run	Venu (Cc Sai, Max)	`cupola_audit_log.csv`, rollback plan, HR CSV when present
`notify_run_audit_for_ip4()`	End of every run	Sai	Processing log, master + IP4 CSV, category summary, batch HTML, impact.txt; backlog snapshot in body when available

Live email routing: see NOTIFICATIONS_CATALOG.html.

Batching: Most operational emails above are queued during processing and sent once per category at end of run via ActionEngine.flush_batch_notifications() (dry-run: logged only). Cupola audit attachments to Venu are separate. See NOTIFICATIONS_REFERENCE.html for recipient env vars and the full model.

Special Cases & Edge Cases#

Scenario	Current Behavior	Status
Changed Email + Alternate Contact	Maps to REPLACEMENT	✅ Implemented
Out of Office + Alternate Contact	Maps to OUT_OF_OFFICE; Human Review digest only, no system writes; alternates captured by `_collect_alternate_contacts` where applicable	✅ Implemented
Deceased + Replacement Contact	Maps to REPLACEMENT	✅ Implemented
Left Company + Replacement Contact	Maps to REPLACEMENT	✅ Implemented
Retired + Replacement Contact	Maps to REPLACEMENT	✅ Implemented
Replacement email matches sender	Clears replacement_info, changes to UNKNOWN	✅ Implemented
Alternate contact parsing fails	Falls back to INACTIVE	✅ Implemented

Case: Out of Office with Replacement Contact#

"I'm out of office, contact John at john@company.com"

Category: "Out of Office" — maps to OUT_OF_OFFICE, regardless of alternate_contact.
No system state changes. Row is added to the Human Review digest with reason HUMAN_REVIEW_REASON_OUT_OF_OFFICE so the dedicated OOO process can pick it up.
Any parsed alternate contact is still emitted via output_document_alternate_contacts.* for the IP4 reviewer.
Enhancement needed: Decide if temporary replacements should be added to systems

Case: Retired with Personal Email#

"I'm retiring, keep in touch at personal@gmail.com"

Category: "Retired" → INACTIVE
retired_personal_email stored in notes for reference
Personal email is not updated in any system

Case: Undeliverable Email#

Bounce message indicating email is invalid.

Category: "Undeliverable" → INACTIVE
Contact marked inactive, subscriptions checked
Enhancement needed: Add method to mark email as invalid in EmailRepository

Configuration#

Environment Variables#

bash

# Required for LLM classification
OPENAI_API_KEY=your_openai_api_key_here

# Optional (defaults shown)
CLASSIFIER_BACKEND=openai       # or "ollama"
OLLAMA_MODEL=mistral-nemo:12b
OLLAMA_URL=http://localhost:11434/api/chat
MAX_CONCURRENCY=10
MOCK_ALL_SERVICES_FOR_DEV=false
WRITE_OPERATIONS_ENABLED=true

LLM Configuration#

OpenAI Model: gpt-4o-mini (production)
Temperature: 0.0 (deterministic)
Fallback: Ollama support for local/development use
Backend: Configured via CLASSIFIER_BACKEND setting

Operating Modes#

The application supports three operating modes:

Dev mock-all (`MOCK_ALL_SERVICES_FOR_DEV=true`)#

Cupola, Multipub, and Salesforce use mock connectors; SFMC uses a mock client. HodorDatabase is the real SQL connector (graceful-unavailable if HODOR_* / HODOR_CONNECTION_STRING is not set — not a MockHodor class). The email repository is mock by default; set dry_run_use_real_email_repository=true to read the real inbox. Step 4c contact lookup is skipped (main.py uses a minimal stub Contact so downstream steps can run). LLM classification is real. Operations are logged to dry_run_output.log where applicable.

Simulation — no writes (`WRITE_OPERATIONS_ENABLED=false`)#

Live database reads; writes are intercepted via ReadOnlyDatabaseWrapper for Cupola, Multipub, Salesforce, and Hodor when each connector is live and available. SFMC uses a mock client. Salesforce attempts a real connection with mock fallback. Write operations are logged to dry_run_output.log.

Live Mode (default)#

Real database connections and writes (with mock fallbacks where the code substitutes mocks after failed connectivity checks).

Mode Comparison#

Component	Live Mode	Simulation (no writes)	Dev mock-all
Email Extraction (Step 1)	Real MSSQL repository	Real MSSQL repository	Mock by default; real if `dry_run_use_real_email_repository=true`
Cupola / Multipub / Salesforce	Real reads and writes	Real reads; writes intercepted (logged)	Mock connectors
Hodor	Real SQL when configured	Real reads; writes intercepted when wrapped	Real `HodorDatabase` when configured (not mocked); Step 4c contact lookup skipped — `main.py` uses a stub `Contact`
SFMC	Live or mock fallback	Mock client	Mock
LLM Classification	Real: OpenAI / Ollama	Real: OpenAI / Ollama	Real: OpenAI / Ollama
Notifications	Sent via SMTP	Logged only (no emails sent)	Logged only (no emails sent)
Action Logging	Standard logging	ActionLogger → `dry_run_output.log`	ActionLogger → `dry_run_output.log`
Stage Logging	StageLogger → `stage_execution.log`	StageLogger → `stage_execution.log`	StageLogger → `stage_execution.log`

Project Structure#

Core Application#

File	Description
`src/auto_responder/main.py`	Main entry point — orchestrates all steps
`src/auto_responder/config.py`	Application settings (Pydantic BaseSettings, loaded from `.env`)

Models#

File	Description
`src/auto_responder/models/email.py`	`AutoResponseEmail` — email data model
`src/auto_responder/models/contact.py`	`Contact`, `ContactRecord` — unified contact model
`src/auto_responder/models/determination.py`	`Determination` enum, `DeterminationResult`, `ReplacementInfo`
`src/auto_responder/models/output_document.py`	`InactivePersonRecord`, `AlternateContactRecord`, `InactiveNewOrgRecord`, `InactiveNoCupolaMatchRecord`, `UndeliverableRecord`, `MultipubAuditRecord`, `HumanReviewRecord` — output document models
`src/auto_responder/models/subscription.py`	`MultipubValidationResult`, `MultipubOrder`, `MultipubSubscriber`

Processors#

File	Description
`src/auto_responder/processors/action_engine.py`	Action execution engine — routes determinations to handlers
`src/auto_responder/processors/contact_lookup.py`	Unified contact lookup across all databases
`src/auto_responder/processors/email_processor.py`	Email deduplication and filtering

Services — Determination#

File	Description
`src/auto_responder/services/determination/autoresponsedetermination.py`	Main determination service (LLM pipeline orchestrator)
`src/auto_responder/services/determination/llm_classifier.py`	LLM classification and QA agents (OpenAI / Ollama)
`src/auto_responder/services/determination/category_mapper.py`	Category → Determination mapping, alternate_contact parsing
`src/auto_responder/services/determination/mock_classifier.py`	Mock LLM classifier for dry-run mode (keyword matching)

Services — Databases#

File	Description
`src/auto_responder/services/databases/base.py`	Base database class (retry logic, health checks, connection enhancement)
`src/auto_responder/services/databases/cupola.py`	Cupola SQL Server connector (IPv4 team)
`src/auto_responder/services/databases/hodor.py`	Hodor SQL Server connector (IPv4 team)
`src/auto_responder/services/databases/multipub.py`	Multipub SQL Server connector + subscription validation (ACE team)
`src/auto_responder/services/databases/salesforce.py`	Salesforce REST API connector (ACE team)
`src/auto_responder/services/databases/mock.py`	Mock database connectors for dry-run mode
`src/auto_responder/services/databases/readonly_wrapper.py`	Read-only wrapper — live reads, mocked writes (for read-only mode)

Services — Email & Notifications#

File	Description
`src/auto_responder/services/email_repository.py`	MSSQL email extraction from central repository
`src/auto_responder/services/mock_email_repository.py`	Mock email repository for dry-run mode
`src/auto_responder/services/email_service.py`	SMTP email sending service
`src/auto_responder/services/notifications/notifier.py`	Notification service (Client Services, Sai Teja, Venu, Multipub validation flags)
`src/auto_responder/services/sfmc/client.py`	SFMC REST client — suppression lists (live)
`src/auto_responder/services/sfmc/mock.py`	SFMC mock client (read-only mode, dry-run)

Services — Outlook (Deprecated)#

File	Description
`src/auto_responder/services/outlook/graph_client.py`	Microsoft Graph API client — DEPRECATED (replaced by central SQL server)

Utilities#

File	Description
`src/auto_responder/utils/action_logger.py`	Comprehensive operation logging for dry-run and read-only modes
`src/auto_responder/utils/stage_logger.py`	Stage-by-stage execution logging (always enabled)
`src/auto_responder/utils/logging_config.py`	Logging configuration and correlation context
`src/auto_responder/utils/output_document_collector.py`	Collects records for the three business deliverable lists
`src/auto_responder/utils/output_document_generator.py`	Generates CSV/JSON output documents from collected records
`src/auto_responder/utils/report_generator.py`	Per-email processing report generation (log + JSON + CSV)

Original Reference#

File	Description
`Auto_responder_email/unified_email_classifier.py`	Original LLM classification program (reference only, integrated into services)

Integration History#

The LLM-based email classification from the original Auto_responder_email/unified_email_classifier.py was fully integrated into the AutoResponderProcess. Below is a summary of the integration phases.

Phase 1: Core Integration ✅#

Component	File	Changes
Model Enhancements	`models/determination.py`	Added `sender_new_email`, `is_long_term_leave`, `retired_personal_email`, `confidence` fields
Configuration	`config.py`	Added OpenAI, Ollama, and concurrency settings
LLM Classifier	`services/determination/llm_classifier.py`	Created — extracted from original, pure service implementation
Category Mapper	`services/determination/category_mapper.py`	Created — maps LLM categories to Determination enum
Determination Service	`services/determination/autoresponsedetermination.py`	Replaced placeholder with full LLM integration

Phase 2: Action Enhancements ✅#

Component	File	Changes
Database Connectors	`services/databases/*.py`	Added `update_contact_email()` to all connectors
Action Engine	`processors/action_engine.py`	Enhanced all handlers; added Multipub validation integration
Subscription Models	`models/subscription.py`	Created — `MultipubValidationResult`, `MultipubOrder`, `MultipubSubscriber`
Multipub Connector	`services/databases/multipub.py`	Major enhancements for Tarun's validation queries
Notifier	`services/notifications/notifier.py`	Added Multipub validation notification
Dependencies	`requirements.txt`	Added `openai>=1.12.0`

Error Handling#

Database Update Failures#

Log each failure individually
Continue with other systems (graceful degradation)
Return False if all systems fail
Return True if at least one system succeeds

Notification Failures#

Log failure but don't fail the entire operation
Store failed notifications for manual review

LLM Classification Failures#

Return UNKNOWN determination with confidence 0.0
Log error with full email content
Skip processing rather than crashing

Contact Lookup Failures#

Graceful degradation — continues even if some database connectors fail to initialize
Unified contact returned with records from whichever systems responded

Common Issues & Solutions#

Issue	Solution
Replacement contact matches sender	Clear replacement_info, change to UNKNOWN
Invalid email format	Validate with regex before updating
Database connection timeout	Retry with exponential backoff
LLM returns malformed JSON	Return UNKNOWN, log for review
Contact not found in any system	Log warning, skip processing

Performance Considerations#

Batch Processing#

Current implementation processes emails sequentially
Determination service supports determine_batch() for future parallel use
Respect API rate limits for LLM calls

Database Connections#

Reuse database connections where possible
Connection pooling recommended
Handle connection timeouts gracefully

Async/Sync Bridge#

Uses asyncio.run() which may cause issues if called from an existing event loop
Consider making main.py async in the future for better performance

Caching#

Cache contact lookups to avoid repeated queries
Cache LLM classifications for duplicate emails
Consider caching determination results

Testing#

Test Scenarios#

Test Case	Input	Expected Determination	Expected Actions
Retired Person	Email with "retirement" keyword	INACTIVE	Mark inactive, check subscriptions, notify if needed
Out of Office	Email with "out of office" keyword	OUT_OF_OFFICE	Human Review digest only; no system writes
Changed Email	Email with "new email is X"	EMAIL_UPDATE	Update email/title in Cupola/Hodor/SF (skips per rules)
Replacement Contact	Email with "contact John at john@co.com"	REPLACEMENT	Mark inactive, add replacement, notify
Undeliverable	Bounce message	INACTIVE	Mark inactive, check subscriptions
Not Relevant	System notification	UNKNOWN	Skip processing

Testing Checklist#

[ ] Retired person → INACTIVE
[ ] Out of office → OUT_OF_OFFICE (Human Review digest only, no writes)
[ ] Changed email → EMAIL_UPDATE
[ ] Replacement contact → REPLACEMENT
[ ] Undeliverable → INACTIVE
[ ] Not relevant → UNKNOWN
[ ] Long-term leave flag → ACTIVE with logging
[ ] Active subscription → Notification sent
[ ] Multiple systems → All updated correctly
[ ] Database failure → Partial success handling
[ ] Multipub validation — active subscription → Flag notification
[ ] Multipub validation — recently expired → Flag notification
[ ] Multipub validation — single-issue purchase → Flag notification
[ ] Multipub validation — no subscriptions → Proceed normally

Implementation Status & Enhancements#

Core Functionality (Implemented)#

[x] UNKNOWN determination handling
[x] INACTIVE determination handling
[x] REPLACEMENT determination handling
[x] EMAIL_UPDATE / TITLE_UPDATE determination handling (shared _handle_title_update())
[x] ACTIVE determination handling
[x] Subscription checking
[x] Notification system
[x] Multipub validation (Tarun's process)
[x] Category mapper — alternate_contact for Deceased/Left Company/Retired
[x] Two-agent QA system (Classification + QA)
[x] Dry-run mode

Enhancements Needed#

High Priority

[ ] Optional: write-through updates to the central email repository (MSSQL inbox extract) when contact data changes — not implemented; operational updates target Cupola/Hodor/Salesforce per business rules
[ ] Add validation for new email addresses before updating
[x] Implement title update functionality in database connectors (update_contact_title on Cupola, Hodor, Multipub, Salesforce; action engine applies with Multipub / SF-if-Multipub skips)
[ ] Wire LLM (or rules) to populate DeterminationResult.updated_title / TITLE_UPDATE if title-only changes should be first-class

Medium Priority

[ ] Add category-specific logging (Retired vs Deceased vs Left Company)
[ ] Add metrics tracking per category
[ ] Add duplicate replacement prevention
[ ] Add retry logic for transient LLM API failures

Low Priority

[ ] Add phone number extraction from alternate_contact
[ ] Add notification for long-term leave
[ ] Add temporary replacement handling for Out of Office
[ ] Add email validation in email repository
[ ] Consider making main.py fully async
[ ] Add caching for repeated email classifications

Future Enhancements#

Machine Learning Model — Train custom model on historical classifications to reduce LLM dependency
Workflow Automation — Automatic approval for high-confidence; manual review queue for low-confidence
Advanced Contact Matching — Fuzzy matching, duplicate detection, contact relationship mapping
Reporting Dashboard — Real-time processing status, historical trends, error analysis

Known Limitations & Risks#

API Costs: LLM calls have per-email costs (OpenAI)
Latency: LLM calls add processing time per email
Rate Limits: Must respect OpenAI API rate limits
Malformed Responses: LLM may return unparseable JSON (falls back to UNKNOWN)
Async/Sync Bridge: asyncio.run() may conflict with existing event loops
Title Updates: update_contact_title() exists on connectors and _handle_title_update() applies it when DeterminationResult.updated_title is set; the LLM pipeline does not populate updated_title today, and TITLE_UPDATE is not returned by the category mapper — title-only automation needs future wiring
Multipub Linked Server: Queries use OPENQUERY to multipub_live linked server — must be properly configured
Error Recovery: Transient API failures fall back to UNKNOWN; no retry logic yet

Team Responsibilities#

System	Team	Contact
Cupola	IPv4	-
Hodor	IPv4	-
Multipub	ACE	Tarun (validation queries)
Salesforce	ACE	-
LLM Classification	-	Shubham
Action Engine	-	-
Notifications (Multipub flags)	-	Max / Client Services (recipient)
Notifications (Replacements)	-	Sai Teja (recipient)

Usage#

The integration is transparent — the main workflow remains the same:

python

from auto_responder.main import main

# Process 10 most recent emails (default)
main()

# Process a specific number of emails
main(max_results=25)

Operating Modes#

The application supports three modes controlled by environment variables:

Mode	Environment Variable	Description
Live	(default)	Real database connections and writes
Simulation (no writes)	`WRITE_OPERATIONS_ENABLED=false`	Live database reads; writes intercepted via `ReadOnlyDatabaseWrapper` (Cupola, Multipub, Salesforce, Hodor when configured) — operations logged
Dev mock-all	`MOCK_ALL_SERVICES_FOR_DEV=true`	Mock Cupola/Multipub/Salesforce/SFMC; real `HodorDatabase`; optional real email repository; real LLM; contact lookup skipped in `main.py`
Hodor manual	`HODOR_AUTOMATIC_UPDATES=false`	Hodor reads still run; Hodor writes are not executed — intended changes are emailed to Venu (`NOTIFICATION_EMAIL_VENU`, default Venu). Not the same as simulation (no-write) mode.

Note: Hodor uses a real SQL connector, not MockDatabase. Salesforce gracefully falls back to a mock if connection fails. See connections/README.html and connections/hodor.html.

Document Version: 3.6
Date: April 20, 2026
Status: Complete ✅
Last Updated: May 3, 2026 — Documented that processing_report_ip4.csv uses a fixed 23-column IP4 layout (ReportGenerator._CSV_COLUMNS_IP4) while the master CSV remains the full ledger; Client Services run-report email copy aligned. (Earlier April 20, 2026 update: Run Outputs section, Human Review, Venu audit bundle, notifications table.)

AutoResponder Determination System - Complete Reference#

Table of Contents#

Executive Summary#

Architecture Overview#

Original Email Classifier#

Current Application#

LLM Output Categories (9)#

Application Determination Types (6)#

Complete Process Flow#

PlantUML Diagram#

Processing Steps#

Data Flow Summary#

LLM Classification Pipeline#

Pipeline Steps#

Contact Information Fields#

Category-to-Determination Mapping#

Determination Types — Detailed Reference#

1. UNKNOWN#

2. INACTIVE#

3. REPLACEMENT#

4. EMAIL_UPDATE / TITLE_UPDATE#

5. ACTIVE#

6. OUT_OF_OFFICE#

Multipub Validation (Tarun's Process)#

Purpose#

Matching Criteria#

Subscription Scenarios Checked#

Validation Result Actions#

Validation Flow#

SQL Queries#

Notification Format#

System Update Rules#

Database Methods#

Run Outputs (Spreadsheets & Logs)#

Per-email ledgers (always written)#

Outcome-specific spreadsheets (only when rows exist)#

Venu audit bundle (always written; attached to Venu's end-of-run email)#

Which outputs each outcome lands on#

Notifications#

Special Cases & Edge Cases#

Case: Out of Office with Replacement Contact#

Case: Retired with Personal Email#

Case: Undeliverable Email#

Configuration#

Environment Variables#

LLM Configuration#

Operating Modes#

Dev mock-all (MOCK_ALL_SERVICES_FOR_DEV=true)#

Simulation — no writes (WRITE_OPERATIONS_ENABLED=false)#

Live Mode (default)#

Mode Comparison#

Project Structure#

Core Application#

Models#

Processors#

Services — Determination#

Services — Databases#

Services — Email & Notifications#

Services — Outlook (Deprecated)#

Utilities#

Original Reference#

Integration History#

Phase 1: Core Integration ✅#

Phase 2: Action Enhancements ✅#

Error Handling#

Database Update Failures#

Notification Failures#

LLM Classification Failures#

Contact Lookup Failures#

Common Issues & Solutions#

Performance Considerations#

Batch Processing#

Database Connections#

Async/Sync Bridge#

Caching#

Testing#

Test Scenarios#

Testing Checklist#

Implementation Status & Enhancements#

Core Functionality (Implemented)#

Dev mock-all (`MOCK_ALL_SERVICES_FOR_DEV=true`)#

Simulation — no writes (`WRITE_OPERATIONS_ENABLED=false`)#