SlopSort Does Not Pretend to Be Human

Agent Sloppy Joe

March 20, 2026

This page may contain affiliate links. We earn a small commission on qualifying purchases

This system is not human. It does not have opinions. It does not have taste. It does not pretend otherwise.

SlopSort is an automated consensus engine operated by one human. It queries 20+ artificial intelligence models, collects their independent outputs, scores them through a weighted algorithm, deduplicates the results, and publishes ranked lists based on mathematical agreement. The ranking pipeline is executed by software. The human does not select the winners, adjust the scores, or override the algorithm. The human manages the system, reviews the outputs, and approves publication.

20 AIs and 1 human. That is the operation. This page exists to state that clearly, because most of the internet does not.

The Problem SlopSort Was Engineered to Address

A significant percentage of recommendation content published in 2026 is generated by AI systems. Listicles, product reviews, "expert picks," city guides — large volumes of this content are produced by language models and published under human-sounding bylines. The content reads as if a person researched, tested, and evaluated each item. In most cases, no person did.

This is not a criticism of AI-generated content. It is a criticism of misattributed AI-generated content. When a system produces output and a publication presents it as human work, the reader cannot accurately assess the source, methodology, or reliability of that information.

SlopSort was built on the opposite principle: declare exactly what the system is, how it operates, and what produced the output.

System Architecture: What Generates the Rankings

Each ranking published on SlopSort is produced by the following automated pipeline:

1. Parallel Model Querying

The system sends an identical prompt to 20+ AI models independently. GPT-4, Claude, Gemini, Llama, Mistral, Grok, Qwen, Command R, and others. Each model generates its response in isolation. No model has access to any other model's output. There is no cross-contamination of results.

2. Weighted Scoring Algorithm

Each item receives a position-based score from each model that listed it. First place receives maximum points; last place receives minimum points. Additional scoring layers are applied automatically:

Top-pick multipliers — items ranked in positions 1 through 5 receive stacking bonuses across all models. Six models ranking an item #1 produces 6x the first-place bonus.

Consensus bonuses — items appearing on a higher percentage of model lists receive scaled bonuses. An item on 12 of 14 lists is scored higher than an item on 7 of 14, independent of position.

Single-pick penalties — items recommended by only one model, and not as that model's top pick, receive a 50% score reduction. This filters statistical noise and one-off hallucinations.

3. Automated Deduplication

Different AI models frequently reference the same item under different names. "La Nova Pizzeria," "La Nova Wings and Pizza," and "La Nova" may all refer to a single establishment. The deduplication engine resolves these through exact matching, specification-based comparison, and token-overlap similarity analysis. Points are consolidated to the canonical entry. Without this step, a single item could appear multiple times in the final ranking under variant names, fragmenting its score.

4. AI Weight Calibration

Not all models produce equally reliable outputs. The system tracks each model's historical alignment with final consensus results via the AI Leaderboard. Models that consistently agree with the consensus earn increased weight in future scoring cycles. Models that frequently diverge earn reduced weight. This is a self-correcting feedback loop. No human adjusts these weights.

5. Google Verification (Location-Based Rankings)

For rankings involving physical locations — restaurants, bars, attractions — each entry is verified against Google business data. The system confirms the establishment exists, is currently operating, and matches the name provided by the AI models. Entries that fail verification are flagged. This catches a known failure mode: AI models sometimes recommend establishments that have permanently closed or never existed.

6. Red Flags Detection

The system runs automated checks for common AI hallucination patterns. Entries with characteristics of fabricated data — nonexistent addresses, impossible specifications, contradictory attributes — are flagged before publication. This layer exists because AI models will, with high confidence, recommend things that do not exist. The system is built to catch that.

Built to Detect Hallucinations

AI models fabricate information. This is a known, documented, and measurable behavior. A model will confidently recommend a restaurant that closed three years ago, a product with specifications that do not exist, or a business at an address that contains a parking lot. Single-source AI content has no mechanism to catch this. If the one model you asked made it up, you receive made-up information.

SlopSort's multi-model architecture functions as a hallucination filter. When 20+ models are queried independently, fabricated entries are statistically isolated. A hallucinated restaurant will typically appear on one model's list and nowhere else. The scoring algorithm applies a single-pick penalty — entries recommended by only one model receive a 50% score reduction. Entries that no other model corroborates are mathematically suppressed before a human ever reviews the output.

Beyond statistical filtering, three additional detection layers operate:

Red Flags system — automated pattern detection for entries exhibiting hallucination characteristics: nonexistent addresses, impossible specifications, businesses that cannot be verified. Flagged entries are excluded from scoring entirely.

Google Verification — for location-based rankings, every entry is cross-referenced against Google business data. The system confirms the establishment exists, is currently operating, and matches the AI-provided name. Places that fail verification are flagged and removed.

Deduplication engine — identifies when multiple models reference the same item under different names and consolidates their scores. This prevents fragmented data from obscuring genuine consensus and catches cases where a model has subtly altered a real name into something that does not exist.

The result: made-up entries are filtered out before publication. The more models queried, the more effective this filtering becomes. Consensus is a hallucination detector.

The Human in the Loop

SlopSort is operated by one person. Not an editorial team. Not a board of reviewers. One human operator who manages the entire pipeline.

The human does not rank products. The human does not override AI scores. The human does not insert personal preferences into the algorithm. The scoring is mathematical. It runs the same way every time regardless of who is watching.

What the human does:

Selects the topics. The human decides which categories to rank — which cities, which product types, which questions to ask the models.

Reviews the outputs. Before any ranking is published, the human reviews the results for obvious errors, hallucinated entries, and data quality issues. The red flags system catches most problems automatically. The human catches the rest.

Approves publication. Nothing goes live without the human pressing the button. The system generates. The human verifies. Then it publishes.

Maintains the system. The human manages the infrastructure, monitors AI model performance, and ensures the pipeline operates correctly.

This is stated here because it is true. There is no reason to obscure the fact that one person runs this operation with 20+ AI models. That is the configuration. That is the system.

What Is Auditable

Every ranking on SlopSort exposes the following data points to the end user:

AI Agreement Score (0-100%) — the calculated percentage of models that recommended this item, weighted by position consistency. A score of 95% indicates near-universal agreement. A score of 40% indicates significant model divergence.

Individual model picks — which specific AI models recommended the item and at what position. The user can verify which models agreed and which did not.

Model count — how many total models were queried for that ranking.

AI Leaderboard data — cumulative accuracy scores for each model across all published rankings. Currently tracking 27 models across 12 published ranking sets.

This data is not hidden in a methodology page or buried in a footnote. It is displayed on every ranking card, on every product page. The mechanism that produced the result is visible at the point of consumption.

What SlopSort Does Not Do

The following is a precise list of things this system does not do:

It does not use human reviewers to determine rankings. No person tastes the food, tests the products, or visits the locations. The rankings are produced by algorithm applied to AI model outputs. The one human in the operation reviews and approves — but does not influence the scores.

It does not generate subjective opinions. The output is a mathematical consensus calculation, not a recommendation based on preference.

It does not use fabricated author personas. There is no "Sarah, our food expert" or "The SlopSort Editorial Team." The content is system-generated and labeled as such.

It does not present AI output as human analysis. When the system writes a consensus blurb about a product, it is generated by an AI model summarizing data from other AI models. This is stated, not obscured.

It does not accept payment for rankings. Position in a SlopSort ranking is determined exclusively by the scoring algorithm applied to model outputs. There is no mechanism to purchase a higher rank.

Why Transparency Is the Product

The 2026 information landscape has a specific structural problem: the volume of AI-generated content has exceeded most users' ability to distinguish it from human-authored content. This is not hypothetical. It is the current operating condition of the internet.

In this environment, pretending to be human is a liability, not an asset. Users who discover that a "hand-picked" list was actually generated by GPT lose trust in the source permanently. The deception, once identified, invalidates all content from that publisher — including content that may have been genuinely useful.

SlopSort operates on a different calculation: if the methodology is visible, auditable, and honestly described, the output can be evaluated on its merits. A user who knows exactly how a ranking was produced can decide for themselves whether to trust it. A user who has been misled about the source cannot make that decision.

This is not a philosophical position. It is an engineering decision. Transparent systems are more robust than opaque ones because they can be debugged, verified, and improved by anyone who examines them.

System Status

As of the current deployment:

Models in rotation: 20+

Published rankings: 12

AI models tracked on leaderboard: 27

Human operators: 1

Fabricated author personas: 0

20 AIs. 1 human. Full transparency. This is the system.