Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.probalytics.io/llms.txt

Use this file to discover all available pages before exploring further.

Complete reference for all Probalytics database tables. All tables use the ReplacingMergeTree(indexed_at) engine, where indexed_at serves as the version column for deduplication.

markets

Primary table containing prediction market metadata from all supported platforms. Availability: All tiers (Starter tier: last 30 days only)

Schema

ColumnTypeDescription
idUUIDProbalytics unique market identifier
platformEnum('POLYMARKET', 'KALSHI', 'PREDICTIT', 'UNKNOWN')Source platform
platform_idStringPlatform’s native market ID
slugStringURL-friendly market identifier
urlStringDirect link to market on platform
titleStringMarket question/title
descriptionStringFull market description text
categoryLowCardinality(String)Market category (e.g., “Politics”, “Sports”)
tagsArray(LowCardinality(String))Associated tags
market_typeEnum('UNKNOWN', 'BINARY', 'MULTIPLE', 'SCALAR', 'PARLAY')Type of market
outcomesArray(Tuple(...))Possible outcomes (see Outcomes)
created_atDateTime64(3)When market was created
opened_atNullable(DateTime64(3))When trading opened
closes_atNullable(DateTime64(3))When trading closes
resolves_atNullable(DateTime64(3))Expected resolution time
end_dateNullable(DateTime64(3))Market end date
reset_atNullable(DateTime64(3))Last reset timestamp
statusEnum('PENDING', 'ACTIVE', 'PAUSED', 'CLOSED', 'RESOLVED')Current market state

Resolution Fields

ColumnTypeDescription
resolution_typeNullable(Enum('STANDARD', 'SPLIT', 'VOID'))How market was resolved
resolution_winning_outcome_idNullable(UUID)ID of winning outcome
resolution_outcome_payoutsArray(Tuple(outcome_id UUID, payout Decimal128(18)))Payout per outcome
resolution_resolved_byNullable(String)Resolver identifier
resolution_resolved_atNullable(DateTime64(3))Resolution timestamp
resolution_source_block_numberNullable(UInt64)Blockchain block number of resolution
resolution_source_tx_hashNullable(String)Transaction hash of resolution

Indexing Fields

ColumnTypeDescription
source_block_numberUInt64Blockchain block where market was indexed
source_tx_hashFixedString(66)Transaction hash of market creation
indexed_atDateTime64(3)When Probalytics indexed this record

Outcomes Structure

The outcomes column is an array of named tuples:
Array(Tuple(
    id UUID,            -- Probalytics outcome ID
    platform_id String, -- Platform's native outcome ID
    name String,        -- Outcome name (e.g., "Yes", "No", "Trump")
    index UInt8         -- Position in the outcomes list
))
Access fields by name:
SELECT
    id,
    title,
    outcome.id as outcome_id,
    outcome.name as outcome_name
FROM markets
ARRAY JOIN outcomes as outcome
WHERE platform = 'POLYMARKET'
LIMIT 5;

Market Types

TypeDescription
BINARYTwo outcomes: Yes/No
MULTIPLEMultiple mutually exclusive outcomes
SCALARNumeric range market
PARLAYCombined multi-leg market
UNKNOWNType not determined

Market Statuses

StatusDescription
PENDINGMarket created but not yet open for trading
ACTIVEOpen for trading
PAUSEDTrading temporarily suspended
CLOSEDTrading ended, awaiting resolution
RESOLVEDFinal outcome determined

fills

Individual trade fill records from all platforms. Each fill represents one side of a matched trade, with both taker and maker identified. Availability: All tiers (Starter tier: last 30 days only)

Schema

ColumnTypeDescription
idUUIDProbalytics unique fill identifier
market_idUUIDReference to markets.id
market_platform_idStringPlatform’s native market ID
platformEnum('POLYMARKET', 'KALSHI', 'PREDICTIT', 'UNKNOWN')Source platform
platform_idStringPlatform’s native fill/trade ID
outcomeTuple(id UUID, platform_id String, name String, index UInt8)The outcome traded
sizeDecimal128(18)Number of contracts traded
priceDecimal128(18)Price per contract in platform-native units
normalized_priceDecimal128(18)Price normalized to 0-1 range (probability)
taker_sideEnum('BUY', 'SELL')Whether the taker was buying or selling
taker_cash_flowDecimal128(18)Cash flow for the taker (negative = paid, positive = received)
maker_cash_flowDecimal128(18)Cash flow for the maker (negative = paid, positive = received)
taker_idNullable(String)Taker identifier (wallet address or platform user ID)
maker_idNullable(String)Maker identifier (wallet address or platform user ID)
feeDecimal128(18)Fee charged
timestampDateTime64(6)When the fill was executed (microsecond precision)

Indexing Fields

ColumnTypeDescription
source_block_numberNullable(UInt64)Blockchain block number (null for non-chain fills)
source_tx_hashNullable(String)Transaction hash (null for non-chain fills)
source_log_indexNullable(UInt32)Log index within the transaction
indexed_atDateTime64(3)When Probalytics indexed this record

Outcome Structure

The outcome column is a named tuple:
Tuple(
    id UUID,            -- Probalytics outcome ID
    platform_id String, -- Platform's native outcome ID
    name String,        -- Outcome name
    index UInt8         -- Position index
)
Access fields by name:
SELECT
    outcome.id as outcome_id,
    outcome.platform_id as platform_outcome_id,
    outcome.name as outcome_name,
    outcome.index as outcome_index,
    price,
    size
FROM fills
LIMIT 5;

Taker Side

SideDescription
BUYTaker bought contracts (betting outcome will happen)
SELLTaker sold contracts (betting outcome won’t happen)

orderbook_snapshots

Orderbook depth snapshots per market outcome, written when the book state changes. Each row captures the full bid/ask ladder at a point in time. Availability: Early Access only
Orderbook data is currently in Early Access. Data is available from November 20, 2025. High-quality, high-frequency snapshots begin from February 17, 2026. Earlier data may have lower capture frequency and gaps.

Schema

ColumnTypeDescription
market_idUUIDReference to markets.id
market_platform_idStringPlatform’s native market ID
platformEnum('POLYMARKET', 'KALSHI', 'PREDICTIT', 'UNKNOWN')Source platform
outcomeTuple(id UUID, platform_id String, name String, index UInt8)The outcome this snapshot is for
bidsArray(Tuple(price Decimal64(6), size Decimal64(6)))Bid levels sorted by price descending
asksArray(Tuple(price Decimal64(6), size Decimal64(6)))Ask levels sorted by price ascending
timestampDateTime64(3)When the snapshot was taken (millisecond precision)
indexed_atDateTime64(3)When Probalytics indexed this record

Bid/Ask Structure

Each level in the bids and asks arrays is a named tuple:
Tuple(
    price Decimal64(6), -- Price level
    size Decimal64(6)   -- Total size at this price
)
Access fields by name:
SELECT
    market_platform_id,
    outcome.name as outcome_name,
    bids[1].price as best_bid_price,
    bids[1].size as best_bid_size,
    asks[1].price as best_ask_price,
    asks[1].size as best_ask_size,
    timestamp
FROM orderbook_snapshots
WHERE market_id = 'your-market-uuid-here'
ORDER BY timestamp DESC
LIMIT 10;

Computed Fields

BBO, spread, and mid-price are not stored — compute them at query time:
  • best_bid = bids[1].price (first level in descending bids)
  • best_ask = asks[1].price (first level in ascending asks)
  • spread = asks[1].price - bids[1].price
  • mid_price = (bids[1].price + asks[1].price) / 2

Example: Bid-Ask Spread Over Time

SELECT
    toStartOfMinute(timestamp) as minute,
    outcome.name as outcome,
    avg(asks[1].price - bids[1].price) as avg_spread,
    avg(bids[1].size + asks[1].size) as avg_top_of_book_size
FROM orderbook_snapshots
WHERE market_id = 'your-market-uuid-here'
  AND timestamp >= now() - INTERVAL 1 HOUR
  AND length(bids) > 0
  AND length(asks) > 0
GROUP BY minute, outcome
ORDER BY minute ASC;

Table Relationships

FromToDescription
fills.market_idmarkets.idLinks fill to its market
fills.market_platform_idmarkets.platform_idPlatform-native market reference
fills.outcomemarkets.outcomes[]Traded outcome matches one in the market’s outcomes array
orderbook_snapshots.market_idmarkets.idLinks snapshot to its market
orderbook_snapshots.outcomemarkets.outcomes[]Snapshot outcome matches one in the market’s outcomes array
Join example:
SELECT
    m.title,
    f.price,
    f.size,
    f.outcome.name
FROM fills f
JOIN markets m ON f.market_id = m.id
WHERE m.platform = 'POLYMARKET'
LIMIT 10;

Ordering & Partitioning

Understanding table ordering helps write efficient queries.

markets

ORDER BY (platform, created_at, id, platform_id)
Queries filtering by platform and created_at are fastest.

fills

ORDER BY (platform, market_id, timestamp, id)
PARTITION BY toYYYYMM(timestamp)
  • Queries filtering by platform + market_id are fastest
  • Monthly partitions on timestamp allow efficient date range pruning

orderbook_snapshots

ORDER BY (market_id, outcome.id, timestamp)
PARTITION BY toYYYYMM(timestamp)
  • Queries filtering by market_id + outcome.id are fastest
  • Monthly partitions on timestamp allow efficient date range pruning

Data Types Reference

DateTime64(3) / DateTime64(6)

Millisecond (3) or microsecond (6) precision timestamps in UTC.
-- Filter by date range
WHERE timestamp >= '2024-01-01' AND timestamp < '2024-02-01'

-- Extract components
SELECT toYear(timestamp), toMonth(timestamp), toDayOfWeek(timestamp)

Decimal128(18) / Decimal64(6)

High-precision decimals for financial values. Decimal128(18) has 18 decimal places (used for fill prices/sizes). Decimal64(6) has 6 decimal places (used for orderbook levels).
-- Prices are typically 0-1 (probability as decimal)
WHERE price BETWEEN 0.1 AND 0.5

-- Safe arithmetic
SELECT sum(size), avg(price)

Enum Types

Enum values can be used as strings or integers:
-- String comparison
WHERE platform = 'POLYMARKET'

-- Also works (position-based)
WHERE platform = 1

Arrays

Use array functions for filtering and extraction:
-- Check if array contains value
WHERE has(tags, 'politics')

-- Expand array to rows using ARRAY JOIN
SELECT outcome.name
FROM markets
ARRAY JOIN outcomes as outcome

-- Array length
WHERE length(outcomes) > 2

-- Access by index (1-based)
SELECT bids[1].price as best_bid FROM orderbook_snapshots

Named Tuples

Access tuple fields directly by name:
SELECT
    outcome.id,
    outcome.platform_id,
    outcome.name,
    outcome.index
FROM fills