Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.probalytics.io/llms.txt

Use this file to discover all available pages before exploring further.

Access complete prediction market data via daily bulk file exports.

Getting Started

Access Files

Log into app.probalytics.io and navigate to the Files section.

Select Data

Use the interface to choose:
  • Platform: Polymarket or Kalshi
  • Entity type: Markets or Fills
  • Frequency:
    • Markets: Monthly
    • Fills: Weekly
  • File: Browse available exports in the file tree
Select download files
Once a file is selected, the Download button appears in the top right. Click to download.

File Format

All files are exported as Parquet (.parquet.gz), automatically gzipped for efficient transfer.
  • Columnar format, highly compressed
  • Native support: Python (pandas, polars), R, Go, Java
  • Best performance for analytical queries

File Naming

Files follow this pattern:
{entity}_{platform}_{date_or_range}_{random_id}.parquet.gz
Examples:
fills_polymarket_2024-01-15_a7k9m2b1.parquet.gz
markets_kalshi_2024-01-10_to_2024-01-15_x3l8n9q2.parquet.gz
markets_polymarket_2024-01_m7q1k3n5.parquet.gz
Files are created on the following schedule:
  • Weekly: Mondays at 02:30 UTC (Fills)
  • Monthly: First day of month at 03:00 UTC (Markets)

Parsing Examples

Python

Load and Explore with Pandas

Best for quick analysis and exploration.
import pandas as pd

df = pd.read_parquet('fills_polymarket_2024-01-15_a7k9m2b1.parquet.gz')

print(df.head())
print(df.dtypes)
print(df.describe())

Load and Explore with Polars

Faster for large files, better performance.
import polars as pl

df = pl.read_parquet('fills_polymarket_2024-01-15_a7k9m2b1.parquet.gz')

print(df.head())
print(df.schema)
print(df.describe())

Query and Filter with Polars

Efficient filtering with lazy evaluation.
import polars as pl

df = pl.read_parquet('fills_polymarket_2024-01-15_a7k9m2b1.parquet.gz')

# High-value fills
high_value = df.filter(pl.col('size') > 1000)
print(high_value)

# Group by platform and sum size
by_platform = df.groupby('platform').agg(pl.col('size').sum())
print(by_platform)

JavaScript / Node.js

import { readParquet } from 'parquet-wasm';
import fs from 'fs';
import { gunzipSync } from 'zlib';

const compressed = fs.readFileSync('fills_polymarket_2024-01-15_a7k9m2b1.parquet.gz');
const buffer = gunzipSync(compressed);
const table = readParquet(buffer);

console.log(table.schema);
console.log(`Total rows: ${table.numRows}`);

Common Workflows

Weekly Export Analysis

Download the weekly fills export for comprehensive weekly analysis:
import pandas as pd

# Weekly export (created every Monday)
fills = pd.read_parquet('fills_polymarket_2024-01-14_w2k7m1n3.parquet.gz')

print(f"Total fills: {len(fills)}")
print(fills.groupby('platform')['size'].sum())

Monthly Market Snapshot

Get a complete monthly snapshot of markets:
import pandas as pd

# Monthly markets export (created on 1st of month)
markets = pd.read_parquet('markets_kalshi_2024-01_m7q1k3n5.parquet.gz')

print(f"Total markets: {len(markets)}")
print(markets.groupby('category')['status'].value_counts())

Data Schema

Files contain the same data as REST API responses. See the REST API section in the sidebar for complete field definitions and data types.