Access complete prediction market data via daily bulk file exports in your preferred format.

Getting Started

Access Files

Log into app.probalytics.io and navigate to the Files section.

Select Data

Use the interface to choose:
  • Platform: Polymarket or Kalshi
  • Entity type: Markets or Trades
  • Frequency: available frequencies depend on the entity type:
    • Markets: Daily, Weekly, Monthly
    • Trades: Daily, Weekly
  • File: Browse available exports in the file tree

Download Files

Once a file is selected, the Download button appears in the top right. Click it to download.

File Formats

All files are automatically gzipped (.gz). Choose your format based on your use case:

CSV
  • Best for: Excel, data analysis, simple processing
  • Field headers included
  • Standard CSV formatting
Parquet
  • Best for: Analytics queries, data science, large datasets
  • Columnar format, highly compressed
  • Native support: Python (pandas, polars), R, Go, Java
  • Best performance for analytical queries
JSONL
  • Best for: Integration with APIs, JavaScript/Node.js
  • One JSON object per line
  • Standard JSON formatting

File Naming

Files follow this pattern:
{entity}_{platform}_{date_or_range}_{random_id}.{format}.gz
Examples:
trades_polymarket_2024-01-15_a7k9m2b1.csv.gz
markets_kalshi_2024-01-10_to_2024-01-15_x3l8n9q2.parquet.gz
markets_polymarket_2024-01_m7q1k3n5.jsonl.gz
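
The pattern above can be split programmatically. A sketch using a regular expression — the allowed entity, platform, and format values are inferred from the examples on this page, so treat them as assumptions:

```python
import re

# Regex for {entity}_{platform}_{date_or_range}_{random_id}.{format}.gz
# Allowed values below are inferred from the filename examples above
PATTERN = re.compile(
    r'(?P<entity>markets|trades)_'
    r'(?P<platform>polymarket|kalshi)_'
    r'(?P<date_or_range>.+)_'
    r'(?P<random_id>[a-z0-9]+)'
    r'\.(?P<format>csv|parquet|jsonl)\.gz'
)

m = PATTERN.fullmatch('markets_kalshi_2024-01-10_to_2024-01-15_x3l8n9q2.parquet.gz')
print(m.group('entity'))         # markets
print(m.group('date_or_range'))  # 2024-01-10_to_2024-01-15
```

The `date_or_range` group is deliberately loose (`.+`) so it matches single dates, weekly ranges, and monthly `YYYY-MM` names alike.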
Files are created on the following schedule:
  • Daily: 00:15 UTC (all entities)
  • Weekly: Mondays at 00:15 UTC (Markets and Trades)
  • Monthly: First day of month at 00:15 UTC (Markets only)
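
If you poll for new files, the 00:15 UTC cutoff determines which daily export is the newest. A sketch, assuming each daily file covers the previous calendar day (the schedule above doesn't state coverage explicitly):

```python
from datetime import datetime, time, timedelta, timezone

def latest_daily_export_date(now=None):
    """Date of the newest daily export, assuming the file generated
    at 00:15 UTC covers the previous calendar day (an assumption)."""
    now = now or datetime.now(timezone.utc)
    cutoff = datetime.combine(now.date(), time(0, 15), tzinfo=timezone.utc)
    # Before 00:15 UTC, today's file hasn't been generated yet
    days_back = 1 if now >= cutoff else 2
    return now.date() - timedelta(days=days_back)

print(latest_daily_export_date())
```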

Parsing Examples

Choose an example based on your language and use case. All files carry a gzip wrapper, which every example decompresses as part of reading.

Python

Load and Explore with Pandas

Best for quick analysis and exploration.
import gzip
import io

import pandas as pd

# Files are gzip-wrapped, so decompress before handing bytes to pandas
with gzip.open('trades_polymarket_2024-01-15_a7k9m2b1.parquet.gz', 'rb') as f:
    df = pd.read_parquet(io.BytesIO(f.read()))

print(df.head())
print(df.dtypes)
print(df.describe())

Load and Explore with Polars

Faster on large files.
import gzip

import polars as pl

# Decompress the gzip wrapper first; read_parquet accepts raw bytes
with gzip.open('trades_polymarket_2024-01-15_a7k9m2b1.parquet.gz', 'rb') as f:
    df = pl.read_parquet(f.read())

print(df.head())
print(df.schema)
print(df.describe())

Query and Filter with Polars

Efficient filtering and aggregation.
import gzip

import polars as pl

with gzip.open('trades_polymarket_2024-01-15_a7k9m2b1.parquet.gz', 'rb') as f:
    df = pl.read_parquet(f.read())

# High-value trades
high_value = df.filter(pl.col('amount') > 1000)
print(high_value)

# Group by platform and sum volume
by_platform = df.group_by('platform').agg(pl.col('volume').sum())
print(by_platform)

CSV or JSONL - Standard Library

Lightweight parsing without dependencies.
import gzip
import json

with gzip.open('trades_polymarket_2024-01-15_a7k9m2b1.jsonl.gz', 'rt') as f:
    for line in f:
        trade = json.loads(line)
        if float(trade['price']) > 0.8:
            print(trade['market_id'], trade['price'])
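
For CSV, the same streaming pattern works with the stdlib csv module. This sketch first writes a tiny two-row stand-in file so it runs end to end; with a real export, skip that step and point DictReader at the downloaded file:

```python
import csv
import gzip

# Stand-in for a real export: two rows with the fields used above
with gzip.open('trades_sample.csv.gz', 'wt', newline='') as f:
    f.write('market_id,price\nmkt_1,0.92\nmkt_2,0.41\n')

# Stream the gzipped CSV one row at a time, without loading it all
with gzip.open('trades_sample.csv.gz', 'rt', newline='') as f:
    for row in csv.DictReader(f):
        if float(row['price']) > 0.8:
            print(row['market_id'], row['price'])  # mkt_1 0.92
```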

JavaScript / Node.js

Load and Explore

import fs from 'fs';
import { gunzipSync } from 'zlib';
import { parse } from 'csv-parse/sync';

const file = fs.readFileSync('trades_polymarket_2024-01-15_a7k9m2b1.csv.gz');
const decompressed = gunzipSync(file);
const data = parse(decompressed, { columns: true });

console.log(data.slice(0, 5));
console.log(`Total rows: ${data.length}`);

Stream JSONL

import fs from 'fs';
import { createGunzip } from 'zlib';
import { createInterface } from 'readline';

const rl = createInterface({
  input: fs.createReadStream('trades_polymarket_2024-01-15_a7k9m2b1.jsonl.gz')
    .pipe(createGunzip()),
  crlfDelay: Infinity
});

rl.on('line', (line) => {
  const trade = JSON.parse(line);
  if (parseFloat(trade.price) > 0.8) {
    console.log(trade.market_id, trade.price);
  }
});

Filter and Aggregate

import fs from 'fs';
import { gunzipSync } from 'zlib';
import { parse } from 'csv-parse/sync';

const file = fs.readFileSync('trades_polymarket_2024-01-15_a7k9m2b1.csv.gz');
const decompressed = gunzipSync(file);
const data = parse(decompressed, { columns: true });

// Filter high-price trades
const highPrice = data.filter(row => parseFloat(row.price) > 0.8);

// Sum volume by platform
const byPlatform = data.reduce((acc, row) => {
  const platform = row.platform;
  acc[platform] = (acc[platform] || 0) + parseFloat(row.volume);
  return acc;
}, {});

console.log(byPlatform);

Common Workflows

Weekly Export Analysis

Download the weekly trade export to analyze a full week in one file:
import gzip
import io

import pandas as pd

# Weekly export (created every Monday); decompress the gzip wrapper first
with gzip.open('trades_polymarket_2024-01-14_w2k7m1n3.parquet.gz', 'rb') as f:
    trades = pd.read_parquet(io.BytesIO(f.read()))

print(f"Total trades: {len(trades)}")
print(trades.groupby('platform')['volume'].sum())

Monthly Market Snapshot

Get a complete monthly snapshot of markets:
import gzip
import io

import pandas as pd

# Monthly markets export (created on the 1st of the month); decompress first
with gzip.open('markets_kalshi_2024-01_m7q1k3n5.parquet.gz', 'rb') as f:
    markets = pd.read_parquet(io.BytesIO(f.read()))

print(f"Total markets: {len(markets)}")
print(markets.groupby('category')['status'].value_counts())

Stream JSONL Records

Process JSONL records line-by-line:
import gzip
import json

with gzip.open('trades_polymarket_2024-01-15_a7k9m2b1.jsonl.gz', 'rt') as f:
    for line in f:
        trade = json.loads(line)
        # Process each trade immediately
        if float(trade['price']) > 0.8:
            print(f"High price trade: {trade['market_id']} at {trade['price']}")

Data Schema

Files contain the same data as REST API responses. Refer to the REST API Reference for:
  • Complete field definitions
  • Data types and formats
  • Example values