Indexer client

The indexer client is a standalone Go process that runs as a sidecar alongside an Ethereum validator node. It connects to the validator's Geth node, fetches every new block (with transactions, receipts, and logs), structures them into DefraDB documents, signs the batch, and publishes everything over P2P.

Indexers are write-only data producers. They push data out and reject all incoming replication.

Architecture

The indexer connects to Geth over two channels:

  • WebSocket (port 8546): subscribes to new block headers for a real-time feed.
  • HTTP JSON-RPC (port 8545): fetches full block details, fills gaps on restart, handles historical ranges. This is a backup connection.
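The gap-filling role of the HTTP channel can be sketched as a small range computation. This is an illustrative sketch only: `missingBlocks` is a hypothetical helper, not a function from the indexer codebase, and it assumes the indexer knows the last block it stored and the newest header it has seen.

```go
package main

import "fmt"

// missingBlocks returns the block numbers in (lastIndexed, head] that still
// need to be fetched over HTTP JSON-RPC. It returns nil when there is no gap.
// Hypothetical helper for illustration.
func missingBlocks(lastIndexed, head uint64) []uint64 {
	if head <= lastIndexed {
		return nil
	}
	gap := make([]uint64, 0, head-lastIndexed)
	for n := lastIndexed + 1; n <= head; n++ {
		gap = append(gap, n)
	}
	return gap
}

func main() {
	// Hypothetical restart: block 100 was the last one stored, and the
	// WebSocket subscription now reports header 104.
	fmt.Println(missingBlocks(100, 104)) // prints [101 102 103 104]
}
```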

Data processing pipeline

Each block goes through six stages:

  1. Receive: a new block header arrives over the WebSocket subscription.
  2. Fetch: the full block is pulled over JSON-RPC, including the header, transactions, receipts, logs, and access lists.
  3. Structure: raw data is transformed into the six document types.
  4. Sign: a Merkle root is computed over all document CIDs and signed with the indexer's identity key.
  5. Store: documents are written to the local embedded DefraDB.
  6. Publish: DefraDB's P2P layer broadcasts to subscribed peers (hosts).

Document types

The indexer produces six document types per block. The first four come directly from on-chain data. The last two are metadata that the indexer itself produces.

Collection names use a chain prefix: Ethereum__Mainnet__Block, Optimism__Mainnet__Block, etc. Schema definitions live in pkg/schema/schema_standard.graphql. There are two schema variants:

  • Standard: parallel transaction processing (default build).
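The chain-prefixed naming scheme can be expressed as a pair of small helpers. This is a sketch that assumes names always have exactly three parts joined by a double underscore, as in the examples above; `CollectionName` and `ParseCollectionName` are hypothetical names, not the constants defined in pkg/constants/collections.go.

```go
package main

import (
	"fmt"
	"strings"
)

// sep is the separator seen in names such as Ethereum__Mainnet__Block.
const sep = "__"

// CollectionName builds a chain-prefixed collection name (hypothetical helper).
func CollectionName(chain, network, docType string) string {
	return strings.Join([]string{chain, network, docType}, sep)
}

// ParseCollectionName splits a name back into its three parts.
// It assumes none of the parts contains the separator.
func ParseCollectionName(name string) (chain, network, docType string, err error) {
	parts := strings.Split(name, sep)
	if len(parts) != 3 {
		return "", "", "", fmt.Errorf("expected 3 parts in %q, got %d", name, len(parts))
	}
	return parts[0], parts[1], parts[2], nil
}

func main() {
	fmt.Println(CollectionName("Ethereum", "Mainnet", "Block"))
	fmt.Println(ParseCollectionName("Optimism__Mainnet__Block"))
}
```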

Block

type Ethereum__Mainnet__Block {
  hash: String
  number: Int
  timestamp: String
  parentHash: String
  difficulty: String
  totalDifficulty: String
  gasUsed: String
  gasLimit: String
  baseFeePerGas: String
  nonce: String
  miner: String
  size: String
  stateRoot: String
  sha3Uncles: String
  transactionsRoot: String
  receiptsRoot: String
  logsBloom: String
  extraData: String
  mixHash: String
  uncles: [String]
  transactions: [Ethereum__Mainnet__Transaction] @relation(name: "block_transactions")
}

Transaction

Merges fields from both the transaction object and its receipt. Receipt-specific fields: status, gasUsed, cumulativeGasUsed, effectiveGasPrice.

type Ethereum__Mainnet__Transaction {
  hash: String
  blockHash: String
  blockNumber: Int
  from: String
  to: String
  value: String
  gas: String
  gasPrice: String
  gasUsed: String
  maxFeePerGas: String
  maxPriorityFeePerGas: String
  input: String
  nonce: String
  transactionIndex: Int
  type: String
  chainId: String
  v: String
  r: String
  s: String
  status: Boolean
  cumulativeGasUsed: String
  effectiveGasPrice: String
  block: Ethereum__Mainnet__Block @relation(name: "block_transactions")
  logs: [Ethereum__Mainnet__Log] @relation(name: "transaction_logs")
  accessList: [Ethereum__Mainnet__AccessListEntry] @relation(name: "transaction_accessList")
}

Log

Event logs emitted during transaction execution. topics is an array of hex-encoded indexed parameters (topics[0] is the event signature hash). data is the hex-encoded non-indexed parameters. No ABI decoding happens at this layer; everything is stored as raw hex.
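To make the raw layout concrete, the sketch below shows how a downstream consumer might read an indexed address from a topic and a single 32-byte value from data. It is not the indexer's code or a lens: `rawLog`, `topicAddress`, `dataUint256`, and the sample values are illustrative assumptions, and the example assumes an ERC-20 style Transfer event.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"math/big"
	"strings"
)

// rawLog mirrors the two raw-hex fields described above (hypothetical type).
type rawLog struct {
	Topics []string
	Data   string
}

// decodeHex strips an optional 0x prefix and decodes the rest.
func decodeHex(s string) ([]byte, error) {
	return hex.DecodeString(strings.TrimPrefix(s, "0x"))
}

// topicAddress reads an indexed address parameter: a 32-byte topic whose
// last 20 bytes are the address.
func topicAddress(topic string) (string, error) {
	b, err := decodeHex(topic)
	if err != nil {
		return "", err
	}
	if len(b) != 32 {
		return "", errors.New("topic must be 32 bytes")
	}
	return "0x" + hex.EncodeToString(b[12:]), nil
}

// dataUint256 reads the first 32-byte word of data as an unsigned integer.
func dataUint256(data string) (*big.Int, error) {
	b, err := decodeHex(data)
	if err != nil {
		return nil, err
	}
	if len(b) < 32 {
		return nil, errors.New("data shorter than one 32-byte word")
	}
	return new(big.Int).SetBytes(b[:32]), nil
}

// sampleTransferLog builds a made-up Transfer(address,address,uint256) log.
func sampleTransferLog() rawLog {
	pad := strings.Repeat("0", 24) // 12 zero bytes of left padding
	return rawLog{
		Topics: []string{
			"0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
			"0x" + pad + strings.Repeat("1", 40),
			"0x" + pad + strings.Repeat("2", 40),
		},
		Data: "0x" + strings.Repeat("0", 62) + "2a",
	}
}

func main() {
	l := sampleTransferLog()
	from, _ := topicAddress(l.Topics[1])
	to, _ := topicAddress(l.Topics[2])
	value, _ := dataUint256(l.Data)
	fmt.Println(from, to, value) // value prints as 42
}
```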

AccessListEntry

EIP-2930 access list entries. Most transactions do not have access lists, so this collection is typically sparse.

type Ethereum__Mainnet__AccessListEntry {
  address: String
  storageKeys: [String]
  blockNumber: Int
  transaction: Ethereum__Mainnet__Transaction @relation(name: "transaction_accessList")
}

BlockSignature

Created after all documents for a block are written. Contains a Merkle root computed over all document CIDs for that block, signed with the indexer's identity key.

type Ethereum__Mainnet__BlockSignature {
  blockNumber: Int
  blockHash: String
  merkleRoot: String
  cidCount: Int
  cids: [String]
  signatureType: String
  signatureIdentity: String
  signatureValue: String
  createdAt: String
}

SnapshotSignature

Seals a range of blocks into a single signed snapshot. The merkleRoot is computed over the per-block BlockSignature Merkle roots within the range, not over individual document CIDs.

type Ethereum__Mainnet__SnapshotSignature {
  startBlock: Int
  endBlock: Int
  merkleRoot: String
  blockCount: Int
  signatureType: String
  signatureIdentity: String
  signatureValue: String
  createdAt: String
  snapshotFile: String
  blockSigMerkleRoots: [String]
}

Document signing

Signing is opt-in and configured by the indexer operator. Hosts disable it with --no-signing.

The signing flow:

  1. At startup, the indexer loads a persistent identity from the DefraDB keyring using GetIdentityContext().
  2. The identity is injected into the Go context via node.ContextWithBlockSigning(ctx, collector), which enables CID collection.
  3. As the block handler writes documents, DefraDB collects the CID of each document written.
  4. After all documents for a block are written, the indexer calls node.SignBlock, which computes a Merkle root over the collected CIDs, signs it, and writes the BlockSignature document.
  5. Each individual document also gets a _version entry with identity and signature:
{
  "_version": [
    {
      "cid": "bafyreig5...",
      "height": 1,
      "signature": {
        "identity": "did:key:z6Mk...",
        "value": "0x3045022100..."
      }
    }
  ]
}
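The "Merkle root over collected CIDs, then sign" step can be illustrated with the standard library. This is a sketch under stated assumptions, not what node.SignBlock actually does: the source does not specify the hash function, the tree construction, or the signature scheme, so SHA-256, a pairwise tree that carries an odd node up unchanged, and Ed25519 are all choices made for the example. The CIDs and the all-zero key seed are made up.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// merkleRoot hashes each CID string into a leaf, then hashes adjacent pairs
// level by level until one node remains. An odd node is carried up unchanged.
// The input order matters: reordering the CIDs changes the root.
func merkleRoot(cids []string) []byte {
	if len(cids) == 0 {
		return nil
	}
	level := make([][]byte, len(cids))
	for i, c := range cids {
		h := sha256.Sum256([]byte(c))
		level[i] = h[:]
	}
	for len(level) > 1 {
		next := make([][]byte, 0, (len(level)+1)/2)
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i])
				continue
			}
			pair := append(append([]byte{}, level[i]...), level[i+1]...)
			h := sha256.Sum256(pair)
			next = append(next, h[:])
		}
		level = next
	}
	return level[0]
}

// demoKey derives a key pair from a fixed all-zero seed so the example is
// reproducible. A real indexer would load its identity from the keyring.
func demoKey() (ed25519.PublicKey, ed25519.PrivateKey) {
	priv := ed25519.NewKeyFromSeed(make([]byte, ed25519.SeedSize))
	return priv.Public().(ed25519.PublicKey), priv
}

// signRoot and verifyRoot wrap the Ed25519 calls used for the signature.
func signRoot(priv ed25519.PrivateKey, root []byte) []byte {
	return ed25519.Sign(priv, root)
}

func verifyRoot(pub ed25519.PublicKey, root, sig []byte) bool {
	return ed25519.Verify(pub, root, sig)
}

func main() {
	cids := []string{"bafy-block", "bafy-tx-0", "bafy-log-0"} // made-up CIDs
	root := merkleRoot(cids)
	pub, priv := demoKey()
	sig := signRoot(priv, root)
	fmt.Println("merkleRoot:", hex.EncodeToString(root))
	fmt.Println("valid:", verifyRoot(pub, root, sig))
}
```

A verifier that receives the cids list and the signature can recompute the root with the same function and check it against the signed value, which is why the construction has to be deterministic.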

Pruning

The pruner removes old data to keep storage bounded. It uses a queue-based system and persists its state to {storePath}/prune_queue.gob.

Two queue implementations exist:

  • IndexerQueue: tracks document IDs at creation time. When it prunes, it removes entire blocks at once (Block + all its Transactions, Logs, AccessListEntries, and BlockSignature).
  • EventQueue: FIFO queue for P2P replication events. Drains documents as they arrive. Used by hosts, not indexers.

If the queue is empty (first run or after restart with no persisted state), the pruner falls back to filter-based pruning, querying DefraDB directly for old documents.
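A block-granular queue with gob persistence can be sketched as follows. The types and method names here (`blockEntry`, `indexerQueue`, `Prune`, `Save`, `loadQueue`) are hypothetical and only illustrate the idea described above; the real IndexerQueue may track different fields. The example serialises to a byte slice rather than to the prune_queue.gob file to stay self-contained.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// blockEntry records the document IDs created for one block (hypothetical).
type blockEntry struct {
	BlockNumber uint64
	DocIDs      []string
}

// indexerQueue is a FIFO of blocks, oldest first.
type indexerQueue struct {
	Entries []blockEntry
}

// Push appends the newest block to the back of the queue.
func (q *indexerQueue) Push(e blockEntry) { q.Entries = append(q.Entries, e) }

// Prune removes whole blocks from the front until at most keep remain and
// returns the IDs of every document that belonged to the removed blocks.
func (q *indexerQueue) Prune(keep int) []string {
	if keep < 0 {
		keep = 0
	}
	var docIDs []string
	for len(q.Entries) > keep {
		docIDs = append(docIDs, q.Entries[0].DocIDs...)
		q.Entries = q.Entries[1:]
	}
	return docIDs
}

// Save encodes the queue with encoding/gob.
func (q *indexerQueue) Save() ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(q); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// loadQueue decodes a queue previously written by Save.
func loadQueue(data []byte) (*indexerQueue, error) {
	q := &indexerQueue{}
	if err := gob.NewDecoder(bytes.NewReader(data)).Decode(q); err != nil {
		return nil, err
	}
	return q, nil
}

func main() {
	q := &indexerQueue{}
	q.Push(blockEntry{BlockNumber: 1, DocIDs: []string{"block-1", "tx-1a", "log-1a"}})
	q.Push(blockEntry{BlockNumber: 2, DocIDs: []string{"block-2"}})
	q.Push(blockEntry{BlockNumber: 3, DocIDs: []string{"block-3", "tx-3a"}})

	fmt.Println("delete:", q.Prune(2)) // every document of block 1

	data, _ := q.Save()
	restored, _ := loadQueue(data)
	fmt.Println("restored blocks:", len(restored.Entries))
}
```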

Note: BlockSignature is referenced as BatchSignature in parts of the app-sdk code. This is a naming inconsistency, not a separate type.

Snapshots

Snapshots bundle multiple blocks into a single signed file for faster initial sync.

| Parameter | Default | Notes |
| --- | --- | --- |
| BlocksPerFile | 1000 | Blocks per snapshot file. Some docs reference 100, but that value is the pagination size for querying blocks from DefraDB. |
| IntervalSeconds | 60 | How often the snapshot loop checks for new blocks |

Environment variables: SNAPSHOT_ENABLED, SNAPSHOT_BLOCKS_PER_FILE, SNAPSHOT_INTERVAL_SECONDS.
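A sketch of reading these variables with the documented defaults, and of splitting a block range into snapshot files, might look like this. The names `snapshotConfig`, `envInt`, `loadSnapshotConfig`, and `snapshotRanges` are hypothetical. Two behaviours are assumptions rather than documented facts: SNAPSHOT_ENABLED is treated as false when unset, and a trailing partial range is left for a later run.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// snapshotConfig holds the three snapshot settings (hypothetical type).
type snapshotConfig struct {
	Enabled         bool
	BlocksPerFile   int
	IntervalSeconds int
}

// envInt reads a positive integer from the environment, or returns def when
// the variable is unset, not a number, or not positive.
func envInt(key string, def int) int {
	v, err := strconv.Atoi(os.Getenv(key))
	if err != nil || v <= 0 {
		return def
	}
	return v
}

// loadSnapshotConfig applies the documented defaults of 1000 and 60.
func loadSnapshotConfig() snapshotConfig {
	enabled, _ := strconv.ParseBool(os.Getenv("SNAPSHOT_ENABLED")) // false if unset
	return snapshotConfig{
		Enabled:         enabled,
		BlocksPerFile:   envInt("SNAPSHOT_BLOCKS_PER_FILE", 1000),
		IntervalSeconds: envInt("SNAPSHOT_INTERVAL_SECONDS", 60),
	}
}

// snapshotRanges splits [first, last] into inclusive [start, end] ranges of
// exactly blocksPerFile blocks. A trailing partial range is omitted.
func snapshotRanges(first, last uint64, blocksPerFile int) [][2]uint64 {
	if blocksPerFile <= 0 {
		return nil
	}
	size := uint64(blocksPerFile)
	var out [][2]uint64
	for start := first; start+size-1 <= last; start += size {
		out = append(out, [2]uint64{start, start + size - 1})
	}
	return out
}

func main() {
	cfg := loadSnapshotConfig()
	fmt.Printf("%+v\n", cfg)
	fmt.Println(snapshotRanges(0, 2499, 1000)) // prints [[0 999] [1000 1999]]
}
```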

P2P data distribution

The indexer does not manage P2P connections directly. DefraDB handles all of that through libp2p:

  1. The indexer writes a document.
  2. DefraDB computes a content digest.
  3. DefraDB gossips the digest to connected peers.
  4. A peer that wants the document requests its full content.
  5. DefraDB sends the full document.

This is unidirectional. The replication filter in pkg/indexer/replication_filter.go rejects all inbound documents. Indexers only push.
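The announce-then-fetch exchange and the one-way rule can be modelled in memory. This is a conceptual sketch only: it does not use DefraDB or libp2p, a plain SHA-256 digest stands in for the content identifier, and `producer`, `consumer`, and their methods are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

var errInboundRejected = errors.New("inbound replication rejected")

// digest returns a hex SHA-256 of the content, standing in for a CID.
func digest(doc []byte) string {
	sum := sha256.Sum256(doc)
	return hex.EncodeToString(sum[:])
}

// producer plays the indexer side: it stores documents and serves them by
// digest, but never accepts documents from peers.
type producer struct{ store map[string][]byte }

func newProducer() *producer { return &producer{store: make(map[string][]byte)} }

// Write stores doc and returns the digest that would be gossiped to peers.
func (p *producer) Write(doc []byte) string {
	d := digest(doc)
	p.store[d] = doc
	return d
}

// Fetch serves the full content for a digest a peer asked for.
func (p *producer) Fetch(d string) ([]byte, bool) {
	doc, ok := p.store[d]
	return doc, ok
}

// Receive models the replication filter: every inbound document is rejected.
func (p *producer) Receive(doc []byte) error { return errInboundRejected }

// consumer plays the host side.
type consumer struct{ have map[string][]byte }

func newConsumer() *consumer { return &consumer{have: make(map[string][]byte)} }

// OnAnnounce requests content only for digests the consumer does not hold
// yet, and checks that the content matches the announced digest.
func (c *consumer) OnAnnounce(d string, from *producer) error {
	if _, ok := c.have[d]; ok {
		return nil
	}
	doc, ok := from.Fetch(d)
	if !ok {
		return fmt.Errorf("digest %s not available", d)
	}
	if digest(doc) != d {
		return errors.New("content does not match announced digest")
	}
	c.have[d] = doc
	return nil
}

func main() {
	idx, host := newProducer(), newConsumer()
	d := idx.Write([]byte(`{"number":19000000}`))
	fmt.Println(host.OnAnnounce(d, idx), len(host.have))
	fmt.Println(idx.Receive([]byte("anything")))
}
```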

Bootstrap peers are configured in the DefraDB config. Peers are also discovered through EntityRegistered events from ShinzoHub.

Resource requirements

| Resource | Minimum | Recommended |
| --- | --- | --- |
| CPU | 2 cores | 4 cores |
| RAM | 4 GB | 8 GB |
| Storage | 50 GB (with pruning) | 100 GB (with pruning) |
| Network | 100 Mbps | 1 Gbps |

Configuration

GETH_RPC_URL=https://json-rpc.example.com
GETH_WS_URL=ws://ws.example.com
GETH_API_KEY=your_api_key
INDEXER_START_HEIGHT=0
SNAPSHOT_ENABLED=false
SNAPSHOT_BLOCKS_PER_FILE=1000
SNAPSHOT_INTERVAL_SECONDS=60
LOG_LEVEL=error

INDEXER_START_HEIGHT is the block number to start indexing from on the first run, when no data exists yet. Setting it to 0 starts indexing at the current tip of the chain. To start from a specific block, set it to that block number.
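The start-height rule can be written as a small decision function. `resolveStartHeight` is a hypothetical helper, and the resume branch (continuing after the last stored block when data already exists) is an assumption made for the sketch; the text above only specifies the first-run behaviour.

```go
package main

import "fmt"

// resolveStartHeight picks the first block to index.
//   - With existing data, resume after the last stored block (assumption).
//   - On first run with configured == 0, start at the chain tip.
//   - On first run otherwise, start at the configured block number.
func resolveStartHeight(configured, chainTip, lastIndexed uint64, hasData bool) uint64 {
	if hasData {
		return lastIndexed + 1
	}
	if configured == 0 {
		return chainTip
	}
	return configured
}

func main() {
	fmt.Println(resolveStartHeight(0, 19000000, 0, false))        // prints 19000000
	fmt.Println(resolveStartHeight(18500000, 19000000, 0, false)) // prints 18500000
	fmt.Println(resolveStartHeight(0, 19000000, 18999990, true))  // prints 18999991
}
```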

Chain abstraction (in progress)

The codebase is being refactored from EVM-only to support multiple chains. The approach splits the current monolithic logic into a Chain interface with three parts:

  • Fetcher: retrieves raw block data from the chain-specific RPC.
  • Converter: transforms chain-specific data into Shinzo's canonical document types.
  • BlockHandler: writes documents to DefraDB (chain-agnostic).
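One way the three parts could fit together is sketched below. Since this refactor is in progress, every name and signature here (`RawBlock`, `Document`, the interface methods, `indexBlock`, and the stub implementations) is an assumption for illustration and should not be read as the project's actual interface.

```go
package main

import (
	"context"
	"fmt"
)

// RawBlock is opaque chain-specific block data (hypothetical type).
type RawBlock struct {
	Number  uint64
	Payload string
}

// Document is a canonical, chain-agnostic document (hypothetical type).
type Document struct {
	Collection string
	Fields     map[string]string
}

// Fetcher retrieves raw block data from a chain-specific RPC.
type Fetcher interface {
	FetchBlock(ctx context.Context, number uint64) (RawBlock, error)
}

// Converter turns chain-specific data into canonical documents.
type Converter interface {
	Convert(raw RawBlock) ([]Document, error)
}

// BlockHandler writes documents to storage and knows nothing about the chain.
type BlockHandler interface {
	HandleBlock(ctx context.Context, docs []Document) error
}

// indexBlock wires the three parts together for one block.
func indexBlock(ctx context.Context, f Fetcher, c Converter, h BlockHandler, number uint64) error {
	raw, err := f.FetchBlock(ctx, number)
	if err != nil {
		return fmt.Errorf("fetch block %d: %w", number, err)
	}
	docs, err := c.Convert(raw)
	if err != nil {
		return fmt.Errorf("convert block %d: %w", number, err)
	}
	return h.HandleBlock(ctx, docs)
}

// Stub implementations so the sketch runs without a node or a database.
type stubFetcher struct{}

func (stubFetcher) FetchBlock(_ context.Context, n uint64) (RawBlock, error) {
	return RawBlock{Number: n, Payload: "raw"}, nil
}

type stubConverter struct{ prefix string }

func (c stubConverter) Convert(raw RawBlock) ([]Document, error) {
	return []Document{{
		Collection: c.prefix + "__Block",
		Fields:     map[string]string{"number": fmt.Sprint(raw.Number)},
	}}, nil
}

type memoryHandler struct{ written []Document }

func (m *memoryHandler) HandleBlock(_ context.Context, docs []Document) error {
	m.written = append(m.written, docs...)
	return nil
}

// runDemo indexes one block through the stubs and returns what was written.
func runDemo() ([]Document, error) {
	h := &memoryHandler{}
	err := indexBlock(context.Background(), stubFetcher{},
		stubConverter{prefix: "Ethereum__Mainnet"}, h, 19000000)
	return h.written, err
}

func main() {
	docs, err := runDemo()
	fmt.Println(docs, err)
}
```

Supporting another chain in this shape would mean supplying a new Fetcher and Converter while reusing the same BlockHandler.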

Key files

| Path | Purpose |
| --- | --- |
| cmd/block_poster/main.go | Entry point |
| pkg/rpc/ethereum_client.go | Geth RPC client (WebSocket + HTTP) |
| pkg/defra/block_handler.go | Block processing and document creation |
| pkg/indexer/replication_filter.go | Rejects all incoming P2P replication |
| pkg/snapshot/snapshot.go | Snapshot signature creation |
| pkg/schema/schema_standard.graphql | Collection schemas for the 6 doc types |
| pkg/constants/collections.go | Collection name constants (chain-prefixed) |