What is an Epoch?

Epochs in the Bittensor Network

In the Bittensor network, epochs manage the update and synchronization of the network's state, including the weights of its neurons (the network participants). Each epoch is a time period during which participants train their models, exchange information, and update their local states. At the end of an epoch, the updated weights are submitted to the network and a new epoch begins.
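To make that cycle concrete, here is a minimal pseudocode-style Python sketch of one epoch from a participant's point of view. Everything in it is illustrative: the epoch length and the train_step, exchange_info, and submit_weights helpers are hypothetical names, not part of the Bittensor API.

```python
import time

EPOCH_LENGTH_SECONDS = 360  # illustrative value, not a real network constant

def run_one_epoch(model, peers, chain):
    """Hypothetical sketch of a participant's epoch: train, exchange, submit."""
    start = time.time()
    while time.time() - start < EPOCH_LENGTH_SECONDS:
        model.train_step()             # train the local model
        model.exchange_info(peers)     # exchange information with other participants
    chain.submit_weights(model.weights())  # submit updated weights; a new epoch begins
```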

In Bittensor, the concept of epochs serves several purposes:

1. Synchronization

Epochs help synchronize the state of the network. By having a fixed time period during which participants can train their models and exchange information, the network ensures that all nodes have a chance to update their local states before the next epoch begins.

2. Weight updates

At the end of each epoch, the updated weights are submitted to the network, which maintains the global state and improves overall performance.
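For a concrete sense of what "submitting weights" looks like, here is a rough sketch using the bittensor Python SDK's subtensor.set_weights call. The exact signature has changed across SDK versions, and the netuid, uids, and weight values below are placeholders, so treat this as a sketch rather than a drop-in snippet.

```python
import bittensor
import torch

wallet = bittensor.wallet()        # the participant's key pair
subtensor = bittensor.subtensor()  # connection to the chain

# Placeholder scores for the neurons this participant has evaluated.
uids = torch.tensor([0, 1, 2])
weights = torch.tensor([0.2, 0.3, 0.5])

# Submit the weights on-chain at the end of the epoch. The netuid argument
# is a placeholder, and keyword names may differ between bittensor versions.
subtensor.set_weights(wallet=wallet, netuid=1, uids=uids, weights=weights)
```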

3. Incentivization

Bittensor uses a staking mechanism to incentivize network participants. By participating in the network and submitting their weights at the end of each epoch, participants can earn rewards in the form of tokens.

4. Decentralization

By dividing the network's operation into epochs, Bittensor ensures that no single participant has control over the entire network. This promotes decentralization and helps maintain the network's security and stability.

In summary, epochs in the Bittensor network are used to manage the update process, synchronize the network's state, incentivize participants, and promote decentralization.

Function: run_epoch

The run_epoch function executes a single validator epoch. It applies batches until the epoch length is exhausted, and the validator nucleus is occasionally reset so that it does not converge too far. At the end of the epoch, weights are set on the chain and optionally logged to wandb.

Code Explanation

  1. Get parameters for the epoch depending on the selected network (either 'nakamoto' or 'finney'). Parameters include batch_size, sequence_length, prune_len, logits_divergence, min_allowed_weights, max_weight_limit, scaling_law_power, and synergy_scaling_law_power.

  2. Update the dataset size based on the calculated batch_size, sequence_length, and validation_len.

  3. Run the epoch (a simplified sketch of this loop is given after this list):
     a. Initialize epoch-related variables such as epoch_steps, epoch_responsive_uids, epoch_queried_uids, and epoch_start_time.
     b. Log the start of the epoch using wandb and prometheus.
     c. Run a loop until either the block count or the time limit is reached:
        i. Log the current state of the epoch.
        ii. Perform a forward pass through the network and calculate the loss and endpoint scores.
        iii. Perform a backward pass if the loss has a gradient.
        iv. Update neuron stats, responsive_uids, and queried_uids.
        v. Update the state of the epoch, including the global_step and current_block.
        vi. Perform optimization steps.
        vii. Log the state of the epoch using console messages, console tables, prometheus, and wandb.
        viii. Reset the metagraph.

  4. Calculate neuron weights at the end of the epoch.

  5. Set weights on the chain using the calculated weights.

  6. Log the end of the epoch using console messages, console tables, prometheus, and wandb.

  7. Increment the epoch counter.
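Putting steps 1 through 7 together, here is a heavily simplified Python sketch of the control flow described above. It is not the real bittensor validator code: validator and every helper on it (forward, update_stats, calculate_weights, log_epoch_end) are hypothetical stand-ins, and params is assumed to carry the step-1 hyperparameters plus illustrative epoch limits (blocks_per_epoch, epoch_time_limit).

```python
import time

def run_epoch(validator, params):
    """Hypothetical sketch of one validator epoch, following steps 1-7 above."""
    # 2. Resize the dataset to the epoch's batch and sequence settings.
    validator.dataset.set_data_size(
        params.batch_size, params.sequence_length + params.validation_len)

    # 3. Loop until either the block count or the time limit is exhausted.
    start_block = validator.subtensor.block
    start_time = time.time()
    responsive_uids, queried_uids = set(), set()
    while (validator.subtensor.block - start_block < params.blocks_per_epoch
           and time.time() - start_time < params.epoch_time_limit):
        loss, stats = validator.forward()   # forward pass: query neurons, score endpoints
        if loss.requires_grad:              # backward pass only if the loss has a gradient
            loss.backward()
        validator.update_stats(stats, responsive_uids, queried_uids)
        validator.optimizer.step()          # optimization step
        validator.optimizer.zero_grad()
        validator.metagraph.sync()          # refresh the local view of the chain

    # 4-5. Calculate the final neuron weights and set them on the chain.
    uids, weights = validator.calculate_weights(responsive_uids, queried_uids)
    validator.subtensor.set_weights(
        wallet=validator.wallet, uids=uids, weights=weights)

    # 6-7. Log the end of the epoch and increment the epoch counter.
    validator.log_epoch_end()
    validator.epoch += 1
```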

Author's Notes

If you think any of this is wrong, remember that it is an LLM's interpretation and explanation, not mine. I did proof-read and edit everything, but I'm still learning about this network, same as everybody else. It does, however, all make sense from my perspective given what I know. That is to say: the AI knows more about itself than I know about it.