DRAFT: Shamir Secret Sharing for Mnemonic Phrases

Authors: Aljosha Judmayer, Philipp Schindler
Status: DRAFT
Version: 0.1
Last revision: 2021-05-07
Initial release: -

Abstract

This document describes a secret sharing scheme with a particular focus on the use case of sharing random mnemonic phrases. For example, this scheme may be used within the context of hierarchical deterministic (HD) cryptocurrency wallets. With this focus in mind, this specification is designed to be fully compatible with existing encodings of mnemonic phrases, in particular BIP39. It can be used to split a given (already existing) BIP39 mnemonic phrase of 12, 15, 18, 21, or 24 words, into up to n=255n = 255 shares, such that any configurable subset of shares of size tnt \leq n can be used to recover the original mnemonic phrase from the shares. However, this specification is not limited to the use of BIP39 for encoding and decoding of mnemonic phrases, but rather specifies the secret sharing on the level of byte sequences. Alternative encodings such as, e.g., bytewords, may be used instead of BIP39. The design is intentionally kept as simple as possible, and only adds minor modifications, in particular an integrated checksum mechanism, to the original secret sharing approach described by Shamir (1979).

Status of This Memo

This document is not an official standard; it is published without any warranty for informational purposes and aims to: consolidate community best practices, foster public review as well as contribution and motivate the creation of official standards based on the herein provided material.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://github.com/de-centralized-systems/sssmp.

This work is licensed under the Creative Commons Attribution 4.0 International License.

CC BY 4.0

Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 (RFC2119, RFC8174) when, and only when, they appear in all capitals, as shown here.

Introduction

The demand of deriving and managing an increasing number of secret keys has risen in recent years. This is mostly due to the widened adoption of cryptocurrencies and their increase in valuation. These developments have fostered the creation of several different approaches to derive, encode and manage cryptographic keying material, within the respective community.

This specification attempts to consolidate community best practices and well known and established cryptographic techniques, in relation to the secret sharing of cryptographic keying material. Therefore, core requirements and design goals are formulated and separated in into different layers and processing steps of managing and deriving cryptographic keying material. This structure should provide the necessary bigger picture and define the context in which this specification has to be interpreted. The aim of this document is to specify a minimalistic and interoperable method for the secret sharing of cryptographic keying material intended for manual/human processing, e.g., as used in private offline/paper backups of mnemonic seed phrases.

Scope, Goals and Non-goals

To clarify the scope of this document, the relation to other processing steps in deriving and managing cryptographic keying material is outlined in the following.

Goals

Non-Goals

Background

This specification is based on Shamir Secret Sharing (SSS) and adds a checksum mechanism in the construction step. The added checksum does not affect the reconstruction of classical SSS if it is not checked. The specified scheme is intended to share high min-entropy cryptographic keying material, i.e., a secret ss, among a set of nn participants, such that at least tt are required to reconstruct the original secret ss. A detailed comparison between related implementations of secret sharing schemes based on SSS and utilized in this context of cryptocurrencies (like BIP39, BC-SSKR, Shamir39, etc.) is out of scope of this document.

Notation and Symbols

The following table provides an overview of the used notation in the remainder of this document.

Symbol Description
nn total number of shares to be created (1n255)\left(1 \leq n \leq 255\right)
tt minimum number of shares required for the recovery process (1tn)\left(1 \leq t \leq n\right)
bb number of bytes of a given secret or share
ss the secret value to be shared, represented by a sequence of bb bytes
ii an index variable used for the secret shares; 1in1 \leq i \leq n or 1it1 \leq i \leq t depending on the context
jj an index variable used for the coefficient of the secret sharing polynomials; 0j<t0 \leq j < t
kk an index variable used to denote the kthk^\textsf{th} byte in the secret, a share or the kthk^\textsf{th} secret sharing polynomial; 0k<b0 \leq k < b
sis_i a share of the secret ss; consists a share index xix_i (an integer between 11 and 255255) and share value yiy_i (a sequence of bb bytes)
cjc_j the jthj^{\textsf{th}} coefficients of the secret sharing polynomials as a sequence of bb bytes; the byte cj[k]c_j[k] represents the jthj^{\textsf{th}} coefficient of the secret sharing polynomial fk()f_k(\cdot)
fk()f_k(\cdot) the secret sharing polynomial used to share the kthk^{\textsf{th}} byte of the secret ss
αβ\alpha \mid\mid \beta concatenation of two sequences of bytes α\alpha and β\beta
α[k]\alpha[k] the kthk^\textsf{th} byte of the sequence of bytes α\alpha, 0-based indices used
α[:k]\alpha[:k] the first kk bytes of the sequence of bytes α\alpha

Sharing

In the following, we specify the process of creating 1n2551 \leq n \leq 255 shares (s1,s2,...,sn)(s_1, s_2, ..., s_n) for a given secret ss (a sequence of b16b \geq 16 bytes) such that any subset of at least 1tn1 \leq t \leq n of those shares can be used to recover the shared value.

Input:

Output:

Note that typical values for nn and tt are in the range of 2t2552 \leq t \leq 255 . If t=1t = 1, the number of shares required for the recovery is one. In this case, each share created is just the value being shared. If t=nt = n, all created shares are required for the recovery process. If, for example, n=5n = 5 and t=3t = 3, then 55 shares are created in total. and each combination of at least 33 shares can be used to recover the shared secret.

The process of secret sharing is executed on a byte-per-byte basis, i.e., each of the bb bytes of the secret ss to be shared is processed individually. This allows for a simple and efficient implementation using the finite field GF(28)\text{GF}(2^8) with 256 elements. In this field addition and subtraction of two field elements (two bytes) are defined using the bitwise xor operation. Multiplication of two field elements (again two bytes) is defined using the AES reducing polynomial x8+x4+x3+x+1x^8 + x^4 + x^3 + x + 1. For additional background information regarding finite field arithmetic we refer the reader to the Wikipedia page on Finite Field Arithmetic. The used secret sharing polynomials f0(),f1(),,fb1()f_0(\cdot), f_1(\cdot), \dots, f_{b-1}(\cdot) of degree t1t - 1 are evaluated over GF(28)\text{GF}(2^8):

fk(x)=c0[k]+c1[k]x+c2[k]x2++ct1[k]xt1,0k<b. f_k(x) = c_0[k] + c_1[k] \cdot x + c_2[k] \cdot x^2 + \dots + c_{t - 1}[k] \cdot x^{t - 1}, \quad 0 \leq k < b.

Note that here the values of c0,c1,,ct1c_0, c_1, \dots, c_{t - 1} are used to represent lists of coefficients, whereas the kthk^\textsf{th} element of each of the lists of coefficients is used to define the secret sharing polynomial for sharing the kthk^\textsf{th} byte of the secret ss.

Each list of coefficients cj0j<tc_j \mid 0 \leq j < t is defined as follows:

cj={sif j=0random(b)if 0<j<t1coefficients_with_checksum(s,random(b8))otherwise (j=t1t>1) c_j = \begin{cases} s &\text{if } j = 0 \\ \textsf{random}(b) &\text{if } 0 < j < t - 1 \\ \textsf{coefficients\_with\_checksum}(s, \textsf{random}(b - 8)) &\text{otherwise }(j =t -1 \land t > 1) \end{cases}

Hereby, the function random(b)\textsf{random}(b) creates a list of bb random bytes. The function MUST use a cryptographically secure source of randomness. If t>1t > 1, the last coefficient ct1c_{t - 1} is derived from the secret ss and a sequence rr of b8b-8 random bytes. This is done as defined by the coefficients_with_checksum(s,r)\textsf{coefficients\_with\_checksum}(s, r) function for the purpose of including a checksum as the last 88 bytes of ct1c_{t - 1}.

coefficients_with_checksum(s,r)=r  HMACSHA256(key=s, msg="secret sharing checksum"r)[:8]\textsf{coefficients\_with\_checksum}(s, r) = \\ r\ \mid\mid \ \text{HMAC}_{\text{SHA256}}(\text{key}=s,\ \text{msg}=\texttt{"secret sharing checksum"} \mid\mid r)[:8]

Here, [:8][:8] is used to denote the first (leftmost) 88 bytes of the given HMAC. The use of rr as argument to the HMAC message, ensures that all bits of the coefficients ct1c_{t - 1} are (pseudo-) random. Therefore, even if a the same secret ss is shared multiple times, the last coefficient is different with high probability. Given this definition of coefficients (and thus of the secret sharing polynomials), the nn shares s1,s2,...,sns_1, s_2, ..., s_n are computed as follows:

si=(i,f0(i)f1(i)fb1(i))1in s_i = \left( i, f_0(i) \mid\mid f_1(i) \mid\mid \dots \mid\mid f_{b-1}(i) \right) \quad 1 \leq i \leq n

Note that a share is a tuple of two values and MUST include the share index ii as well as the share value, obtained by evaluating the secret sharing polynomials and concatenating the results.

The only difference to Shamir's original secret sharing is the definition of coefficient ct1c_{t - 1}. In Shamir's original description, all coefficients cj0<j<tc_j \mid 0 < j < t are selected at random, whereas in our approach all but the last 88 bytes of the list of coefficients ct1c_{t - 1} are selected at random, whereas the last 88 bytes are pseudo-randomly derived from the secret ss and all but the last 88 bytes of ct1c_{t - 1} using an HMAC.

Recovery

This section describes how to reconstruct a secret from a set of tt given shares, computed with the method specified in Section Sharing. Implementations MUST be able to successfully recover the shared secret ss, if all given shares are valid and a sufficient number of shares is provided.

Implementation MAY provide additional functionality to aid with the recovery process in case invalid shares are provided. In this case implementations MUST provide a suitable warning message to the user to indicate the issue(s) with the provided shares. Advanced recovery functionality MUST explicitly be invoked by the user.

Input:

Output:

Implementations MUST reject inputs prior to attempting the recovery process, and display a corresponding error message in the following cases:

If the inputs are successfully validated according to these plausibility checks, implementations attempt to recover the shared secret ss from the given set of shares {s1,s2,,st}\{s_1, s_2, \dots, s_t\}. For this purpose the underlying secret sharing polynomials fk()f_k(\cdot) are evaluated using the Lagrange interpolation formula:

fk(x)=i=1tyi[k]i(x),0k<b f_k(x) = \sum_{i = 1}^{t} y_i[k] \cdot \ell_i(x), \quad 0 \leq k < b

i(x)=j=1ijtxxjxixj \ell_i(x) = \prod_{\substack{j = 1 \\ i \not = j}}^{t} \frac{x - x_j}{x_i - x_j}

To recover the shared secret ss we apply the above formula for x=0x = 0:

s[k]=c0[k]=fk(0)=i=1tyi[k]j=1ijtxjxjxi,0k<b. s[k] = c_0[k] = f_k(0) = \sum_{i = 1}^{t} y_i[k] \prod_{\substack{j = 1 \\ i \not = j}}^{t} \frac{x_j}{x_j - x_i}, \quad 0 \leq k < b.

Neglecting the notational differences covering the byte-per-byte execution of the process, this approach is essentially equal to Shamir's original description.

Checksum Verification

For the case where t=1t = 1, each generated share is equal to the shared secret itself. In this case there is no integrated checksum and the checksum verification steps are skipped. The value ss is directly returned as result of the recovery process.

Otherwise, after recovery, but before returning the recovered secret ss to the user, the integrated checksum MUST be verified. In case the verification fails, the user MUST be informed about the invalid checksum, and the recovered secret ss MUST NOT be displayed without such a warning.

The last list of coefficients ct1c_{t - 1} used to define the secret sharing polynomials were generated to include a (randomized) checksum using the HMAC mechanism introduced in Section Sharing. In order to verify this checksum, first the value of the coefficients ct1c_{t - 1} are computed from the set of tt shares {(x1,y1),(x2,y2),,(xt,yt)}\{(x_1, y_1), (x_2, y_2), \dots, (x_t, y_t)\} supplied for the recovery process. This, e.g., MAY be accomplished using the schemata of divided differences illustrated in the following. The process of recovering the values of the coefficients ct1c_{t - 1} using the schemata of divided differences, as the secret sharing process itself, is performed on a byte-per-byte basis for each byte in the list of coefficients ct1c_{t - 1} using kk as index variable:

ct1[k]=[x0,x1,,xt1]fk 0k<b. c_{t - 1}[k] = [x_0, x_1, \dots, x_{t - 1}] f_k\ \quad 0 \leq k < b.

Here, [x0,x1,,xt1]fk[x_0, x_1, \dots, x_{t - 1}] f_k denotes the dividing difference of the secret sharing polynomial fk()f_k(\cdot) and is recursively defined as given below:

[xi]fk=yi[k] [x_i] f_k = y_i[k]

[xi,,xj]fk=[xi+1,,xj]fk[xi,,xj1]fkxj[k]xi[k]. [x_i, \dots, x_j] f_k = \frac{[x_{i+1}, \dots, x_j] f_k - [x_i, \dots, x_{j-1}] f_k}{ x_j[k] - x_i[k]}.

Note that, as with the secret sharing process itself, all computations are performed in GF(28)\text{GF}(2^8). Subtraction in GF(28)\text{GF}(2^8) is the same as addition and implemented using a binary xor operation.

With the value of ct1c_{t - 1} recovered using the above process, and the values of s=c0s = c_0 obtained during the recovery process, the integrated checksum is verified by first getting the value rr as all but the last 88 bytes of ct1c_{t - 1} and then recomputing and comparing the value of ct1c_{t - 1} using the coefficients_with_checksum()\textsf{coefficients\_with\_checksum}(\cdot) function:

r=ct1[:b8]ct1=?coefficients_with_checksum(s,r). r = c_{t - 1}[: b-8] \\ c_{t - 1} \overset{?}{=} \textsf{coefficients\_with\_checksum}(s, r).

The checksum is valid if, and only if, the above equality holds. If the checksum is found valid, the value ss is returned as result of the recovery process. Otherwise the user MUST be informed of the failed recovery attempt. In this case, implementations MAY choose to return the recovery secret ss anyway, but they MUST NOT do so without provided the user with a clear indication of the failed checksum verification.

Design and Security Considerations

Metadata

Besides a set of at least tt valid shares with their associated indices, no additional data is needed to reconstruct the previously shared secret. Although, for practical purposes it is RECOMMENDED that implementations also generate a unique ID which represents the individual secret sharing session.

This secret sharing session identifier SHOULD be written down together with the mnemonic encoding of the share as well as its index. This ensures that shares of multiple different sessions do not get mixed and thus confused with each other.

Note that, if shares of multiple different sessions with large nn values become mixed, this might lead to a situation where it is no longer computationally feasible to perform a brute-force search to find the valid combinations that allow for the reconstruction of the original secrets.

Minimal secret size and intended use case

The minimal input size of the secret to be shared with the secret sharing mechanism specified in this document is limited to 128128 bits (1616 bytes). This is equivalent to the lower bound for cryptographic secrets considered by a range of other designs in this field. Any implementation of this specification MUST NOT allow secrets smaller than this size to be shared. In contrast to Shamir's original secret sharing description, which provides information theoretic security, our variant (due to the inclusion of an integrated checksum) is not designed for sharing low-entropy secrets and may become insecure under these circumstances. Also this specification MUST NOT be used as basis for an implementation for secret sharing of arbitrary data. It is instead intended to share cryptographic secrets with a minimum of 128128 bits of entropy. If against this rule, a very low entropy secret were to be shared, the inclusion of a 6464 bit checksum may allow an attacker with possession of t1t - 1 shares to brute-force the shared secret, using the checksum as a means to verify if a guess of the secret is correct.

To share arbitrary data, it is RECOMMENDED to encrypt the data using a suitable (symmetric) encryption algorithm and only share the (symmetric) encryption key using the method specified in this document.

Checksum

The specification defines the secret sharing process on the level of bytes. It is intended to be used with encoding schemes such as BIP39 or Bytewords to encode/decode the in- and outputs. These encoding schemes are already designed to add additional safeguards to detect errors, e.g., they include a dedicated checksum and represent bytes as words with a minimum required Damerau-Levenshtein distance etc. Therefore, no additional checksums and metadata is appended to the raw data output by the secret sharing algorithm in this specification.

Unfortunately, these encoding-level checksums used in mnemonics do not allow for error detection in cases where syntactically correct, but semantically invalid, shares are provided to the recovery process. These cases include, the following scenarios:

In both cases the recovery process would fail silently resulting in a recovered secret that is different to the secret originally shared, if no additional countermeasures are implemented. It is highly desirable to detect, and potentially be able to recover from, these scenarios. Therefore, this specification includes a checksum mechanism, by deriving the last coefficient of the secret sharing polynomial in a specific way. While there are many different approaches to cover these scenarios, the approach used in this specification achieves a range of desirable properties while minimizing potential drawbacks:

A theoretical drawback is the loss of information theoretical security as the value of ct1c_{t - 1} includes a 6464 bit checksum, derived cryptographically from (i) the secret to shared and (ii) at least 64 additional bits of entropy.

Compatibility and Separation of Concerns

The specification is intentionally designed such that it can be used with different mnemonic schemes and different key derivation schemes building on top of the shared cryptographic keying material. This approach follows the separation of concerns principle as generation, derivation, mnemonic encoding/decoding and secret sharing of cryptographic key material are separate tasks and can thus be defined and implemented separately (see Section Scope for more details). As long as the intermediate formats are well defined, this loose coupling allows for easier replacement and updates of individual components in software stacks and tool chains. The document at hand specifies a method for secret sharing of cryptographic keying material.

Sharing the same secret multiple times

The specified secret sharing scheme in this document MAY safely be invoked multiple times with the same secret as input. The resulting shares for each invocation are different and independent from the shares of the other invocations (for the special of case t=1t = 1 all generated shares are equal to the secret being shared). Shares from different secret sharing instances cannot (and are by design not supposed to) be combined to recover a secret shared this way.

Even in the worst case, i.e., the shortest possible secret (1616 bytes) with the recovery threshold of t=2t = 2, the probability of picking the same secret sharing polynomial (in this case only defined by the secret and 6464 random bits) is low for reasonable numbers of invocations. The probability can be computed via the formula of the birthday problem given below, using qq to denote the number of secret sharing invocations:

1q!(264q)(264)q. 1 - \frac{q! \cdot \binom{2^{64}}{q}}{\left(2^{64}\right)^q}.

For q=100q = 100 for example, this results in a probability of 2.681016\approx 2.68 \cdot 10^{-16}.

Multi layer / Recursive use

The specification implicitly covers the use case of sharing the same secret among different (independent) groups of stakeholders and/or further sharing shares of a shared secret.

We consider this as an advanced use case, and in contrast to, e.g., the SLIP39 proposal, deliberately do not complicate this specification by introducing an additional notation for groups to explicitly capture such use cases. Yet, our specification transparently allows to re-share the shares for a given secret in a natural way: This is simply accomplished by feeding a share, which should be further split up, as secret into the secret sharing procedure. While special care must be taken to ensure that the way secrets are shared and may later be recombined is noted, our approach in principle allows for an arbitrary number of nested secret sharing layers.

The use of GF(28)\mathbf{GF(2^8)}

This design performs all secret sharing and recovery computations in the finite field GF(28)\text{GF}(2^8). This allows the specified algorithms to process the data on a byte-per-byte level and simplifies implementations in programming languages or on hardware platform with no immediate support for arbitrary precision integer arithmetics. Using this approach, share indices are restricted to integers from 11 to 255255. Thus the maximum number of shares is limited to 255255. However in practice, we only see this as minor limitation for the indented use case of sharing mnemonic phrases, where human interaction is involved.

Test vectors

The files test_vectors.json and test_vectors.py in ./src directory of the Github repository of this specification contain a list of secret sharing computation performed with the reference implementation. An example test vector is given below:

{ "s": "243f6a8885a308d313198a2e03707344a4093822299f31d0082efa98ec4e6c89", "n": 5, "t": 3, "c": [ "243f6a8885a308d313198a2e03707344a4093822299f31d0082efa98ec4e6c89", "b7e151628aed2a6abf7158809cf4f3c762e7160f38b4da56a784d9045190cfef", "324e7738926cfbe5f4bf8d8d8c31d763da06c80abb1185eb676dea5f8d95cb78" ], "shares": [ [1, "a1904cd29d22d95c58d75f2313b557e01ce8e627aa3a6e6dc8c7c9c3304b681e"], [2, "99c50facf4c99dbe8b3138372647ff4625c4191483a8bcfdda92d6f74c17e8b7"], [3, "1c6a29f6ec484c31c0ffed3a3682dbe29d25c711000de3401a7be5ac9012ec20"], [4, "c31a04b678a089b200c3f9105db04d1f38d854be8c72fca18882910fbbab79d9"], [5, "46b522ec6021583d4b0d2c1d4d7569bb80398abb0fd7a31c486ba25467ae7d4e"] ] },

The test vectors include secret sharing examples for all values of 1tn51 \leq t \leq n \leq 5. As the secret sharing process is inherently randomized, the values of the coefficients used for the secret sharing process are given in the test vectors. This allows other implementations to check if the secret sharing leads to the expected shares for the given coefficients. The last 8 bytes of the coefficients in the test vectors are computed according to this specification and can be used to check the correctness of the checksum computation. Implementations MUST NOT use the provided values for purposes other than testing the correctness of the implementation.

Implementations

All implementation MUST support the following functionality as specified later in this document:

  1. Sharing
  2. Recovery
    1. Checksum Verification

Implementations MAY support additionally support functionality to aid the recovery process in the following circumstances given as non-exhaustive list below:

  1. Recovery from a set of vtv \leq t valid shares, even if the underlying secret sharing threshold tt is unknown.
  2. Recovery from a set of shares, given that at least tt valid shares are provided.
  3. Recovery in case share indices are missing or invalid.
  4. Combinations of the above issues.

Depending on the number of shares nn and the secret sharing threshold tt used, in particular for small values, recovery under these (and other) circumstances is possible with high probability due to the included checksum.

Feel free to contact us if you have also implemented the specification and want your implementation to be listed on this page.

Reference implementation

The reference implementation can be found in the ./src directory of the Github repository of this specification.

BIP39 Toolkit

An implementation which uses SSSMP as a component is the BIP39 Toolkit. The BIP39 toolkit was developed alongside this specification and provides a standalone Python implementation with a CLI-based interface using BIP39 as input and output format. Moreover it supports the generation of cryptographic keys from scratch.

References

Version History

2021-05-07