Commit 0fcd20d

committed
Project commit
1 parent d000480 commit 0fcd20d

File tree

28 files changed

+2913
-3
lines changed

.swift-version

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
3.0
Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
//: [Previous](@previous)
2+
//: ## Character Bases
3+
//:
4+
//: As we've seen in the previous sections, `EntropyString` provides default characters for each of
5+
//: the supported bases. Let's see what's under the hood.
6+
import EntropyString
7+
8+
print("Base 64: \(RandomString.characters(for: .base64))\n")
9+
//: The call to `RandomString.characters(for:)` returns the characters used for any of the
10+
//: bases defined by the `RandomString.CharBase enum`. The following code reveals all the
11+
//: character bases.
12+
print("Base 32: \(RandomString.characters(for: .base32))\n")
13+
print("Base 16: \(RandomString.characters(for: .base16))\n")
14+
print("Base 8: \(RandomString.characters(for: .base8))\n")
15+
print("Base 4: \(RandomString.characters(for: .base4))\n")
16+
print("Base 2: \(RandomString.characters(for: .base2))\n")
17+
//: The default character bases were chosen as follows:
18+
//: - Base 64: **ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_**
19+
//: - The file system and URL safe char set from
20+
//: [RFC 4648](https://tools.ietf.org/html/rfc4648#section-5).
21+
//: - Base 32: **2346789bdfghjmnpqrtBDFGHJLMNPQRT**
22+
//: * Remove all upper and lower case vowels (including y)
23+
//: * Remove all numbers that look like letters
24+
//: * Remove all letters that look like numbers
25+
//: * Remove all letters that have poor distinction between upper and lower case values.
26+
//: * The resulting strings don't look like English words and are easy to parse visually.
27+
//: - Base 16: **0123456789abcdef**
28+
//: - Hexadecimal
29+
//: - Base 8: **01234567**
30+
//: - Octal
31+
//: - Base 4: **ATCG**
32+
//: - DNA alphabet. No good reason; just wanted to get away from the obvious.
33+
//: - Base 2: **01**
34+
//: - Binary
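Each default base has a power-of-two number of characters, so every character carries a whole number of bits. The sketch below copies the character strings from the list above and derives the bits per character. The `defaultBases` table and `bitsPerCharacter` array are names made up for this illustration and are not part of `EntropyString`.

```swift
// Default characters copied from the list above (illustrative table, not library API).
let defaultBases: [(name: String, characters: String)] = [
  ("base64", "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"),
  ("base32", "2346789bdfghjmnpqrtBDFGHJLMNPQRT"),
  ("base16", "0123456789abcdef"),
  ("base8", "01234567"),
  ("base4", "ATCG"),
  ("base2", "01")
]

// For a power-of-two count, trailingZeroBitCount equals log2(count).
let bitsPerCharacter = defaultBases.map { $0.characters.count.trailingZeroBitCount }

for (base, bits) in zip(defaultBases, bitsPerCharacter) {
  print("\(base.name): \(base.characters.count) characters, \(bits) bits per character")
}
```

So a base 64 character carries 6 bits, a base 32 character 5 bits, and so on down to 1 bit for base 2.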
35+
//:
36+
//: You may, of course, want to choose the characters used, which is covered next in [Custom
37+
//: Characters](Custom%20Characters).
38+
//:
39+
//: [TOC](Table%20of%20Contents) | [Next](@next)
Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
//: [Previous](@previous)
2+
//: ## Custom Bytes
3+
//:
4+
//: As described in [Secure Bytes](Secure%20Bytes), `EntropyString` automatically generates random
5+
//: bytes using either `SecRandomCopyBytes` or `arc4random_buf`. These functions are fine, but you
6+
//: may have a need to provide your own bytes, say for deterministic testing or to use a
7+
//: specialized byte generator. The `RandomString.entropy(of:using:bytes:)` function allows
8+
//: passing in your own bytes to create a string.
9+
import EntropyString
10+
11+
let bytes: RandomString.Bytes = [250, 200, 150, 100]
12+
let string = try! RandomString.entropy(of: 30, using: .base32, bytes: bytes)
13+
print("String: \(string)\n")
14+
//: * callout(string): Th7fjL
15+
//:
16+
//: The __bytes__ provided can come from any source. However, the number of bytes must be
17+
//: sufficient to generate the string as described in the [Efficiency](Efficiency) section.
18+
//: `RandomString.entropy(of:using:bytes:)` throws `RandomString.RandomError.tooFewBytes` if
19+
//: the string cannot be formed from the passed bytes.
20+
do {
21+
try RandomString.entropy(of: 32, using: .base32, bytes: bytes)
22+
}
23+
catch {
24+
print(error)
25+
}
26+
//: * callout(error): tooFewBytes
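Why do 4 bytes suffice for 30 bits of base 32 but not for 32 bits? A rough way to see it, assuming the string is rounded up to whole characters and the bits are rounded up to whole bytes, is sketched below. `bytesNeeded(forBits:charCount:)` is a hypothetical helper written for this note, not a function of the library, and it takes whole-number bits for simplicity.

```swift
// Hypothetical helper: the smallest number of bytes able to supply `bits` of entropy
// when each character consumes log2(charCount) bits.
// Assumes charCount is a power of two.
func bytesNeeded(forBits bits: Int, charCount: Int) -> Int {
  let bitsPerChar = charCount.trailingZeroBitCount
  let chars = (bits + bitsPerChar - 1) / bitsPerChar   // round up to whole characters
  return (chars * bitsPerChar + 7) / 8                 // round up to whole bytes
}

let needFor30 = bytesNeeded(forBits: 30, charCount: 32)  // 6 characters, 30 bits
let needFor32 = bytesNeeded(forBits: 32, charCount: 32)  // 7 characters, 35 bits
print("30 bits: \(needFor30) bytes, 32 bits: \(needFor32) bytes")
```

Under these assumptions the 30-bit request fits in the 4 supplied bytes, while the 32-bit request needs a seventh character and therefore a fifth byte, which is consistent with the `tooFewBytes` error shown above.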
27+
//:
28+
//: [TOC](Table%20of%20Contents)
Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,44 @@
1+
//: [Previous](@previous)
2+
//: ## Custom Characters
3+
//:
4+
//: Being able to easily generate random strings is great, but what if you want to specify your
5+
//: own characters? For example, suppose you want to visualize flipping a coin to produce entropy
6+
//: of 10 bits.
7+
import EntropyString
8+
9+
let randomString = RandomString()
10+
var flips = randomString.entropy(of: 10, using: .base2)
11+
print("flips: \(flips)\n")
12+
//: * callout(flips): 0101001110
13+
//:
14+
//: The resulting string of __0__'s and __1__'s doesn't look quite right. You want to use the
15+
//: characters __H__ and __T__ instead.
16+
try! randomString.use("HT", for: .base2)
17+
flips = randomString.entropy(of: 10, using: .base2)
18+
print("flips: \(flips)\n")
19+
//: * callout(flips): HTTTHHTTHH
20+
//:
21+
//: Note that setting custom characters in the above code requires using an *instance* of
22+
//: `RandomString`, whereas in the previous sections we used *class* functions for all calls. The
23+
//: function signatures are the same in each case, but you can't change the static character bases
24+
//: used in the class `RandomString` (i.e., there is no `RandomString.use(_:for:)` function).
25+
//:
26+
//: As another example, we saw in [Character Bases](Character%20Bases) the default characters for
27+
//: base 16 are **0123456789abcdef**. Suppose you like uppercase hexadecimal letters instead.
28+
try! randomString.use("0123456789ABCDEF", for: .base16)
29+
let hex = randomString.entropy(of: 48, using: .base16)
30+
print("hex: \(hex)\n")
31+
//: * callout(hex): 4D20D9AA862C
32+
//:
33+
//: Or suppose you want a random password with numbers, lowercase letters and special characters.
34+
try! randomString.use("1234567890abcdefghijklmnopqrstuvwxyz-=[];,./~!@#$%^&*()_+{}|:<>?", for: .base64)
35+
let password = randomString.entropy(of: 64, using: .base64)
36+
print("password: \(password)")
37+
//: * callout(password): }4?0x*$o_=w
38+
//:
39+
//: Note that `randomString.use(_:for:)` can throw an `Error`. The throw is actually a
40+
//: `RandomStringError` and will occur if the number of characters doesn't match the number
41+
//: required for the base or if the characters are not all unique. The section on [Unique
42+
//: Characters](Unique%20Characters) discusses these errors further.
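The two failure conditions described above (wrong number of characters, repeated characters) can be checked in a few lines. This is only a sketch of the idea. `CharacterCheckError`, `checkCharacters(_:base:)` and `problem(with:base:)` are hypothetical names for this note, and the library's own error type and checks may differ.

```swift
// Hypothetical error type for this sketch; the library reports a `RandomStringError`.
enum CharacterCheckError: Error {
  case wrongCount   // number of characters does not match the base
  case notUnique    // at least one character is repeated
}

// Throws if `characters` cannot serve as a character set for a base of size `base`.
func checkCharacters(_ characters: String, base: Int) throws {
  guard characters.count == base else { throw CharacterCheckError.wrongCount }
  guard Set(characters).count == base else { throw CharacterCheckError.notUnique }
}

// Small wrapper that turns the outcome into a printable description.
func problem(with characters: String, base: Int) -> String {
  do {
    try checkCharacters(characters, base: base)
    return "ok"
  } catch CharacterCheckError.wrongCount {
    return "wrong count"
  } catch CharacterCheckError.notUnique {
    return "not unique"
  } catch {
    return "unexpected"
  }
}

print(problem(with: "HT", base: 2))    // ok
print(problem(with: "HHT", base: 2))   // wrong count
print(problem(with: "HH", base: 2))    // not unique
```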
43+
//:
44+
//: [TOC](Table%20of%20Contents) | [Next](@next)
Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,49 @@
1+
//: [Previous](@previous)
2+
//: ## Efficiency
3+
//:
4+
//: To efficiently create random strings, `EntropyString` generates the necessary number of
5+
//: bytes needed for each string and uses those bytes in a bit shifting scheme to index into
6+
//: a character base. For example, consider generating strings from the `.base32` character
7+
//: base. There are __32__ characters in the base, so an index into an array of those characters
8+
//: would be in the range `[0,31]`. Generating a random string of `.base32` characters is thus
9+
//: reduced to generating a random sequence of indices in the range `[0,31]`.
10+
//:
11+
//: To generate the indices, `EntropyString` slices just enough bits from the array of bytes to create
12+
//: each index. In the example at hand, 5 bits are needed to create an index in the range
13+
//: `[0,31]`. `EntropyString` processes the byte array 5 bits at a time to create the indices. The first
14+
//: index comes from the first 5 bits of the first byte, the second index comes from the last 3 bits of
15+
//: the first byte combined with the first 2 bits of the second byte, and so on as the byte array is
16+
//: systematically sliced to form indices into the character base. And since bit shifting and addition
17+
//: of byte values is really efficient, this scheme is quite fast.
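The slicing described above can be sketched as follows. This is an illustration written for clarity, not the library's implementation: it reads one bit at a time instead of using the faster shift-and-add approach, and `sliceIndices(from:bitsPerChar:count:)` is a hypothetical helper. The characters are the default `.base32` set from the Character Bases page and the bytes are the ones used in the Custom Bytes page.

```swift
// Hypothetical helper: reads `count` indices of `bitsPerChar` bits each from `bytes`,
// taking bits in order from the most significant bit of the first byte.
// Returns nil when the bytes do not hold enough bits.
func sliceIndices(from bytes: [UInt8], bitsPerChar: Int, count: Int) -> [Int]? {
  guard bitsPerChar > 0, count >= 0, count * bitsPerChar <= bytes.count * 8 else {
    return nil
  }
  var indices = [Int]()
  for i in 0..<count {
    var index = 0
    for bit in 0..<bitsPerChar {
      let position = i * bitsPerChar + bit     // absolute bit position in the stream
      let byte = bytes[position / 8]           // byte that holds this bit
      let shift = 7 - (position % 8)           // distance from the low end of the byte
      index = (index << 1) | Int((byte >> shift) & 1)
    }
    indices.append(index)
  }
  return indices
}

let base32Characters = Array("2346789bdfghjmnpqrtBDFGHJLMNPQRT")
let sampleBytes: [UInt8] = [250, 200, 150, 100]

// 30 bits of entropy at 5 bits per character gives 6 characters.
let sliced = sliceIndices(from: sampleBytes, bitsPerChar: 5, count: 6) ?? []
let slicedString = String(sliced.map { base32Characters[$0] })
print(slicedString)  // Th7fjL
```

The second index (11) is built from the last 3 bits of 250 and the first 2 bits of 200, matching the description above, and the result agrees with the callout in the Custom Bytes page.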
18+
//:
19+
//: The `EntropyString` scheme is also efficient with regard to the amount of randomness used. Consider
20+
//: the following common solution to generating random strings. To generate a character, an index into
21+
//: the available characters is created using `arc4random_uniform`. The code looks something like:
22+
//:
23+
//: for _ in 0..<len {
24+
//: let offset = Int(arc4random_uniform(charCount))
25+
//: let index = chars.index(chars.startIndex, offsetBy: offset)
26+
//: let char = chars[index]
27+
//: string += String(char)
28+
//: }
29+
//:
30+
//: `arc4random_uniform` generates 32 bits of randomness, returned as a `UInt32`. The returned value is
31+
//: used to create an **index**. Suppose we're creating strings of **len** 16 using a **charCount**
32+
//: of 32. Each **char** consumes 32 bits of randomness (generated by `arc4random_uniform` per
33+
//: character) while only injecting 5 bits of entropy into **string**. But a string of length 16 using
34+
//: 32 possible characters has an entropy carrying capacity of 80 bits. So creating each **string**
35+
//: requires a total of 512 bits of randomness while only actually carrying 80 bits of that entropy
36+
//: forward in the string itself. That means 432 bits (84% of the total) of the generated randomness is
37+
//: simply thrown away.
38+
//:
39+
//: Compare that to the `EntropyString` scheme. For the example above, slicing off 5 bits at a time
40+
//: requires a total of 80 bits (10 bytes). Creating the same strings as above, `EntropyString` uses 80
41+
//: bits of randomness per string with no wasted bits. In general, the `EntropyString` scheme can waste
42+
//: up to 7 bits per string, but that's the worst case scenario and that's *per string*, not *per
43+
//: character*!
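The arithmetic behind this comparison is small enough to write out. The sketch below only restates the numbers from the text (16 characters, 32 possible characters, one 32-bit call per character). The variable names are chosen for this note.

```swift
let stringLength = 16                                   // characters per string
let availableChars = 32                                 // size of the character set
let bitsPerChar = availableChars.trailingZeroBitCount   // log2(32) = 5

// Common approach: one 32-bit arc4random_uniform call per character.
let bitsPerCall = 32
let generatedBits = stringLength * bitsPerCall          // 512
let carriedBits = stringLength * bitsPerChar            // 80
let wastedBits = generatedBits - carriedBits            // 432
let wastedPercent = wastedBits * 100 / generatedBits    // 84 (integer division)

// Byte slicing: only whole bytes are requested, so waste is at most 7 bits per string.
let bytesUsed = (carriedBits + 7) / 8                   // 10
let slicingWaste = bytesUsed * 8 - carriedBits          // 0

print("per-character calls waste \(wastedBits) of \(generatedBits) bits (\(wastedPercent)%)")
print("byte slicing uses \(bytesUsed) bytes and wastes \(slicingWaste) bits")
```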
44+
//:
45+
//: Fortunately you don't need to really understand how the bytes are efficiently sliced and diced to get
46+
//: the string. But you may want to know that [Secure Bytes](Secure%20Bytes) are used, and that's the next
47+
//: topic.
48+
//:
49+
//: [TOC](Table%20of%20Contents) | [Next](@next)
Lines changed: 64 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,64 @@
1+
//: [Previous](@previous)
2+
//: ## More Examples
3+
//:
4+
//: In [Real Need](Real%20Need) our developer used hexadecimal characters for the strings.
5+
//: Let's look at using other characters instead.
6+
//:
7+
//: We'll start with using 32 characters. What 32 characters, you ask? Well, the [Character
8+
//: Bases](Character%20Bases) section discusses the default characters available in `EntropyString`
9+
//: and the [Custom Characters](Custom%20Characters) section describes how you can use whatever
10+
//: characters you want. For now we'll stick to the provided defaults.
11+
import EntropyString
12+
13+
var bits = Entropy.bits(total: 10000, risk: .ten06)
14+
var string = RandomString.entropy(of: bits, using: .base32)
15+
print("String: \(string)\n")
16+
//: * callout(string): PmgMJrdp9h
17+
//:
18+
//: We're using the same __bits__ calculation since we haven't changed the number of IDs or the
19+
//: accepted risk of probabilistic uniqueness. But this time we use 32 characters and our resulting
20+
//: ID only requires 10 characters (and can carry 50 bits of entropy, which, as when we used 16
21+
//: characters, is more than the required 45.51).
22+
//:
23+
//: Now let's suppose we need to ensure the names of a handful of items are unique. Let's say 30
24+
//: items. And let's decide we can live with a 1 in 100,000 probability of collision (we're just
25+
//: futzing with some code ideas). Using hex characters we get:
26+
bits = Entropy.bits(total: 30, risk: .ten05)
27+
string = RandomString.entropy(of: bits, using: .base16)
28+
print("String: \(string)\n")
29+
//: * callout(string): 766923a
30+
//:
31+
//: Using base 4 characters we get:
32+
string = RandomString.entropy(of: bits, using: .base4)
33+
print("String: \(string)\n")
34+
//: * callout(string): GCGTCGGGTTTTA
35+
//:
36+
//: Okay, we probably wouldn't use base 4 (and what's up with those characters?), but you get the
37+
//: idea.
38+
//:
39+
//: Suppose we have a more extreme need. We want less than a 1 in a trillion chance that 10
40+
//: billion base 32 strings repeat. Let's see, our risk (trillion) is 10 to the 12th and
41+
//: our total (10 billion) is 10 to the 10th, so:
42+
//:
43+
bits = Entropy.bits(total: .ten10, risk: .ten12)
44+
string = RandomString.entropy(of: bits, using: .base32)
45+
print("String: \(string)\n")
46+
//: * callout(string): F78PmfGRNfJrhHGTqpt6Hn
47+
//:
48+
//: Finally, let's say we're generating session IDs. We're not interested in uniqueness per se, but in
49+
//: ensuring our IDs aren't predictable since we can't have the bad guys guessing a valid ID. In
50+
//: this case, we're using entropy as a measure of unpredictability of the IDs. Rather than calculate
51+
//: our entropy, we declare it needs to be 128 bits (since we read on some web site that session IDs
52+
//: should be 128 bits).
53+
string = RandomString.entropy(of: 128, using: .base64)
54+
print("String: \(string)\n")
55+
//: * callout(string): b0Gnh6H5cKCjWrCLwKoeuN
56+
//:
57+
//: Using 64 characters, our string length is 22 characters. That's actually 132 bits, so we've got
58+
//: our OWASP requirement covered! 😌
59+
//:
60+
//: Also note that we covered our need using strings that are only 22 characters in length. So long
61+
//: to using GUID strings which only carry 122 bits of entropy (for the commonly used version 4
62+
//: anyway) and use string representations (hex and dashes) that are 36 characters in length.
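The string lengths quoted on this page follow from dividing the required bits by the bits each character carries and rounding up. A minimal sketch, assuming a power-of-two character count. `stringLength(forBits:charCount:)` is a hypothetical helper for this note, not a library function.

```swift
// Hypothetical helper: number of characters needed to carry at least `bits` of entropy.
// Assumes charCount is a power of two.
func stringLength(forBits bits: Double, charCount: Int) -> Int {
  let bitsPerChar = Double(charCount.trailingZeroBitCount)
  return Int((bits / bitsPerChar).rounded(.up))
}

let base32Length = stringLength(forBits: 45.51, charCount: 32)   // 10 characters, 50 bits
let sessionLength = stringLength(forBits: 128, charCount: 64)    // 22 characters
let sessionCapacity = sessionLength * 64.trailingZeroBitCount    // 22 * 6 = 132 bits

print("base 32 ID: \(base32Length) characters")
print("session ID: \(sessionLength) characters carrying \(sessionCapacity) bits")
```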
63+
//:
64+
//: [TOC](Table%20of%20Contents) | [Next](@next)
Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
//: [Previous](@previous)
2+
//:
3+
//: ## Overview
4+
//:
5+
//: `EntropyString` provides easy creation of randomly generated strings of specific entropy using
6+
//: various character bases. Such strings are needed when generating, for example, random IDs and
7+
//: you don't want the overkill of a GUID, or for ensuring that some number of items have unique
8+
//: names.
9+
//:
10+
//: A key concern when generating such strings is that they be unique. To truly guarantee uniqueness
11+
//: requires that each newly created string be compared against all existing strings. The overhead
12+
//: of storing and comparing strings in this manner is often too onerous and a different strategy is
13+
//: desired.
14+
//:
15+
//: A common strategy is to replace the *guarantee of uniqueness* with a weaker but hopefully
16+
//: sufficient *probabilistic uniqueness*. Specifically, rather than being absolutely sure of
17+
//: uniqueness, we settle for a statement such as *"there is less than a 1 in a billion chance that
18+
//: two of my strings are the same"*. This strategy requires much less overhead, but does require
19+
//: we have some manner of quantifying what we mean by, for example, *"there is less than a 1 in a
20+
//: billion chance that 1 million strings of this form will have a repeat"*.
21+
//:
22+
//: Understanding probabilistic uniqueness requires some understanding of
23+
//: [*entropy*](https://en.wikipedia.org/wiki/Entropy_(information_theory)) and of estimating the
24+
//: probability of a
25+
//: [*collision*](https://en.wikipedia.org/wiki/Birthday_problem#Cast_as_a_collision_problem) (i.e.,
26+
//: the probability that two strings in a set of randomly generated strings might be the same).
27+
//: Happily, you can use `EntropyString` without a deep understanding of these topics.
28+
//:
29+
//: We'll begin investigating `EntropyString` by considering our [Real Need](Real%20Need) when
30+
//: generating random strings.
31+
//:
32+
//: [TOC](Table%20of%20Contents) | [Next](@next)
Lines changed: 88 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,88 @@
1+
//: [Previous](@previous)
2+
//: ## Real Need
3+
//:
4+
//: Let's start by reflecting on a common developer statement of need:
5+
//:
6+
//: *I need random strings 16 characters long.*
7+
//:
8+
//: Okay. There are libraries available that address that exact need. But first, there are some
9+
//: questions that arise from the need as stated, such as:
10+
//:
11+
//: 1. What characters do you want to use?
12+
//: 2. How many of these strings do you need?
13+
//: 3. Why do you need these strings?
14+
//:
15+
//: The available libraries often let you specify the characters to use. So we can assume for now
16+
//: that question 1 is answered with:
17+
//:
18+
//: *Hexadecimal IDs will do fine*.
19+
//:
20+
//: As for question 2, the developer might respond:
21+
//:
22+
//: *I need 10,000 of these things*.
23+
//:
24+
//: Ah, now we're getting somewhere. The answer to question 3 might lead to the further qualification:
25+
//:
26+
//: *I need to generate 10,000 random, unique IDs*.
27+
//:
28+
//: And the cat's out of the bag. We're getting at the real need, and it's not the same as the original
29+
//: statement. The developer needs *uniqueness* across a total of some number of strings. The length of
30+
//: the string is a by-product of the uniqueness, not the goal.
31+
//:
32+
//: As noted in the [Overview](Overview), guaranteeing uniqueness is difficult, so we'll replace that
33+
//: declaration with one of *probabilistic uniqueness* by asking:
34+
//:
35+
//: 4. What risk of a repeat are you willing to accept?
36+
//:
37+
//: Probabilistic uniqueness contains risk. That's the price we pay for giving up on the stronger
38+
//: declaration of strict uniqueness. But the developer can quantify an appropriate risk for a
39+
//: particular scenario with a statement like:
40+
//:
41+
//: *I guess I can live with a 1 in a million chance of a repeat*.
42+
//:
43+
//: So now we've gotten to the real need:
44+
//:
45+
//: *I need 10,000 random hexadecimal IDs with less than 1 in a million chance of any repeats*.
46+
//:
47+
//: How do you address this need using a library designed to generate strings of specified length?
48+
//: Well, you don't directly, because that library was designed to answer the originally stated need,
49+
//: not the real need we've uncovered. We need a library that deals with probabilistic uniqueness
50+
//: of a total number of some strings. And that's exactly what `EntropyString` does.
51+
//:
52+
//: Let's use `EntropyString` to help this developer:
53+
import EntropyString
54+
55+
let bits = Entropy.bits(total: 10000, risk: .ten06)
56+
var strings = [String]()
57+
for _ in 0 ..< 5 {
58+
let string = RandomString.entropy(of: bits, using: .base16)
59+
strings.append(string)
60+
}
61+
print("Strings: \(strings)")
62+
//: * callout(strings): ["85e442fa0e83", "a74dc126af1e", "368cd13b1f6e", "81bf94e1278d", "fe7dec099ac9"]
63+
//:
64+
//: To generate the IDs, we first use
65+
//:
66+
//: ```swift
67+
//: let bits = Entropy.bits(total: 10000, risk: .ten06)
68+
//: ```
69+
//:
70+
//: to determine the bits of entropy needed to satisfy our probabilistic uniqueness of **10,000**
71+
//: strings with a **1 in a million** (ten to the sixth power) risk of repeat. We didn't print the
72+
//: result, but if you did you'd see it's about **45.51**. Then inside a loop we used
73+
//:
74+
//: ```swift
75+
//: let string = RandomString.entropy(of: bits, using: .base16)
76+
//: ```
77+
//:
78+
//: to actually generate random strings using hexadecimal (base16) characters. Looking at the IDs, we can
79+
//: see each is 12 characters long. Again, the string length is a by-product of the characters used to
80+
//: represent the entropy we needed. And it seems the developer didn't really need 16 characters after all.
81+
//:
82+
//: Finally, given that the strings are 12 hexadecimals long, each string actually has an
83+
//: information carrying capacity of 12 * 4 = 48 bits of entropy (a hexadecimal character carries 4
84+
//: bits). That's fine. Assuming all characters are equally probable, a string can only carry entropy
85+
//: equal to a multiple of the amount of entropy represented per character. `EntropyString` produces
86+
//: the smallest strings that *exceed* the specified entropy.
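Where does a figure like 45.51 come from? One common estimate, assumed here and not taken from the library source, is the birthday-problem approximation: with `total` strings there are `total * (total - 1) / 2` pairs, and keeping the chance of any pair colliding near 1 in `risk` needs about `log2(pairs * risk)` bits. `estimatedBits(total:risk:)` is a hypothetical helper for this note.

```swift
import Foundation

// Assumption: birthday-problem approximation, not necessarily the library's exact formula.
func estimatedBits(total: Double, risk: Double) -> Double {
  let pairs = total * (total - 1) / 2   // number of distinct pairs of strings
  return log2(pairs * risk)             // bits so that pairs / 2^bits is about 1 / risk
}

let neededBits = estimatedBits(total: 10_000, risk: 1e6)
let hexLength = Int((neededBits / 4).rounded(.up))   // 4 bits per hexadecimal character
let hexCapacity = hexLength * 4

print(String(format: "bits: %.2f", neededBits))
print("hex characters: \(hexLength), capacity: \(hexCapacity) bits")
```

Under this estimate the result rounds to the 45.51 bits quoted above, which takes 12 hexadecimal characters with a capacity of 48 bits.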
87+
//:
88+
//: [TOC](Table%20of%20Contents) | [Next](@next)
Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
//: [Previous](@previous)
2+
//: ## Secure Bytes
3+
//:
4+
//: As described in [Efficiency](Efficiency), `EntropyString` uses an underlying array of
5+
//: bytes to generate strings. The entropy of the resulting strings is, of course, directly
6+
//: tied to the randomness of the bytes used. That's an important point. Strings are only capable
7+
//: of carrying information (entropy); it's the random bytes that actually provide the entropy
8+
//: itself.
9+
//:
10+
//: `EntropyString` automatically generates the necessary number of bytes needed for the
11+
//: strings using either `SecRandomCopyBytes` or `arc4random_buf`, both of which produce
12+
//: cryptographically-secure random bytes. `SecRandomCopyBytes` is the stronger of the two,
13+
//: but can fail. Rather than propagate that failure, if `SecRandomCopyBytes` fails
14+
//: `EntropyString` falls back and uses `arc4random_buf` to generate the bytes. Though not as
15+
//: secure, `arc4random_buf` does not fail.
16+
//:
17+
//: You may, however, want to know which routine was used to generate the underlying bytes for a
18+
//: string. `RandomString` provides an additional `inout` parameter in the
19+
//: `RandomString.entropy(of:using:secure:)` function for this purpose.
20+
import EntropyString
21+
22+
var secure = true
23+
RandomString.entropy(of: 20, using: .base32, secure: &secure)
24+
print("secure: \(secure)")
25+
//: * callout(secure): true
26+
//:
27+
//: If `SecRandomCopyBytes` is used, the __secure__ parameter will remain `true`; otherwise it
28+
//: will be flipped to `false`.
29+
//:
30+
//: You can also pass in __secure__ as `false`, in which case the `entropy` call will not
31+
//: attempt to use `SecRandomCopyBytes` and will use `arc4random_buf` instead.
32+
secure = false
33+
RandomString.entropy(of: 20, using: .base32, secure: &secure)
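The fallback behavior described on this page could look roughly like the sketch below. It is not the library's code: `randomBytes(count:secure:)` is a hypothetical function, and the branch for platforms without the Security framework (which uses the standard library's system generator and reports `secure` as `false`) is an addition so that the sketch compiles outside Apple platforms.

```swift
import Foundation
#if canImport(Security)
import Security
#endif

// Hypothetical sketch: fills `count` bytes, trying SecRandomCopyBytes first when
// `secure` is true and flipping `secure` to false if another source is used.
func randomBytes(count: Int, secure: inout Bool) -> [UInt8] {
  var bytes = [UInt8](repeating: 0, count: count)
  #if canImport(Security)
  if secure {
    let status = SecRandomCopyBytes(kSecRandomDefault, count, &bytes)
    if status == errSecSuccess { return bytes }   // secure stays true
  }
  secure = false
  arc4random_buf(&bytes, count)                   // does not report failure
  #else
  // Assumption for non-Apple platforms only: use the standard library generator.
  secure = false
  for i in bytes.indices {
    bytes[i] = UInt8.random(in: UInt8.min...UInt8.max)
  }
  #endif
  return bytes
}

var usedSecure = true
let firstBytes = randomBytes(count: 16, secure: &usedSecure)
print("bytes: \(firstBytes.count), secure: \(usedSecure)")

var forcedFallback = false
let secondBytes = randomBytes(count: 16, secure: &forcedFallback)
print("bytes: \(secondBytes.count), secure: \(forcedFallback)")
```

Passing the flag as `inout` lets the caller both request a source and learn which one was actually used, which mirrors the `secure` parameter shown above.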
34+
//: Rather than have `EntropyString` generate bytes automatically, you can provide your own [Custom
35+
//: Bytes](Custom%20Bytes) to create a string, which is the next topic.
36+
//:
37+
//: [TOC](Table%20of%20Contents) | [Next](@next)
