Again, in the interest of keeping this light, I’m just going to survey where String- and Unicode-related functionality is defined in the Elixir source.

Here’s a quick list of files in elixir/lib/elixir/unicode/:

  • CompositionExclusions.txt
  • GraphemeBreakProperty.txt
  • SpecialCasing.txt
  • UnicodeData.txt
  • WhiteSpace.txt
  • unicode.ex

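The .txt files are data files from the Unicode Character Database, and unicode.ex parses them at compile time to generate the Unicode-aware code behind Elixir’s String functions for the binary string type. The String module and the related String.Chars protocol are defined in: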

  • elixir/lib/elixir/lib/string.ex
  • elixir/lib/elixir/lib/string/chars.ex
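To make the “parses them at compile time” step a little more concrete, here is a minimal sketch of the general technique: read UnicodeData.txt-style lines while the module compiles and turn each entry into a function clause. This is not the actual code in unicode.ex. The module UnicodeSketch and its functions upcase_codepoint/1 and combining_class/1 are hypothetical, and instead of reading the real files the sketch embeds a few lines in UnicodeData.txt’s semicolon-separated format so it stands alone.

```elixir
defmodule UnicodeSketch do
  # Hypothetical stand-in for the real data file: a few lines in the
  # semicolon-separated UnicodeData.txt format, embedded so the example
  # is self-contained. unicode.ex reads the actual .txt files instead.
  @data """
  0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
  0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
  00EB;LATIN SMALL LETTER E WITH DIAERESIS;Ll;0;L;0065 0308;;;;N;LATIN SMALL LETTER E DIAERESIS;;00CB;;00CB
  0308;COMBINING DIAERESIS;Mn;230;NSM;;;;;N;NON-SPACING DIAERESIS;;;;
  """

  # Parsed at compile time. Field 0 is the code point (hex), field 3 the
  # canonical combining class, field 12 the simple uppercase mapping (hex).
  entries =
    for line <- String.split(@data, "\n", trim: true) do
      fields = String.split(line, ";")

      {String.to_integer(Enum.at(fields, 0), 16),
       String.to_integer(Enum.at(fields, 3)),
       Enum.at(fields, 12)}
    end

  # One generated clause per code point that has an uppercase mapping.
  for {code, _ccc, upper} <- entries, upper != "" do
    def upcase_codepoint(unquote(code)), do: unquote(String.to_integer(upper, 16))
  end

  # Code points without a mapping are returned unchanged.
  def upcase_codepoint(code), do: code

  # One generated clause per code point with a nonzero combining class.
  for {code, ccc, _upper} <- entries, ccc != 0 do
    def combining_class(unquote(code)), do: unquote(ccc)
  end

  # Everything else has combining class 0.
  def combining_class(_code), do: 0
end
```

Because the table turns into ordinary function clauses, a lookup such as `UnicodeSketch.upcase_codepoint(?a)` is just a pattern match that returns 65 (`?A`), and anything the data doesn’t mention falls through to the catch-all clause.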

There are two test files for strings and the String.Chars protocol:

  • elixir/lib/elixir/test/elixir/string/chars_test.exs
  • elixir/lib/elixir/test/elixir/string_test.exs

A quick digression: In string_test.exs, beginning on line 14, we’ll see this:

# test cases described in http://mortoray.com/2013/11/27/the-string-type-is-broken/
test "unicode" do
  assert String.reverse("noël") == "lëon"
  assert String.slice("noël", 0..2) == "noë"
  assert String.length("noël") == 4

  assert String.length("😸😾") == 2
  assert String.slice("😸😾", 1..1) == "😾"
  assert String.reverse("😸😾") == "😾😸"

  assert String.upcase("baﬄe") == "BAFFLE"

  assert String.equivalent?("noël", "noël")
end

Digression prime: Evidently I’m going to have to find a different font for this blog, or these posts, or these code blocks, or something.
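Font trouble aside, the emoji and ligature cases in that test are easy to poke at in iex. This sketch assumes a recent Elixir and spells the hard-to-render characters with `\u` escapes so nothing depends on the font: U+1F638 and U+1F63E are two cat-face emoji, and U+FB04 is the “ﬄ” ligature, whose uppercase mapping to three letters comes from SpecialCasing.txt. The module name UnicodeCases is hypothetical.

```elixir
defmodule UnicodeCases do
  # Hypothetical helper; the strings mirror the cases in string_test.exs.
  # Two cat-face emoji, U+1F638 and U+1F63E, both outside the Basic Multilingual Plane.
  def cats, do: "\u{1F638}\u{1F63E}"

  # "baﬄe" with the single ligature code point U+FB04 LATIN SMALL LIGATURE FFL.
  def baffle, do: "ba\uFB04e"
end

# Each emoji is one grapheme but four UTF-8 bytes.
IO.inspect(byte_size(UnicodeCases.cats()))                          # 8
IO.inspect(String.length(UnicodeCases.cats()))                      # 2
IO.inspect(String.slice(UnicodeCases.cats(), 1..1) == "\u{1F63E}")  # true

# One ligature code point upcases to three capital letters.
IO.inspect(String.length(UnicodeCases.baffle()))  # 4
IO.inspect(String.upcase(UnicodeCases.baffle()))  # "BAFFLE"
```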

If you’re reading this and haven’t already read the blog post cited in that comment, “The string type is broken”, I recommend doing so. It lays out the problems with the traditional implementation of a string as a vector of characters, which goes a long way toward explaining why strings in Elixir work the way they do, and it obviously inspired these test cases. I’ll write about it in detail in later posts.
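The heart of that argument is that “character” is ambiguous: a string has a length in bytes, in code points, and in user-perceived characters (graphemes), and these can all differ. Here is a small sketch, assuming a recent Elixir, with “noël” spelled two ways via `\u` escapes. NoelDemo is a hypothetical module, and its naive_reverse/1 is a deliberately broken, code-point-at-a-time reverse standing in for the “vector of characters” approach.

```elixir
defmodule NoelDemo do
  # "noël" with a decomposed ë: "e" followed by U+0308 COMBINING DIAERESIS.
  def decomposed, do: "noe\u0308l"

  # "noël" with the single precomposed code point U+00EB.
  def precomposed, do: "no\u00EBl"

  # Measure one string three ways: UTF-8 bytes, code points, graphemes.
  def counts(s) do
    %{
      bytes: byte_size(s),
      codepoints: length(String.codepoints(s)),
      graphemes: String.length(s)
    }
  end

  # Deliberately naive "vector of characters" reverse: it reverses code
  # points, so the combining mark lands on the wrong base letter.
  def naive_reverse(s), do: s |> String.codepoints() |> Enum.reverse() |> Enum.join()
end

IO.inspect(NoelDemo.counts(NoelDemo.decomposed()))   # %{bytes: 6, codepoints: 5, graphemes: 4}
IO.inspect(NoelDemo.counts(NoelDemo.precomposed()))  # %{bytes: 5, codepoints: 4, graphemes: 4}

# String.reverse/1 works on graphemes, so the diaeresis stays on the e.
IO.inspect(String.reverse(NoelDemo.decomposed()) == "le\u0308on")         # true
IO.inspect(NoelDemo.naive_reverse(NoelDemo.decomposed()) == "le\u0308on")  # false

# Different bytes, same canonical meaning.
IO.inspect(NoelDemo.decomposed() == NoelDemo.precomposed())                    # false
IO.inspect(String.equivalent?(NoelDemo.decomposed(), NoelDemo.precomposed()))  # true
```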

There are two kinds of string in Elixir. The second is the “char list”, a list of integer code points, which you can find implemented and tested in:

  • elixir/lib/elixir/lib/list.ex
  • elixir/lib/elixir/lib/list/chars.ex
  • elixir/lib/elixir/test/elixir/list_test.exs
  • elixir/lib/elixir/test/elixir/list/chars_test.exs
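A quick way to see how the two representations differ, and to exercise the String.Chars and List.Chars protocols defined in the two chars.ex files, is a round trip through both. This is a minimal sketch assuming a recent Elixir (older releases spelled some of these functions to_char_list); CharlistDemo is a hypothetical module.

```elixir
defmodule CharlistDemo do
  # Hypothetical helper: "noël" spelled with the precomposed ë (U+00EB).
  def noel, do: "no\u00EBl"

  # The same text as a charlist: one integer per code point.
  def as_charlist, do: String.to_charlist(noel())
end

# A binary stores UTF-8 bytes, so the two-byte ë makes it 5 bytes long.
IO.inspect(byte_size(CharlistDemo.noel()))                      # 5

# The charlist has four elements; ë is the single integer 235 (0xEB).
IO.inspect(CharlistDemo.as_charlist() == [110, 111, 235, 108])  # true

# to_string/1 dispatches through String.Chars, to_charlist/1 through List.Chars.
IO.inspect(to_string(CharlistDemo.as_charlist()) == CharlistDemo.noel())  # true
IO.inspect(to_charlist(123) == [49, 50, 51])                              # true
```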

This pretty much covers Elixir’s strings as written in Elixir itself. There are also “lower level” concerns in the Erlang sources:

  • elixir/lib/elixir/src/elixir_bitstring.erl
  • elixir/lib/elixir/test/erlang/string_test.erl
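One of those lower-level facts is that an Elixir string is just a binary, which is why the `<<>>` bitstring syntax (the territory of elixir_bitstring.erl, judging by its name) can take a string apart directly. As a minimal sketch, the hypothetical Utf8Sketch module uses the `utf8` type modifier to decode one code point at a time; it illustrates the idea and is not how String is actually implemented.

```elixir
defmodule Utf8Sketch do
  # Decode a UTF-8 binary into a list of integer code points by matching
  # one ::utf8 segment at a time. Raises FunctionClauseError on invalid UTF-8.
  def codepoints(<<>>), do: []
  def codepoints(<<cp::utf8, rest::binary>>), do: [cp | codepoints(rest)]
end

# The raw bytes of a precomposed ë are 0xC3 0xAB.
IO.inspect(:binary.bin_to_list("\u00EB") == [195, 171])  # true

# In a decomposed "noël" the combining diaeresis is its own code point, 776 (U+0308).
IO.inspect(Utf8Sketch.codepoints("noe\u0308l") == [110, 111, 101, 776, 108])  # true
```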

And I think that’s it. Next I’ll do a little more orienting through the files in elixir/lib/elixir/unicode/. Definitely check out that blog post in the meantime.