Codepoints above U+10FFFF and overlong encodings are considered invalid. Unpaired surrogates are still accepted, because they are known to be generated on occasion (by Windows, for example).

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
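
For illustration only, the validation rules described in this message can be sketched in Python. This is not the code changed by this commit; the function name `is_lenient_utf8` is hypothetical, and the sketch assumes the input is a complete byte string rather than a stream.

```python
def is_lenient_utf8(data: bytes) -> bool:
    """Hypothetical sketch: accept UTF-8 that may contain surrogates,
    but reject overlong encodings and codepoints above U+10FFFF."""
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            # Single-byte (ASCII) sequence.
            i += 1
            continue
        if 0xC0 <= lead <= 0xDF:
            length, min_cp, cp = 2, 0x80, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            length, min_cp, cp = 3, 0x800, lead & 0x0F
        elif 0xF0 <= lead <= 0xF7:
            length, min_cp, cp = 4, 0x10000, lead & 0x07
        else:
            # Stray continuation byte, or a lead byte in 0xF8..0xFF.
            return False
        if i + length > n:
            # Sequence is truncated.
            return False
        for cont in data[i + 1:i + length]:
            if cont & 0xC0 != 0x80:
                return False
            cp = (cp << 6) | (cont & 0x3F)
        if cp < min_cp:
            # Overlong: the codepoint fits in a shorter sequence.
            return False
        if cp > 0x10FFFF:
            return False
        # Surrogates (U+D800..U+DFFF) are deliberately not rejected here.
        i += length
    return True
```

Under these assumptions, `b"\xed\xa0\x80"` (an unpaired U+D800) passes, while `b"\xc0\x80"` (an overlong NUL) and `b"\xf4\x90\x80\x80"` (U+110000) are rejected. Python's own strict `utf-8` codec differs on the first case and raises `UnicodeDecodeError` for the surrogate.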