{"id":2982,"date":"2018-12-11T15:34:14","date_gmt":"2018-12-11T14:34:14","guid":{"rendered":"https:\/\/www.lieben.nu\/liebensraum\/?p=2982"},"modified":"2018-12-11T15:34:14","modified_gmt":"2018-12-11T14:34:14","slug":"removing-special-characters-from-utf8-input-for-use-in-email-addresses-or-login-names","status":"publish","type":"post","link":"https:\/\/lieben.nu\/liebensraum\/2018\/12\/removing-special-characters-from-utf8-input-for-use-in-email-addresses-or-login-names\/","title":{"rendered":"Removing special characters from UTF8 input for use in email addresses or login names"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">When working with non-US customers, users often have characters in their names like \u00eb, \u00f3, \u00e7 and so on. Most of the time, a &#8216;human process&#8217; converts these to their simple equivalent of e, o and c for use in computerized systems.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">When searching for such a mapping of special characters to &#8216;safe&#8217; characters I had a hard time finding a good list or PowerShell method to automatically convert special characters to standard A-Z characters so I wrote one:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code brush: plain; notranslate\"><pre class=\"brush: powershell; title: ; notranslate\" title=\"\">\nfunction get-sanitizedUTF8Input{\n    Param(\n        &#x5B;String]$inputString\n    )\n    $replaceTable = @{\"\u00df\"=\"ss\";\"\u00e0\"=\"a\";\"\u00e1\"=\"a\";\"\u00e2\"=\"a\";\"\u00e3\"=\"a\";\"\u00e4\"=\"a\";\"\u00e5\"=\"a\";\"\u00e6\"=\"ae\";\"\u00e7\"=\"c\";\"\u00e8\"=\"e\";\"\u00e9\"=\"e\";\"\u00ea\"=\"e\";\"\u00eb\"=\"e\";\"\u00ec\"=\"i\";\"\u00ed\"=\"i\";\"\u00ee\"=\"i\";\"\u00ef\"=\"i\";\"\u00f0\"=\"d\";\"\u00f1\"=\"n\";\"\u00f2\"=\"o\";\"\u00f3\"=\"o\";\"\u00f4\"=\"o\";\"\u00f5\"=\"o\";\"\u00f6\"=\"o\";\"\u00f8\"=\"o\";\"\u00f9\"=\"u\";\"\u00fa\"=\"u\";\"\u00fb\"=\"u\";\"\u00fc\"=\"u\";\"\u00fd\"=\"y\";\"\u00fe\"=\"p\";\"\u00ff\"=\"y\"}\n\n    
foreach($key in $replaceTable.Keys){\n        $inputString = $inputString -replace $key, $replaceTable&#x5B;$key]\n    }\n    #strip anything that is still not a-z, A-Z or 0-9 (including spaces)\n    $inputString = $inputString -replace '&#x5B;^a-zA-Z0-9]', ''\n    return $inputString\n}\n\n#example usage:\nget-sanitizedUTF8Input -inputString \"J\u00f6s\u00e8\"\n#result:\nJose\n<\/pre><\/div>\n\n\n<p class=\"wp-block-paragraph\">Edit: my colleague <a rel=\"noreferrer noopener\" aria-label=\"Edit: my colleague Gerbrand alerted me to a post by&nbsp; (opens in a new tab)\" href=\"https:\/\/twitter.com\/gerbrandvdweg\" target=\"_blank\">Gerbrand<\/a> alerted me to <a rel=\"noreferrer noopener\" aria-label=\"Edit: my colleague Gerbrand alerted me to a post by&nbsp;Gr\u00e9gory Schiro (opens in a new tab)\" href=\"https:\/\/social.msdn.microsoft.com\/Forums\/en-US\/cee35f60-d6ba-4857-932a-4b3ba284844b\/replacing-national-characters-in-a-string\" target=\"_blank\">a post by&nbsp;Gr\u00e9gory Schiro<\/a>,&nbsp;which solves this issue much more elegantly using native .NET functions. 
My slightly modified version, which also strips every character outside a-zA-Z0-9:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code brush: plain; notranslate\"><pre class=\"brush: powershell; title: ; notranslate\" title=\"\">\nfunction Remove-DiacriticsAndSpaces\n{\n    Param(\n        &#x5B;String]$inputString\n    )\n    #decompose each character into its base letter plus combining marks\n    $objD = $inputString.Normalize(&#x5B;Text.NormalizationForm]::FormD)\n    $sb = New-Object Text.StringBuilder\n \n    #keep everything except the combining (non-spacing) marks\n    for ($i = 0; $i -lt $objD.Length; $i++) {\n        $c = &#x5B;Globalization.CharUnicodeInfo]::GetUnicodeCategory($objD&#x5B;$i])\n        if($c -ne &#x5B;Globalization.UnicodeCategory]::NonSpacingMark) {\n          &#x5B;void]$sb.Append($objD&#x5B;$i])\n        }\n      }\n    \n    $sb = $sb.ToString().Normalize(&#x5B;Text.NormalizationForm]::FormC)\n    #remove spaces and any other character outside a-zA-Z0-9\n    return($sb -replace '&#x5B;^a-zA-Z0-9]', '')\n}\n#example usage:\nRemove-DiacriticsAndSpaces -inputString \"J\u00f6s\u00e8\"\n#result:\nJose\n\n<\/pre><\/div>\n\n\n<p class=\"wp-block-paragraph\">And an even easier one-liner by <a href=\"https:\/\/twitter.com\/jseerden\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"And an even easier oneliner I converted to a function by John Seerden: (opens in a new tab)\">John Seerden<\/a>, which I converted to a function:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code brush: plain; notranslate\"><pre class=\"brush: powershell; title: ; notranslate\" title=\"\">\nfunction Remove-DiacriticsAndSpaces\n{\n    Param(\n        &#x5B;String]$inputString\n    )\n    #replace diacritics by round-tripping the string through the Cyrillic code page\n    $sb = &#x5B;Text.Encoding]::ASCII.GetString(&#x5B;Text.Encoding]::GetEncoding(\"Cyrillic\").GetBytes($inputString))\n\n    #remove spaces and anything the conversion above may have missed\n    return($sb -replace '&#x5B;^a-zA-Z0-9]', '')\n}\n<\/pre><\/div>\n\n\n<p class=\"wp-block-paragraph\">And the most advanced function I&#8217;ve found so far is by&nbsp;<br><a rel=\"noreferrer noopener\" aria-label=\"And the most advanced function 
I've found so far is by&nbsp;\nDaniele Catanesi (PsCustomObject): https:\/\/github.com\/PsCustomObject\/New-StringConversion\/blob\/master\/New-StringConversion.ps1 (opens in a new tab)\" href=\"https:\/\/github.com\/PsCustomObject\" target=\"_blank\">Daniele Catanesi (PsCustomObject)<\/a>: <a rel=\"noreferrer noopener\" aria-label=\"And the most advanced function I've found so far is by&nbsp;\nDaniele Catanesi (PsCustomObject): https:\/\/github.com\/PsCustomObject\/New-StringConversion\/blob\/master\/New-StringConversion.ps1 (opens in a new tab)\" href=\"https:\/\/github.com\/PsCustomObject\/New-StringConversion\/blob\/master\/New-StringConversion.ps1\" target=\"_blank\">https:\/\/github.com\/PsCustomObject\/New-StringConversion\/blob\/master\/New-StringConversion.ps1<\/a>&nbsp;in which all features of above functions are supported and parameterized.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>When working with non-US customers, users often have characters in their names like \u00eb, \u00f3, \u00e7 and so on. Most of the time, a &#8216;human process&#8217; converts these to their simple equivalent of e, o and c for use in computerized systems. 
When searching for such a mapping of special characters to &#8216;safe&#8217; characters I &hellip; <a href=\"https:\/\/lieben.nu\/liebensraum\/2018\/12\/removing-special-characters-from-utf8-input-for-use-in-email-addresses-or-login-names\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Removing special characters from UTF8 input for use in email addresses or login names<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_crdt_document":"","footnotes":""},"categories":[4,39],"tags":[],"class_list":["post-2982","post","type-post","status-publish","format-standard","hentry","category-automation","category-powershell"],"_links":{"self":[{"href":"https:\/\/lieben.nu\/liebensraum\/wp-json\/wp\/v2\/posts\/2982","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lieben.nu\/liebensraum\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lieben.nu\/liebensraum\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lieben.nu\/liebensraum\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lieben.nu\/liebensraum\/wp-json\/wp\/v2\/comments?post=2982"}],"version-history":[{"count":0,"href":"https:\/\/lieben.nu\/liebensraum\/wp-json\/wp\/v2\/posts\/2982\/revisions"}],"wp:attachment":[{"href":"https:\/\/lieben.nu\/liebensraum\/wp-json\/wp\/v2\/media?parent=2982"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lieben.nu\/liebensraum\/wp-json\/wp\/v2\/categories?post=2982"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lieben.nu\/liebensraum\/wp-json\/wp\/v2\/tags?post=2982"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}