Using the Normalize Diacritic Characters Activity

May 11, 2009 at 10:13 AMHenrik Nilsson

I got a comment from Joe Stepongzi today and he didn’t like my Normalize Diacritic Characters Activity that is a part of my Cortego ILM 2 Workflow Activity Library:

I am not sure I like the Normalize Diacritic Characters Activity..
As certain values could be changed to multiple characters instead of one..
I think email addresses should be done at the source and not handled in ILM "2"

The use of the Normalize Diacritic Characters Activity is to normalize characters with different kinds of diacritics into pure characters or how I should define it? The main reason I've created this activity is that I'm from Sweden and must handle "ÅÄÖ" but I'm also working for a company that has a lot of employees in the eastern European countries and that is a nightmare when trying to create for example email addresses. This could be hard to understand for Britain’s and Americans since English is a language where diacritics are sparsely used and this wouldn't have been a problem if the Americans would have understood from the beginning there are other languages than English and a need for other standards than ASCII. Here are a couple of examples of what could be accomplished (I do hope your browser supports Unicode otherwise you'll probably see a lot of boxes):

As you see the activity is only normalizing diacritics by removing any Unicode spacing marks and this is how it works code wise using the System.Globalization namespace for normalization of diacritics:


public static string NormalizeDiacriticChars(string input)
{
   string formD = input.Normalize(NormalizationForm.FormD);
   StringBuilder sb = new StringBuilder();
   for (int i = 0; i < formD.Length; i++)
   {
      UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(formD[i]);
      if (uc != UnicodeCategory.NonSpacingMark)
      {
         sb.Append(formD[i]);
      }
   }
   return (sb.ToString().Normalize(NormalizationForm.FormC));
}

First of all the input string is normalized into Form D that decomposes characters in this way:

  • å –> aRing
  • Ё –> E + Umlaut
  • æ –> a + e (Used in Danish, Norwegian and old English more)
  • –>  ++ (Hangul letter used in Korea)

Then all characters defined as Unicode spacing marks are removed and in the example above the ring and the dots (umlaut) are removed. Finally the remaining string is normalized into Form C, composing characters back, for example:

  • a -> a (The ring is already removed)
  • E -> E (The umlaut is already removed)
  • a + e –> æ (Note: if the original input would have been “ae” it would not become “æ”)
  • + + –>

Normalizing a eastern European name like "Lāčkāja Lapiņš" would end up as "Lackaja Lapins" and a typical Swedish name like "Åsa Öberg" would end up as "Asa Oberg", a lot easier to handle for creating different kind of names and also widely accepted in the countries where diacritic characters are used.

As you can see, characters are not as Joe thought changed into multiple characters but he do have a point in that for example email addresses should be handled at the source and not in ILM2/FIM2010... But if you would like accounts and mailboxes to be automatically created from for example an HR system, one of the best practices of Identity Management... You might be forced to create the email addresses and other system names following your naming standards unless you trust your HR personnel having full control over all existing email addresses and names. It’s up to you to make sure input characters are valid but by using this activity you don’t have to worry about macrons, curls, dots, accents and so on but as you can see the  and æ characters is not changed or removed so they would still a be problem when creating email addresses.

A solution to make sure you get valid strings after normalization could be to use my Regex Replace Activity to remove or replace any remaining characters that isn’t valid in the context you’re using it. In order to get unique names or email addresses you could use my Unique Name Activity. Both these activities is contained in the Cortego ILM 2 Workflow Activity Library. The pattern "[^a-zA-Z0-9\s]" could be used in the Regex Replace Activity to find and remove or replace all characters that is not within a-z, A-Z, 0-9 and whitespace characters.  

If you would like to know more about Unicode Normalization this is a great guide: Unicode Normalization Forms. If you would like to know how different characters from different scripts including Cyrillic, Greek, Latin, Thai, Katakana, and so on are composed/decomposed you could have a look at these Normalization Charts. A description of different kinds of diacritics could be found at Diacritic - Wikipedia.

Finally, do you trust your HR personnel or do you have a Catbert at your company? Laughing

Posted in: Workflow | Forefront Identity Manager

Tags: , , , , ,

Cool feature using the RegexReplaceActivity

April 30, 2009 at 1:28 PMHenrik Nilsson

The RegexReplaceActivity that is introduced in the Cortego ILM 2 Workflow Activity Library is using the Regex class of System.Text.RegularExpressions namespace and by using the Replacement parameter of the Replace function we could actually do some real cool stuff. The Replacement parameter of the Replace function is translated into the Replacement property of the RegexReplaceActivity and there is no requirement the Replacement parameter must contain a plain text, it could in fact contain a replacement pattern as well and here is an example taken from the MSDN - Regular Expressions Examples used to change the format of dates. Please notice it's just an example, you're the one that must know how actual values are formatted and I don't know if using the EmployeeEndDate attribute with this example is appropriate.

Replace dates of the form mm/dd/yy with dates of the form dd-mm-yy.

Input value (from Expression): 04/30/09 or 04/30/2009 (there's a 2 to 4 characters quantifier for year in the Regex Pattern)
RegEx Pattern: \b(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{2,4})\b
Replacement: ${day}-${month}-${year} 

Regex Replace MDYToDMY  

Output value (Destination expression): 30-04-09 or 30-04-2009 – isn’t that cooljQuery15207980085869857615_1318365216111?
What happens is that the input data is captured into variables that are then used to format a new value.

Realize what you could do with this, you could in fact simply extract parts from or format input data to what ever you like!
A good source for more info about regular Expressions is .NET Framework Regular Expressions.

Posted in: Forefront Identity Manager | Workflow

Tags: , , ,

Cortego ILM 2 Workflow Activity Library

April 8, 2009 at 4:19 PMHenrik Nilsson

After a lot of work I’m confident these workflow activities work pretty satisfying therefore I’ve decided to release them to the public but without any guarantees. I wish to send my thanks to Brad Turner and the others at Ensynch that made the great walkthrough in making custom ILM2 activities - http://www.codeplex.com/ILM2WFActivity and to Mark Gabarra that made a video on the subject before he left Microsoft (sad!) - display-name-generation-activity-a-custom-ilm2-action-activity.

The Expression and Destination fields are common for almost all activities except the password generator activity that only have a destination and the Unique name activity that takes more than one expression that are evaluated one at a time. The expression field can take more than one input value and even string values so for example “[//Target/LastName], [//Target/FirstName]” is ok. The destination field only takes a single output of either the “[//WorkflowData/…]” or “[//Target/…]” types.

Update Value Activity

This is the simplest activity in the library, it takes any input and writes it to either the WorkflowDictionary - [//WorkflowData/…] or to a target attribute – [//Target/…]. The main usage for this activity is to write a value created by the Function activity that in RC0 only have the workflow dictionary as working destination.

UpdateValueActivity

Activity information configuration

Display Name Cortego Update Value Activity
Description Updates a Target value from an Expression
Activity Name Cortego.ILM.Workflow.Activities.UpdateValueActivity
Assembly Name Cortego.ILM.Workflow.Activities, Version=1.0.0.0, Culture=neutral, PublicKeyToken=b88d7150cfc8f36b
Authentication, Action, Authorization Your choice.
Type Name Cortego.ILM.Workflow.Activities.UpdateValueActivitySettingsPart

Normalize Diacritic Characters Activity

This activity is almost the same as the Update Value Activity except it normalizes diacritic characters, for example ÄÖÅÜčȭ becomes AOAUco and this very useful for writing email addresses that can’t contain diacritics. Read more about diacritic characters at http://en.wikipedia.org/wiki/Diacritic.

NormalizeDiacritics

Activity information configuration

Display Name Cortego Normalize Diacritic Characters Activity
Description Normalizes Diacritic Characters like ÅÄÖ to AAO.
Activity Name Cortego.ILM.Workflow.Activities.NormalizeDiacriticCharactersActivity
Assembly Name Cortego.ILM.Workflow.Activities, Version=1.0.0.0, Culture=neutral, PublicKeyToken=b88d7150cfc8f36b
Authentication, Action, Authorization Your choice.
Type Name Cortego.ILM.Workflow.Activities.NormalizeDiacriticCharactersActivitySettingsPart

Regex Replace Activity

This is almost the same as the Update Value Activity as well except it takes a Regular Expression Pattern and an optional replacement value that could be used for removing or replacing invalid characters from attribute values. A good example of this is the Active Directory sAMAccountName attribute that doesn’t support /\[]:;|=,+*?<>@ the regular expression for this would be… “[/:;\|=,\+\*\?<>@\[\]\\]”. If you’re not familiar with Regular Expressions, have a look at http://msdn.microsoft.com/en-us/library/hs600312(VS.71).aspx. The replacement value is used if you wish to replace characters with something else but just leave it empty for removing characters.

RegexreplaceActivity

Activity information configuration

Display Name Cortego Regex Replace Activity
Description Uses a Regular Expression to do string replacements.
Activity Name Cortego.ILM.Workflow.Activities.RegexReplaceActivity
Assembly Name Cortego.ILM.Workflow.Activities, Version=1.0.0.0, Culture=neutral, PublicKeyToken=b88d7150cfc8f36b
Authentication, Action, Authorization Your choice.
Type Name Cortego.ILM.Workflow.Activities.RegexReplaceActivitySettingsPart

Generate Password Activity

This activity generates a strong password with at least one character from each category, upper case characters (A-Z), lower case characters (a-z), numeric characters (0-9) and special characters (!#%&/()=?-:;><@$,._*). It’s recommended that password values are written to a custom target attribute (hidden from UI) instead of directly with an outbound sync rule since the password in that case will end up fully readable in the Expected Rules Entry. Remember that passwords generated with this activity is hard to remember and only suitable as temporary passwords before the users can set it’s own, we don’t want to end up with passwords on paper notes under the keyboard.

GeneratePasswordActivity

Activity information configuration

Display Name Cortego Password Generator Activity
Description Generates strong passwords.
Activity Name Cortego.ILM.Workflow.Activities.PasswordGeneratorActivity
Assembly Name Cortego.ILM.Workflow.Activities, Version=1.0.0.0, Culture=neutral, PublicKeyToken=b88d7150cfc8f36b
Authentication, Action, Authorization Your choice.
Type Name Cortego.ILM.Workflow.Activities.PasswordGeneratorActivitySettingsPart

Unique Name Activity

This is the most advanced activity in the library, it works almost the same as the Update Value Activity but there are two main differences, it takes any number of input expressions and the expressions are evaluated against an LDAP catalog from top to bottom and as soon as a unique value is found it’s written to the destination. It currently doesn’t support LDAPS and it has only been tested against Active Directory.

UniqueNameActivity

Activity information configuration

Display Name Cortego Unique Name Activity
Description Generates or takes value before it’s checked for uniqueness against LDAP catalog.
Activity Name Cortego.ILM.Workflow.Activities.UniqueNameActivity
Assembly Name Cortego.ILM.Workflow.Activities, Version=1.0.0.0, Culture=neutral, PublicKeyToken=b88d7150cfc8f36b
Authentication, Action, Authorization Your choice.
Type Name Cortego.ILM.Workflow.Activities.UniqueNameActivitySettingsPart

As you can see from my previous blog post I’ve removed the normalize diacritics and regex remove functionality and put those functions as separate activities and it’s easy to chain activities, just write your output value (destination) from any activity including the Function activity that comes with ILM2 to the workflow dictionary and use that value as input (expression) value in the next activity.

LDAP Search Activity

This activity doesn’t have any user interface so it can’t be used directly within ILM2 but it’s included in the Unique Name activity. The reason I’ve chosen not to add a UI for it is because it returns a nested dictionary (Dictionary<string, Dictionary<string, object>>) that could be hard to use from other activities but you could of course use it in your own custom activities or workflows.

Summing up

You may freely use the activities and the code in any way but if you use the code without major changes I want you to keep the comment in top of each code file that references my blog and my company. It would also be nice if you could give me some feedback, report any problems and tell me about other cool features that could be useful within the library. Please drop a message if you wish be noticed when changes or additions are made to the library and I already have an interesting activity that will show up within the library soon.

In order to use the code you’ll have to strong name the assembly using your own key before putting it into the GAC and if you aren’t sure how to do that and how to deploy, have a look at the very good document Brad Turner and the other guys at Ensynch published, see link in the beginning of this post.

Download Cortego ILM2 Workflow Activity Library Here

Posted in: Forefront Identity Manager | Workflow

Tags: , , , , ,