
Applies to: Exchange Server 2013

The XML schema and guidance in this topic are designed to help you get started creating your own basic data loss prevention (DLP) XML files, which define your own sensitive information types in a classification rule package. After you have created a well-formed XML file, you can import it by using the Exchange admin center or the Exchange Management Shell to help you create a Microsoft Exchange Server 2013 DLP solution. An XML file that serves as a custom DLP policy template can contain the XML that is your classification rule package. For an overview of defining your own DLP templates as XML files, see Define your own DLP templates and information types.

Overview of the rule authoring process

The rule authoring process is made up of the following general steps.

  1. Prepare a set of test documents that is representative of your target environment. Key characteristics to consider for the set of test documents: a subset of the documents contains the entity or affinity for which the rule is being authored, and a subset of the documents does not contain the entity or affinity for which the rule is being authored.

  2. Identify the rules that meet the acceptance requirements (precision and recall) to identify qualifying content. This may require the development of multiple conditions within a rule, bound with Boolean logic, that together satisfy the minimum match requirements to identify target documents.

  3. Establish the recommended confidence level for the rules based on the acceptance requirements (precision and recall). The recommended confidence level can be thought of as the default confidence level for the rule.

  4. Validate the rules by instantiating a policy with them and monitoring the sample test content. Based on the results, adjust the rules or the confidence levels to maximize the detected content while minimizing false positives and false negatives. Continue the cycle of validation and rule adjustment until a satisfactory level of content detection is reached for both positive and negative samples.

For information about the XML schema definition for policy template files, see Developing DLP policy template files.

Rule description

Two main rule types can be authored for the DLP sensitive information detection engine: Entity and Affinity. The rule type chosen depends on the type of processing logic that should be applied to the content, as described in the previous sections. The rule definitions are configured in an XML document in the format described by the standardized Rules XSD. The rules describe both the type of content to match and the level of confidence that the described match represents the target content. The confidence level specifies the probability that the entity exists if a pattern is found in the content, or the probability that the affinity exists if evidence is found in the content.

Basic rule structure

The rule definition is constructed from three main components:

  1. Entity defines the matching and counting logic for the rule.

  2. Affinity defines the matching logic for the rule.

  3. Localized strings provide localization for rule names and their descriptions.

Three additional supporting elements define the details of the processing and are referenced within the main components: Keyword, Regex, and Function. By using references, a single definition of a supporting element, such as a social security number, can be used in multiple Entity or Affinity rules. The basic rule structure in XML format is as follows.



  
    
    
    
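[XML listing: basic rule package example; the markup was not preserved. Surviving text values: "DLP by EPG", "CSO Custom Rule Pack", "This is a Rule package for an EPG demo.", the regular expression (\s)(\d{9})(\s), the terms "Identification" and "Contoso Employee", the name "Employee ID", and the description "A custom classification for detecting Employee ID's".]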
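To make the role of the Regex and Keyword supporting elements concrete, here is a minimal Python sketch. It is not the DLP engine: it assumes the example's regular expression reads `(\s)(\d{9})(\s)` and that "Identification" and "Contoso Employee" act as corroborating keywords. The function name `classify` and the sample sentences are hypothetical, and proximity is ignored for simplicity.

```python
import re

# Assumed reading of the example's regular expression: a 9-digit number
# with a whitespace character on each side.
EMPLOYEE_ID = re.compile(r"(\s)(\d{9})(\s)")

# Terms taken from the example, treated here as corroborating keywords.
KEYWORDS = ("Identification", "Contoso Employee")


def classify(text):
    """Return candidate IDs, or an empty list if no keyword backs them up."""
    ids = [match.group(2) for match in EMPLOYEE_ID.finditer(text)]
    has_keyword = any(keyword in text for keyword in KEYWORDS)
    return ids if has_keyword else []


print(classify("Contoso Employee badge number 123456789 was issued today."))
print(classify("Order 987654321 shipped."))
```

The first call reports the number because a keyword is present; the second finds a 9-digit number but no keyword, so nothing is reported.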

Entity rules

Entity rules are targeted towards well-defined identifiers, such as social security numbers, and are represented by a collection of countable patterns. An Entity rule returns a count and the confidence level of a match, where the count is the total number of instances of the entity that were found and the confidence level is the probability that the given entity exists in the given document. The Entity element contains the id attribute as its unique identifier. The identifier is used for localization, versioning, and querying. The Entity id must be a GUID and should not be duplicated in other entities or affinities. It is referenced in the Localized Strings section.

Entity rules contain the optional patternsProximity attribute (default = 300), which is used when applying Boolean logic to specify the adjacency of multiple patterns required to satisfy the match condition. The Entity element contains one or more child Pattern elements, where each pattern is a distinct representation of the entity, such as a credit card entity or a driver's license entity. The Pattern element has a required confidenceLevel attribute, which represents the pattern's precision based on the sample dataset. The Pattern element can have three child elements:

  1. IdMatch (required)

  2. Match

  3. Any

If any of the Pattern elements returns true, the pattern is satisfied. The count for the Entity element equals the sum of all detected pattern counts.

$$\mathrm{Count}_{Entity} = \sum_{i=1}^{k} \mathrm{Count}_{Pattern_i}$$

where k is the number of Pattern elements for the entity.

A Pattern element must have exactly one IdMatch element. IdMatch represents the identifier that the pattern is to match, for example, a credit card number or an ITIN number. The count for a pattern is the number of IdMatches matched with the Pattern element. The IdMatch element anchors the proximity window for the Match elements.

Another optional sub-element of the Pattern element is the Match element, which represents corroborative evidence that must be matched to support finding the IdMatch element. For example, a higher-confidence rule may require that, in addition to finding a credit card number, additional artifacts exist in the document within a proximity window of the credit card number, such as an address and a name. These additional artifacts would be represented by the Match element or the Any element (these are detailed in the Matching Methods and Techniques section). Multiple Match elements can be included in a pattern definition, either directly in the Pattern element or combined by using the Any element to define matching semantics. A Match element returns true if a match is found in the proximity window anchored around the IdMatch content.

Neither the IdMatch element nor the Match element defines the details of what content needs to be matched; instead, each references a definition through the idRef attribute. This promotes reusability of definitions in multiple pattern constructs.

[XML listing: Entity element structure example, with the Entity id shown as "..." and two Pattern elements with confidence levels 85 and 65; the markup was not preserved.]

The Entity id attribute, represented in the previous XML by "...", should be a GUID, and it is referenced in the Localized Strings section.
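As an illustration of the structure described in this section, the following Python sketch parses a small Entity fragment and checks the "exactly one IdMatch per Pattern" rule. The element and attribute names (Entity, Pattern, IdMatch, Match, Any, idRef, patternsProximity, confidenceLevel) come from the text; the idRef values and the helper name `summarize_entity` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule fragment: element and attribute names follow the text,
# while the idRef values are illustrative placeholders.
RULES_XML = """
<Rules>
  <Entity id="..." patternsProximity="300">
    <Pattern confidenceLevel="85">
      <IdMatch idRef="FormattedSSN" />
      <Match idRef="SSNKeywords" />
    </Pattern>
    <Pattern confidenceLevel="65">
      <IdMatch idRef="UnformattedSSN" />
      <Any>
        <Match idRef="SSNKeywords" />
        <Match idRef="USDate" />
      </Any>
    </Pattern>
  </Entity>
</Rules>
"""


def summarize_entity(entity):
    """Check each Pattern has exactly one IdMatch and list what it references."""
    summary = []
    for pattern in entity.findall("Pattern"):
        id_matches = pattern.findall("IdMatch")
        if len(id_matches) != 1:
            raise ValueError("a Pattern must have exactly one IdMatch")
        # iter() also walks nested elements, so Match children of Any are included.
        corroborating = [m.get("idRef") for m in pattern.iter("Match")]
        summary.append({
            "confidence": int(pattern.get("confidenceLevel")),
            "id_match": id_matches[0].get("idRef"),
            "matches": corroborating,
        })
    return summary


root = ET.fromstring(RULES_XML)
entity = root.find("Entity")
proximity = int(entity.get("patternsProximity", "300"))  # 300 is the stated default
patterns = summarize_entity(entity)
for pattern in patterns:
    print(pattern)
```

Because the definitions are only referenced by idRef, the same `SSNKeywords` definition can appear in both patterns without being repeated.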

Entity pattern proximity window

The Entity element holds the optional patternsProximity attribute value (integer, default = 300), which is used to find the patterns. For each pattern, the attribute value defines the distance (in Unicode characters) from the IdMatch location for all other matches specified for that pattern. The proximity window is anchored by the IdMatch location, with the window extending to the left and right of the IdMatch.

[Figure: Text pattern with highlighted matching elements]

The example below shows how the proximity window affects the matching algorithm, where the SSN IdMatch element requires at least one corroborating match of address, name, or date. Only SSN1 and SSN4 match because, for SSN2 and SSN3, either no or only partial corroborating evidence is found within the proximity window.

[Figure: Examples of rule matches and non-matches]
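The effect of the proximity window can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the engine's algorithm: the two regular expressions are stand-ins for the IdMatch and Match definitions, the window is measured from the start and end of the IdMatch text, and the function name `count_matches` is hypothetical. Python string indices count Unicode code points, which fits the "Unicode characters" unit used above.

```python
import re

# Stand-in definitions (assumptions): a formatted SSN as the IdMatch and
# a few literal words as corroborating evidence.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EVIDENCE = re.compile(r"\b(address|name|date)\b", re.IGNORECASE)


def count_matches(text, proximity=300):
    """Count IdMatch hits that have corroborating evidence inside the window.

    The window extends `proximity` characters to the left and to the right
    of the IdMatch location.
    """
    count = 0
    for id_match in SSN.finditer(text):
        window_start = max(0, id_match.start() - proximity)
        window_end = min(len(text), id_match.end() + proximity)
        if EVIDENCE.search(text[window_start:window_end]):
            count += 1
    return count


filler = "x" * 400
text = (
    "Name: A. Example, SSN 123-45-6789. "  # evidence nearby, so this one counts
    + filler
    + " 987-65-4321 "                       # no evidence within 300 characters
    + filler
)
print(count_matches(text))
print(count_matches(text, proximity=1000))
```

With the default window only the first number is counted; widening the window to 1000 characters lets the second number reach the same evidence, so both are counted.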

Note that the message body and each attachment are treated as independent items. This means that the proximity window does not extend beyond the end of each item. For each item (attachment or body), both the IdMatch and the corroborative evidence need to reside within that item.

Entity confidence level

The Entity element's confidence level is the combination of the confidence levels of all satisfied patterns. They are combined using the following equation:

$$CL_{Entity} = 1 - \prod_{i=1}^{k} \left(1 - CL_{Pattern_i}\right)$$

where k is the number of Pattern elements for the entity, and a pattern that does not match returns a confidence level of 0.

Referring back to the Entity element structure code example, the entity's confidence level is 94.75% if both patterns match, based on the following calculation:

CLEntity = 1 - ((1 - CLPattern1) x (1 - CLPattern2))

= 1 - ((1 - 0.85) x (1 - 0.65))

= 1 - (0.15 x 0.35)

= 94.75%

Similarly, if only the second pattern matches, the entity's confidence level is 65%, based on the following calculation:

CLEntity = 1 - ((1 - CLPattern1) x (1 - CLPattern2))

= 1 - ((1 - 0) x (1 - 0.65))

= 1 - (1 x 0.35)

= 65%

These confidence values are assigned in the rules for individual patterns, based on the set of test documents validated as part of the rule authoring process.
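The two calculations above can be reproduced with a short Python sketch. The function name `entity_confidence` is hypothetical, and confidence levels are passed as fractions between 0 and 1 rather than as percentages.

```python
from math import prod


def entity_confidence(pattern_levels):
    """Combine pattern confidence levels as 1 - product of (1 - CL_i).

    `pattern_levels` holds one value per Pattern element, as a fraction in
    [0, 1]; a pattern that did not match contributes 0.
    """
    return 1 - prod(1 - level for level in pattern_levels)


both = entity_confidence([0.85, 0.65])        # both patterns match
second_only = entity_confidence([0.0, 0.65])  # only the second pattern matches
print(f"{both:.2%}")
print(f"{second_only:.2%}")
```

A pattern with confidence 0 multiplies the product by 1, so unmatched patterns leave the result unchanged, which is why the second case reduces to the second pattern's own confidence level.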

Affinity rules

Affinity rules are targeted towards content without well-defined identifiers, such as Sarbanes-Oxley or corporate financial content. For this content, no single consistent identifier can be found; instead, the analysis must determine whether a collection of evidence is present. Affinity rules do not return a count; instead, they return whether the affinity was found and the associated confidence level. Affinity content is represented as a collection of independent evidences. An evidence is an aggregation of required matches within a certain proximity. For an Affinity rule, the proximity is defined by the evidencesProximity attribute (default is 600) and the minimum confidence level by the thresholdConfidenceLevel attribute.

Affinity rules contain the id attribute as their unique identifier, which is used for localization, versioning, and querying. Unlike Entity rules, Affinity rules do not contain the IdMatch element, because they do not rely on well-defined identifiers.

Each Affinity rule contains one or more child Evidence elements, which define the evidence that is to be found and the level of confidence it contributes to the Affinity rule. The affinity is not considered found if the resulting confidence level is below the threshold level. Each evidence logically represents corroborative evidence for this "type" of document, and the confidenceLevel attribute is the precision of that evidence on the test dataset.

Evidence elements have one or more Match or Any child elements. If all child Match and Any elements match, the evidence is found and its confidence level contributes to the rule's confidence level calculation. The same description applies to the Match or Any elements for Affinity rules as for Entity rules.

[XML listing: example Affinity rule structure; the markup was not preserved.]

Affinity proximity window

The proximity window for affinity is calculated differently than for entity patterns. Affinity proximity follows a sliding window model. The affinity proximity algorithm attempts to find the maximum number of matching evidences in the given window. The evidences in the proximity window must have a confidence level greater than the threshold defined for the Affinity rule for the affinity to be found.

[Figure: Text near an affinity rule match]
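A sliding window of this kind can be sketched as follows. This is an illustrative simplification, not the engine's algorithm: it assumes evidence hits are already available as (position, evidence id) pairs, starts a candidate window at each hit, and keeps the window that contains the most distinct evidences. The function name `best_window` and the sample positions are hypothetical.

```python
def best_window(hits, window=600):
    """Return the largest set of distinct evidence ids found in any window.

    `hits` is a list of (position, evidence_id) tuples; a candidate window
    starts at each hit and covers the next `window` characters.
    """
    hits = sorted(hits)
    best = set()
    for i, (start, _) in enumerate(hits):
        found = {eid for pos, eid in hits[i:] if pos - start <= window}
        if len(found) > len(best):
            best = found
    return best


hits = [(10, "E1"), (250, "E2"), (580, "E3"), (2000, "E1"), (2900, "E2")]
print(sorted(best_window(hits)))
```

In the sample, the three hits near the start of the text fit in one 600-character window, while the later hits are too far apart to be combined.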

Affinity confidence level

The confidence level for the affinity equals the combination of the evidences found within the proximity window for the Affinity rule. While this is similar to the confidence level for Entity rules, the key difference is the application of the proximity window. As with Entity rules, the Affinity element's confidence level is the combination of all satisfied Evidence confidence levels, but for an Affinity rule it represents only the highest combination of Evidence elements found within the proximity window. The Evidence confidence levels are combined using the following equation:

$$CL_{Affinity} = 1 - \prod_{i=1}^{k} \left(1 - CL_{Evidence_i}\right)$$

where k is the number of Evidence elements for the affinity that have been matched within the proximity window.

Referring back to Figure 4, the example Affinity rule structure, if all three evidences are matched within the sliding proximity window, the affinity confidence level is 85.6%, based on the calculation below. This exceeds the Affinity rule threshold of 65%, which results in the rule matching.

CLAffinity = 1 - ((1 - CLEvidence1) x (1 - CLEvidence2) x (1 - CLEvidence3))

= 1 - ((1 - 0.6) x (1 - 0.4) x (1 - 0.4))

= 1 - (0.4 x 0.6 x 0.6)

= 85.6%

[Figure: Example of a high-confidence affinity rule match]

Using the same example rule definition, if only the first evidence matches because the second evidence is outside the proximity window, the highest affinity confidence level is 60%, based on the calculation below, and the Affinity rule does not match because the threshold of 65% has not been met.

CLAffinity = 1 - ((1 - CLEvidence1) x (1 - CLEvidence2) x (1 - CLEvidence3))

= 1 - ((1 - 0.6) x (1 - 0) x (1 - 0))

= 1 - (0.4 x 1 x 1)

= 60%

[Figure: Example of a low-confidence affinity rule match]
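The high-confidence and low-confidence cases can be checked with the following Python sketch. It assumes the evidences matched in each candidate window are already known, takes the highest combined confidence across windows, and compares it with the threshold. The function names are hypothetical, and treating a confidence equal to the threshold as a match is an assumption of the sketch.

```python
from math import prod


def combine(levels):
    """1 - product of (1 - CL_i) over the evidences matched in one window."""
    return 1 - prod(1 - level for level in levels)


def affinity_result(windows, threshold):
    """Return (highest window confidence, whether the rule matches).

    `windows` lists, per candidate window, the confidence levels of the
    evidences matched inside it, as fractions in [0, 1]. Counting a
    confidence equal to the threshold as a match is an assumption.
    """
    best = max((combine(levels) for levels in windows), default=0.0)
    return best, best >= threshold


high = affinity_result([[0.6, 0.4, 0.4]], threshold=0.65)  # all three evidences
low = affinity_result([[0.6]], threshold=0.65)             # only the first one
print(f"{high[0]:.1%}", high[1])
print(f"{low[0]:.1%}", low[1])
```

Only the best window decides the outcome, so evidence found elsewhere in the document does not raise the confidence of a window it falls outside of.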

Tuning confidence levels

One of the key aspects of the rule authoring process is tuning the confidence levels for Entity and Affinity rules. After creating the rule definitions, run the rule against the representative content and collect the accuracy data. Compare the returned results for each pattern or evidence against the expected results for the test documents.

[Figure: Table for comparing rule matches]

If the rules meet the acceptance requirements, that is, the pattern or evidence has a confidence rate above an established threshold (for example, 75%), the match expression is complete and it can be moved to the next step.

If the pattern or evidence does not meet the confidence level, re-author it (for example, add more corroborative evidence, or remove or add patterns or evidences) and repeat this step.

Next, tune the confidence level for each pattern or evidence in your rules based on the results from the previous step. For each pattern or evidence, aggregate the number of true positives (TP), the subset of the documents that contain the entity or affinity for which the rule is being authored and that resulted in a match, and the number of false positives (FP), the subset of documents that do not contain the entity or affinity for which the rule is being authored and that also returned a match. Set the confidence level for each pattern or evidence using the following calculation:

Confidence Level = True Positives / (True Positives + False Positives)

Pattern or Evidence    True Positives    False Positives    Confidence Level
P1 or E1               4                 1                  80%
P2 or E2               2                 2                  50%
Pn or En               9                 10                 47%
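The values in the table follow directly from the calculation. A minimal Python sketch, with the hypothetical helper name `confidence_level` and results rounded to whole percentages as in the table:

```python
def confidence_level(true_positives, false_positives):
    """Confidence Level = TP / (TP + FP), returned as a whole percentage."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no matches were returned, so the level is undefined")
    return round(100 * true_positives / total)


results = {"P1 or E1": (4, 1), "P2 or E2": (2, 2), "Pn or En": (9, 10)}
for name, (tp, fp) in results.items():
    print(f"{name}: {confidence_level(tp, fp)}%")
```

Documents that contain the entity but were not matched (false negatives) do not enter this calculation; they affect recall, not the confidence level.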

Using local languages in your XML file

The rule schema supports storing a localized name and description for each Entity and Affinity element. Each Entity and Affinity element must have a corresponding element in the LocalizedStrings section. To localize each element, include a Resource element as a child of the LocalizedStrings element to store the name and descriptions for multiple locales for each element. The Resource element includes a required idRef attribute, which matches the id attribute of the element that is being localized. The Locale child elements of the Resource element contain the localized name and descriptions for each specified locale.


    
        
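[XML listing: LocalizedStrings example; the markup was not preserved. Surviving text values: "affinity name en-us" and "affinity description en-us" for the first locale, "affinity name de" and "affinity description de" for the second.]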
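The requirement that every Entity and Affinity element has a corresponding Resource can be checked mechanically. The following Python sketch is one way to do so under stated assumptions: the ids are made-up placeholders rather than real GUIDs, the localized children of each Resource are omitted, and the helper name `missing_resources` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the ids are made-up placeholders, not real GUIDs, and
# the localized children of each Resource are omitted for brevity.
RULES_XML = """
<Rules>
  <Entity id="entity-1" />
  <Affinity id="affinity-1" />
  <LocalizedStrings>
    <Resource idRef="entity-1" />
  </LocalizedStrings>
</Rules>
"""


def missing_resources(rules):
    """Return ids of Entity/Affinity elements that lack a matching Resource."""
    rule_ids = [
        element.get("id")
        for tag in ("Entity", "Affinity")
        for element in rules.findall(tag)
    ]
    localized = {
        resource.get("idRef")
        for resource in rules.findall("LocalizedStrings/Resource")
    }
    return [rule_id for rule_id in rule_ids if rule_id not in localized]


rules = ET.fromstring(RULES_XML)
print(missing_resources(rules))
```

Here the Affinity element has no Resource whose idRef matches its id, so it is reported as missing.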
            
        
    

Classification rule pack XML schema definition

[XML listing: classification rule pack XSD; the markup was not preserved.]
For more information

Data loss prevention

Define your own DLP templates and information types
