Skip to content

Security Functions

The following security related functions are available in Okera. These can also be found in Privacy and Security Functions.

Function Description
autotag() Apply the autotagging rules to the given string
diff_privacy() Enables differential privacy for numeric values
fnv_hash() Hashes the given value using the FNV algorithm
fp_noref_tokenize Replaces value with a format preserving token that does not maintain referential integrity.
fp_ref_tokenize Replaces value with a format preserving token that maintains referential integrity for all types.
fpt() Same as fp_noref_tokenize
fpt_ref() Same as fp_ref_tokenize
get_groups() Returns the groups of the given user/group
get_roles() Returns the roles of the given user/group
get_tags() Returns the tags associated with a schema registry object
has_access() Returns true if the user has access to a specified resource, else returns false
has_roles() Returns true if the user has the specified roles, else returns false
mask() Masks a value, crossing out alpha-numerical characters
mask_ccn() Masks a credit card number, crossing out the leading digits
nfp_ref_tokenize Replaces value with a non-format preserving token that maintains referential integrity.
nfp_noref_tokenize Replaces value with a non-format preserving token that does not maintain referential integrity.
random_ccn() Returns a random credit card number
tokenize() Replaces value with a format preserving token that maintains referential integrity for STRING & BIGINT types only.

Some of the functions and statements are explained below in more detail.

  • <numeric_type> diff_privacy(<numeric_type> <value>, INT <noise>)

    Allows you to add statistical noise (or jitter) to a numeric value, while retaining the same properties when aggregated. For example, it is possible to add or remove 10 years to the individual values of a column containing birthdays, but when calculating the average age across all records you still get close to the same result.

    Careful attention has been taken to ensure that this is not susceptible to repeated queries. For example a query such as SELECT salary WHERE id = 1234 always returns the same value, not a normally distributed random value around the true one. Otherwise, a malicious user could simply rerun queries and average the results. This is implemented by making sure the "same query" always returns the same noise,that is, you get no additional information from repeated runs.

    Example: Using the diff_privacy() function

    > SELECT diff_privacy(50, 10);
    54
    
  • STRING autotag(STRING)

    Allows you to apply auto-tagging rules to a given string.

        > SELECT autotag("125.1.10.34")
    pii.ipv4
    
  • BIGINT fnv_hash(<primitive_type>)

    Applies a FNV-based hash algorithm to the input value. This is similar to MD5 but non-cryptographic and faster. The resulting value is not reversible, but for a specific input the resulting value is always the same.

    Example: Using the fnv_hash() function

    > SELECT fnv_hash("125.1.10.34")
    -7425713518659373000
    
  • STRING fp_random(STRING)

    This functions in a similar manner as tokenize() and fpt_ref(), but with random values every time the function is invoked, regardless of whether the original string stays the same or not.

    Example: Using the fp_random() function

    > SELECT fp_random("125.1.10.34")
    077.2.58.52
    > SELECT fp_random("125.1.10.34")
    568.8.64.21
    
  • STRING get_groups(STRING)

    Returns the groups in which a given user is a member.

        > SELECT get_groups("analyst")
    analyst,mktg_analyst
    
  • STRING get_roles(STRING)

    Returns the roles to which a given user is assigned.

        > SELECT get_roles("analyst")
    mktg_analyst_role,okera_public_role,okera_workspace_role
    
  • STRING get_tags(STRING)

    Returns the tags with which a given resource (such as a database or dataset) is associated.

    > SELECT get_tags("customer.account_address_created");
    dog.labradoodle,feline.lion
    
  • BOOLEAN has_access(STRING) and BOOLEAN has_roles(STRING)

    See Authorization Built-ins for more details.

    The has_access() function allows you to perform forming conditional checks. It returns true if the current user has access to the given catalog object. Otherwise, it returns false.

    Example: Using the has_access() function

    > SELECT has_access('okera_sample.users');
    true
    

    The has_roles() function returns true when the current user is granted all of the listed roles, specified as a comma-separated list.

    Example: Using the has_roles() function

    > SELECT has_roles('dev_role');
    false -- dev_role is _not_ granted to the current user
    
    > SELECT has_roles('sales_role');
    true -- sales_role is granted to the current user
    
    > SELECT has_roles('sales,role,dev_role');
    false -- since dev_role is _not_ granted to the current user
    
  • mask(<STRING | INT | DATE>, INT <start>, INT <length>)

    Type Datatypes Masking behavior
    String STRING, TEXT, VARCHAR Format-preserving, replaces alphanumeric characters with X. You may specify a character limit starting at position <start> (1-based) and ending after <length> characters.
    Numerical BIGINT, DECIMAL, INT, FLOAT Uses a zero value
    Date & Time DATE, TIMESTAMP Uses epoch time zero. 01.01.1970 00:00:00.000000

    Example: Using the mask() function

    > SELECT mask("5000 4000 3000 1000", 6, 4);
    5000 XXXX 3000 1000
    
    > SELECT mask("5000 4000 3000 1000", 6, 9);
    5000 XXXXXXXXX 1000
    
  • STRING mask_ccn(STRING)

    Masks all but that last four digits of a given credit card number.

    Example: Using the mask_ccn() function

    > SELECT mask_ccn("5000 4000 3000 1000")
    XXXXXXXXXXXXXXX1000
    
    > SELECT mask_ccn("5000400030001000")
    XXXXXXXXXXXX1000
    
    > SELECT mask_ccn("50004000300010001")
    XXXXXXXXXXXXX0001
    
  • STRING tokenize(STRING) and STRING fpt_ref(STRING)

    Applies a format-preserving transformation to the given text. This replaces all alphanumeric characters with random ones. All non-alphanumeric characters are left as-is. The random generator used to determine the new characters is seeded with the original value, meaning the resulting tokenized value is always the same for a specific original value.

    Example: Using the tokenize() and fpt_ref() function

    > SELECT tokenize("125.1.10.34")
    847.9.84.92
    
    > SELECT fpt_ref("125.1.10.34")
    847.9.84.92
    
  • STRING random_ccn()

    Returns a random credit card number, separated with dashes.

    Example: Using the random_ccn() function

    > SELECT random_ccn();
    5580-5293-0763-8227
    
    > SELECT random_ccn();
    3484-8051-1437-2629