Obfusware is specifically designed to meet your Big Data data masking requirements. Big Data has become an important part of most businesses, as analyzing large data sets is now a necessity for staying competitive in today's rapidly evolving markets. With the emergence of AI and Machine Learning and the huge datasets required for training models, managing customer data privacy and regulatory compliance risk is a new challenge. It requires a solution designed for the novel processes and systems being developed to keep your business competitive.
Existing data masking systems are designed for handling data in traditional relational database systems such as PostgreSQL, MySQL, and Oracle. When pressed into service on big data they face challenges.
Amazon Web Services (AWS) is the number one cloud service for hosting big data. Together, AWS S3 for data storage and AWS Glue provide a powerful solution for big data management. AWS Glue is a serverless data integration service that provides data extraction, transformation, and loading using both custom scripting and visual data pipeline development.
Obfusware has been designed and built to tightly integrate with AWS Glue and provide data masking transforms that perform like core AWS Glue transforms.
Obfusware integrates with the Glue Data Catalog so that data masking can be configured in the Catalog itself. Jobs can then be created without knowledge of the data masking requirements, and the correct masking is guaranteed to be used for a given table.
Obfusware leverages the AWS Glue Custom Visual Transform (CVT) interface to allow Obfusware-enabled data pipelines to be developed using AWS Glue Studio, the graphical interface used to create, run, and monitor Glue jobs.
Apache Spark is the premier open source tool for processing big data. Big Data solutions such as AWS Glue and Databricks have built their solutions around Apache Spark.
Apache Spark's core concept is the DataFrame, which represents the data being processed. The DataFrame enables advanced data processing strategies such as lazy evaluation, and Spark's advanced DAG execution engine optimizes complex chains of operations to maximize processing performance.
Obfusware interfaces directly with the DataFrame API, which ensures that Obfusware receives all the benefits of the advanced data processing capabilities that have made Spark the most widely used engine for scalable computing.
Apache Spark is used by thousands of companies, including 80% of the Fortune 500.
One of the challenges of AI is preventing the disclosure of private information when the AI generates solutions. Because AI is trained on large volumes of data and that data becomes encoded in the AI model, any data used to train the model can become part of answers provided by the AI when prompted, exposing private data.
The solution is to mask private data before training. Obfusware can mask data and replace it with realistic data that maintains the realism, relationships, and referential integrity required to train AI models, while preserving customer and business data privacy and helping manage regulatory compliance risk.
Obfusware has been designed with advanced data masking features and capabilities to maximize big data processing integration and performance
Obfusware provides an extensive set of data masking or obfuscation functions based on proven algorithms
Maintaining data referential integrity is one of the key characteristics of useful data masking algorithms
Data does not usually exist in isolation. It is most often related to other data items, and is very often duplicated. When masking data it is very important to maintain referential integrity. If a data value is masked in one data table or file, then it must be masked to the same value in every other table or file where it appears. This means data masking methods need to be deterministic, always returning the same masked value for a given input value.
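To make the point concrete, here is a minimal Python sketch of deterministic masking preserving a join between two tables. It is not Obfusware code: the key, the `mask_value` and `mask_column` helpers, the `cust_` prefix, and the sample rows are all hypothetical, and a keyed hash (HMAC-SHA256) stands in for whatever algorithm a real masker would use.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real deployment would manage keys securely.
SECRET_KEY = b"example-key-not-for-production"


def mask_value(value: str) -> str:
    """Return a deterministic pseudonym: the same input always yields the same output."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "cust_" + digest[:12]


def mask_column(rows, column):
    """Return copies of the rows with one column replaced by its masked value."""
    return [{**row, column: mask_value(row[column])} for row in rows]


def orders_per_customer(customers, orders):
    """Join on customer_id and count matching orders for each customer row."""
    return [
        sum(1 for o in orders if o["customer_id"] == c["customer_id"])
        for c in customers
    ]


# Two made-up tables that share the customer_id key.
customers = [
    {"customer_id": "C-1001", "name": "Alice"},
    {"customer_id": "C-1002", "name": "Bob"},
]
orders = [
    {"order_id": 1, "customer_id": "C-1001"},
    {"order_id": 2, "customer_id": "C-1002"},
    {"order_id": 3, "customer_id": "C-1001"},
]

# Each table is masked independently, yet the join still lines up.
masked_customers = mask_column(customers, "customer_id")
masked_orders = mask_column(orders, "customer_id")
```

Because each table is masked separately with the same deterministic function, joining the masked tables on `customer_id` gives the same order counts as joining the originals.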
Big Data often includes semi-structured and structured data
Data formats often used with Big Data, such as Parquet, JSON, and XML, are characterized as semi-structured because they do not conform to the strict requirements of the structured data used by an RDBMS. Semi-structured data is often self-describing, including tags to describe fields instead of conforming to a set schema. Semi-structured data is often nested, and each data element may contain different fields.
Obfusware provides support for semi-structured nested data by allowing nested fields to be addressed using JSON-like field selectors (e.g. "object.field1.subfield"). This removes the need to transform the data with techniques such as flattening before masking can be applied efficiently, and avoids the potential loss of data structure and information during such transformations, while delivering high performance.
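As an illustration of addressing a nested field with a dotted selector, the following Python sketch masks one field in a nested record without flattening it. The `mask_nested` function and the sample record are hypothetical and are not the Obfusware API. The sketch handles nested dictionaries only (not arrays) and leaves records that lack the field unchanged, since semi-structured elements may not all carry the same fields.

```python
from typing import Any, Callable


def mask_nested(record: dict, selector: str, masker: Callable[[Any], Any]) -> dict:
    """Return a copy of record with the field named by a dotted selector masked.

    Only the dictionaries along the selector path are copied; the input
    record is never modified. Missing fields are left as they are.
    """
    head, _, rest = selector.partition(".")
    if head not in record:
        return record  # this element does not contain the field: nothing to mask
    updated = dict(record)
    if rest:
        child = record[head]
        if isinstance(child, dict):
            # Descend one level and mask within the nested object.
            updated[head] = mask_nested(child, rest, masker)
    else:
        # Reached the last path segment: apply the masker to the value.
        updated[head] = masker(record[head])
    return updated


record = {
    "id": 7,
    "customer": {"name": "Alice", "contact": {"email": "alice@example.com"}},
}
masked = mask_nested(record, "customer.contact.email", lambda value: "***")
```

The rest of the structure, such as `customer.name`, is carried over untouched, so no reshaping is needed before or after masking.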
It is important that data masking not return the same results for every organization
One of the requirements of data masking is that it should be impossible to determine the original data value from the masked value. If data masking methods returned the same value for every organization, then a third party could use the software to work out which original values mask to a given result value. This would potentially allow them to determine the original value, invalidating the privacy promised by data masking.
Obfusware creates a unique context for each organization using the data masking software. Using cryptographic techniques, the context is used to generate a unique mapping from the original data value to the masked data value. The result is deterministic, so the masking for a given organizational context will not change over time.
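One common way to obtain a mapping that is both organization-specific and deterministic is a keyed hash with a per-organization key. The Python sketch below illustrates that general idea only; it is an assumption, not a description of Obfusware's actual cryptographic scheme. The names `derive_org_key` and `mask_digits`, the master secret, and the organization identifiers are all hypothetical.

```python
import hashlib
import hmac


def derive_org_key(master_secret: bytes, org_id: str) -> bytes:
    """Derive a per-organization key from a master secret (hypothetical scheme)."""
    message = b"org-context:" + org_id.encode("utf-8")
    return hmac.new(master_secret, message, hashlib.sha256).digest()


def mask_digits(value: str, org_key: bytes, length: int = 9) -> str:
    """Map a value to a fixed-length digit string, keyed by the organization."""
    digest = hmac.new(org_key, value.encode("utf-8"), hashlib.sha256).digest()
    number = int.from_bytes(digest, "big") % (10 ** length)
    return str(number).zfill(length)


# Placeholder secret and organization names, for illustration only.
master = b"example-master-secret"
key_a = derive_org_key(master, "org-a")
key_b = derive_org_key(master, "org-b")

same_org_first = mask_digits("123-45-6789", key_a)
same_org_again = mask_digits("123-45-6789", key_a)
other_org = mask_digits("123-45-6789", key_b)
```

Within one organization the masked value is stable across runs, while a party holding a different key gets an unrelated result for the same input and so cannot build a lookup table for another organization's data.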
Obfusware data maskers offer many configuration options to produce varied results to meet requirements
Obfusware offers over a dozen data masking algorithms, each of which can be configured to create numerous data maskers to meet specific criteria. Out of the box, Obfusware offers almost two dozen pre-configured maskers to meet the most common data masking requirements.
Extend Obfusware data masking with custom data maskers using the Obfusware masker API
While Obfusware's built-in data masking algorithms cover the vast majority of data masking requirements through their highly configurable behavior, there are sometimes custom data masking requirements which cannot be met using the built-in masking algorithms. For these special cases, Obfusware provides the ability to add custom data masking algorithms using the masker API, which makes creating a data masker with all the capabilities of a built-in masker as simple as writing a Java class with a few methods implementing the desired behavior.
Obfusware provides extensive configurable statistics gathering on masking operations
Obfusware gathers statistics on the time spent masking data, including count, min, max, mean, variance, and standard deviation, as well as throughput and error rates. Obfusware is able to aggregate statistics from distributed operations to provide an overview of masking performance.
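Statistics such as mean and variance can be combined across distributed workers without collecting every timing in one place. The Python sketch below shows one standard approach: Welford's online update together with the pairwise merge formula of Chan et al. It is an illustrative assumption about how such aggregation can work, not Obfusware's implementation, and the `MaskingStats` class and sample timings are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MaskingStats:
    """Running statistics over masking durations, mergeable across workers."""

    count: int = 0
    minimum: float = math.inf
    maximum: float = -math.inf
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def add(self, seconds: float) -> None:
        """Fold one observation into the running statistics (Welford's update)."""
        self.count += 1
        self.minimum = min(self.minimum, seconds)
        self.maximum = max(self.maximum, seconds)
        delta = seconds - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (seconds - self.mean)

    def merge(self, other: "MaskingStats") -> "MaskingStats":
        """Combine two partial results into one, without the raw observations."""
        total = self.count + other.count
        if total == 0:
            return MaskingStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / total
        return MaskingStats(
            total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            mean,
            m2,
        )

    @property
    def variance(self) -> float:
        """Population variance of the observations seen so far."""
        return self.m2 / self.count if self.count else 0.0

    @property
    def stddev(self) -> float:
        return math.sqrt(self.variance)


# Two workers record timings independently, then their results are merged.
worker_a, worker_b = MaskingStats(), MaskingStats()
for t in (1.0, 2.0, 3.0):
    worker_a.add(t)
for t in (4.0, 5.0):
    worker_b.add(t)
overall = worker_a.merge(worker_b)
```

Each worker keeps only five numbers, so merging stays cheap however many rows were masked.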
High-performance data masking to manage compliance risk for your Big Data assets, with the Big Data tool support to integrate seamlessly with your enterprise Data Lake.
GET Obfusware