Getting Started Tutorial

Notes

Obfusware experience: Beginner
Requirements:
        A Windows, MacOS, or Linux computer with an internet browser
        Installing Obfusware AG
Approximate time to complete: 20 minutes
Last updated: 3 Aug 2025

Now that Obfusware is installed to AWS, you are ready to create your first Obfusware AWS job and mask a dataset. AWS Glue Studio lets you construct a Glue job visually in just a few steps.

  • Set the Job details
    Click on the Job details tab.

  • Set the Basic property fields to the following values:

    Field                                       Default        Value
    Name                                        Untitled job   First Obfusware Job
    Description                                 <empty>        <empty>
    IAM Role                                    <empty>        ObfuswareGlueRole
    Type                                        Spark          Spark
    Glue version                                Glue 5.0       Glue 5.0
    Language                                    Python 3       Python 3
    Worker type                                 G 1X           G 1X
    Automatically scale the number of workers   unchecked      unchecked
    Requested number of workers                 10             2
    Generate job insights                       checked        checked
    Generate lineage events                     unchecked      unchecked
    Job bookmark                                Disable        Disable
    Job run queuing                             unchecked      unchecked
    Flex execution                              unchecked      unchecked
    Number of retries                           0              0
    Job timeout (minutes)                       480            5

  • Save the job details
    After setting the appropriate job details, save the job by clicking the Save button in the top right corner of the page. 
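
For readers who later want to script their setup, the values chosen in the Job details tab correspond to job properties in the AWS SDK. The following is a minimal sketch and not part of Obfusware: it only builds a dictionary keyed by parameter names from the boto3 Glue create_job call, and the function name first_job_settings is hypothetical. A real create_job call also needs a Command entry pointing at a job script, which the visual editor generates for you, so it is left out here.

```python
def first_job_settings():
    """Return the tutorial's Job details as keyword arguments for create_job.

    Sketch only: the Command entry (job script location) is omitted on purpose.
    """
    return {
        "Name": "First Obfusware Job",   # name used later by enable-job
        "Role": "ObfuswareGlueRole",     # IAM Role field
        "GlueVersion": "5.0",            # Glue version field
        "WorkerType": "G.1X",            # shown as "G 1X" in the console
        "NumberOfWorkers": 2,            # Requested number of workers
        "MaxRetries": 0,                 # Number of retries
        "Timeout": 5,                    # Job timeout, in minutes
        "ExecutionClass": "STANDARD",    # Flex execution unchecked
        "JobRunQueuingEnabled": False,   # Job run queuing unchecked
        "DefaultArguments": {
            "--job-bookmark-option": "job-bookmark-disable",  # Job bookmark: Disable
            "--enable-job-insights": "true",                  # Generate job insights
        },
    }


if __name__ == "__main__":
    for key, value in first_job_settings().items():
        print(key, "=", value)
```

Keeping the settings in one function makes it easy to review them against the values entered in the console.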

  • Build the job
    Select the Visual tab.

Add a source node by clicking the + icon.

Select the Amazon S3 source and set its parameters:

  • S3 Source Type: S3 location
  • S3 URL: s3://obfusware-381492123456-us-east-1/3.0/resources/sample-data.csv
  • Data format: CSV
  • Delimiter: Comma (,)

Select the Obfusware Column Data Transform and set the transform parameters:

  • Masker 1: USLastNameMasker
  • Column 1: last_name
  • Masker 2: USVariableDateMasker
  • Column 2: dob
  • Masker 3: US555TelephoneMasker
  • Column 3: phone1
  • Masker 4: EmailMasker
  • Column 4: email
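
A masker can only act on a column that exists in the source file, so it can help to check the column names before running the job. The sketch below is an illustration, not an Obfusware feature: it records the four masker assignments above in a dictionary and compares them with the header line of sample-data.csv. The names MASKERS, SAMPLE_HEADER, and missing_columns are hypothetical.

```python
import csv
import io

# Column name -> masker, as entered in the transform parameters above.
MASKERS = {
    "last_name": "USLastNameMasker",
    "dob": "USVariableDateMasker",
    "phone1": "US555TelephoneMasker",
    "email": "EmailMasker",
}

# Header line of sample-data.csv.
SAMPLE_HEADER = (
    "first_name,last_name,dob,company_name,address,city,county,"
    "state,zip,phone1,phone2,email,web"
)


def missing_columns(header_line, maskers):
    """Return the masked column names that do not appear in the CSV header."""
    columns = next(csv.reader(io.StringIO(header_line)))
    return [name for name in maskers if name not in columns]


if __name__ == "__main__":
    print(missing_columns(SAMPLE_HEADER, MASKERS))  # prints []
```

An empty list means every masked column is present in the source file.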

Select the Amazon S3 Target and set the Amazon S3 target parameters:

  • Format: CSV
  • Compression Type: None
  • S3 Target Location: s3://obfusware-381492123456-us-east-1/3.0/output/

Save the job by clicking the Save button on the top right of the page.

Enabling Obfusware AWS Glue jobs

Now that your First Obfusware Job has been created, there is one more step to complete before you can run it. Obfusware relies on a tight integration with AWS Glue. To achieve this integration, Obfusware requires its code, in the form of JAR files and Python files, to be accessible by AWS Glue.

While it is possible to enable an AWS Glue job manually by setting some advanced properties in the Job details, doing so is tedious and error-prone, so Obfusware provides a management tool that enables a job for you.

Obfusware-manager CLI tool

$ obfusware-manager enable-job --help
usage: ObfuswareAWSGlueCLI enable-job [-h] jobs [jobs ...]
positional arguments:
  jobs        Names of existing AWS Glue jobs which will be enabled to execute Obfusware transforms
optional arguments:
  -h, --help  show this help message and exit

This tool is installed on the computer, and under the user account, that was used to initially install Obfusware. It is located in the bin install directory.

        On MacOS or Linux the bin install directory is located at:
        $HOME/.obfusware-aws/<version>/bin

        On Windows the bin install directory is located at:
        %USERPROFILE%\obfusware-aws\<version>\bin

To enable the First Obfusware Job, run the command:

     <bininstalldir>/obfusware-manager -v enable-job "First Obfusware Job"

You should see the following output from the obfusware-manager command:

        Enabling AWS Glue jobs to execute Obfusware transforms...
        Enabling AWS Glue job(First Obfusware Job)
        Enabling AWS Glue jobs SUCCEEDED

        Success

The First Obfusware Job is now ready to run.
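
If you want to script the enable step, the command line can be assembled from the install locations described above. This is a minimal sketch under stated assumptions: the function name enable_job_argv is hypothetical, the version string "3.0" in the example is only a guess at what <version> looks like, and on Windows the executable may carry a file extension that is not shown here.

```python
from pathlib import PurePosixPath, PureWindowsPath


def enable_job_argv(home, version, job_name, windows=False):
    """Build the argument list for `obfusware-manager -v enable-job <job>`.

    `home` is the value of $HOME (MacOS/Linux) or %USERPROFILE% (Windows).
    """
    if windows:
        bin_dir = PureWindowsPath(home) / "obfusware-aws" / version / "bin"
    else:
        bin_dir = PurePosixPath(home) / ".obfusware-aws" / version / "bin"
    # Assumption: the executable is named exactly "obfusware-manager".
    return [str(bin_dir / "obfusware-manager"), "-v", "enable-job", job_name]


if __name__ == "__main__":
    # "3.0" is a placeholder version, not a confirmed directory name.
    print(enable_job_argv("/home/ana", "3.0", "First Obfusware Job"))
```

The returned list can be passed to subprocess.run(argv, check=True). Passing a list rather than a single string avoids quoting problems with the spaces in the job name.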

Running the Obfusware job

To run the job, select the Runs tab. Then click the Run button on the top right of the page.

When the job finishes running, after approximately 1 minute 30 seconds to 1 minute 45 seconds, the run status changes to Succeeded.
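
The same run-and-wait cycle can be driven from a script instead of the console. The sketch below is one possible approach, not an Obfusware tool: it assumes a boto3 Glue client (for example boto3.client("glue") with working AWS credentials) and uses that client's start_job_run and get_job_run calls. The function name run_and_wait and the 15-second polling interval are hypothetical choices.

```python
import time

# Run states after which AWS Glue will not change the state again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_and_wait(glue, job_name, poll_seconds=15, sleep=time.sleep):
    """Start a Glue job run and poll until it reaches a terminal state.

    `glue` is expected to behave like a boto3 Glue client.
    Returns the final state string, for example "SUCCEEDED".
    """
    run_id = glue.start_job_run(JobName=job_name)["JobRunId"]
    while True:
        run = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)  # wait before asking again
```

With boto3 installed, run_and_wait(boto3.client("glue"), "First Obfusware Job") would block until the run ends. The sleep parameter exists so the function can be exercised without real delays.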

To see the result of the run, you can compare the original file (sample-data.csv) with the results of the run stored in the S3 output/ folder.

Comparing the results

The location and name of the source file, sample-data.csv, are known in advance. The location of the target file is also known, but its exact name is generated by the job. To compare the results, you first need to list the contents of the output/ folder to discover the name. To generate a listing, run the command:

    $ aws s3 ls s3://obfusware-381492123456-us-east-1/3.0/output/
    2025-08-01 10:24:46          0
    2025-08-01 10:53:48 47600073 run-1754060013607-part-r-00000

Look for the file with a timestamp that matches the end of the job run.  Once you have discovered the name of the result file you can compare the contents of the source file with the contents of the target file.
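
Picking the newest file out of a listing can also be automated. The following sketch parses text in the format printed by aws s3 ls and returns the name with the latest timestamp. It is an illustration only: the function name newest_object is hypothetical, and the listing is the sample shown above rather than live output.

```python
from datetime import datetime


def newest_object(listing):
    """Return the name of the most recently modified object in `aws s3 ls` output."""
    newest_time, newest_name = None, None
    for line in listing.splitlines():
        parts = line.split(None, 3)
        # Object lines have four fields: date, time, size, name.
        # Prefix lines and the nameless folder marker have fewer, so skip them.
        if len(parts) < 4:
            continue
        try:
            modified = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # not a dated object line
        if newest_time is None or modified > newest_time:
            newest_time, newest_name = modified, parts[3]
    return newest_name


LISTING = """\
2025-08-01 10:24:46          0
2025-08-01 10:53:48 47600073 run-1754060013607-part-r-00000
"""

if __name__ == "__main__":
    print(newest_object(LISTING))  # prints run-1754060013607-part-r-00000
```

If the output folder holds results from several runs, still confirm that the timestamp matches the end of your job run.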

On MacOS or Linux:

$ aws s3 cp s3://obfusware-381492123456-us-east-1/3.0/resources/sample-data.csv - | head -2
first_name,last_name,dob,company_name,address,city,county,state,zip,phone1,phone2,email,web
James,Butt,4/1/1997,"Benton, John B Jr",6649 N Blue Gum St,New Orleans,Orleans,LA,70116,504-621-8927,504-845-1427,jbutt@gmail.com,http://www.bentonjohnbjr.com

$ aws s3 cp s3://obfusware-381492123456-us-east-1/3.0/output/run-1754060013607-part-r-00000 - | head -2
first_name,last_name,dob,company_name,address,city,county,state,zip,phone1,phone2,email,web
James,Ketchersid,5/1/1997,"Benton, John B Jr","6649 N Blue Gum St","New Orleans",Orleans,LA,70116,504-555-7562,504-845-1427,freddy6791@example.com,"http://www.bentonjohnbjr.com"

On Windows:

> aws s3 cp s3://obfusware-381492123456-us-east-1/3.0/resources/sample-data.csv - | more
first_name,last_name,dob,company_name,address,city,county,state,zip,phone1,phone2,email,web
James,Butt,4/1/1997,"Benton, John B Jr",6649 N Blue Gum St,New Orleans,Orleans,LA,70116,504-621-8927,504-845-1427,jbutt@gmail.com,http://www.bentonjohnbjr.com

> aws s3 cp s3://obfusware-381492123456-us-east-1/3.0/output/run-1754060013607-part-r-00000 - | more
first_name,last_name,dob,company_name,address,city,county,state,zip,phone1,phone2,email,web
James,Ketchersid,5/1/1997,"Benton, John B Jr","6649 N Blue Gum St","New Orleans",Orleans,LA,70116,504-555-7562,504-845-1427,freddy6791@example.com,http://www.bentonjohnbjr.com

By comparing the source fields to the corresponding target fields, you can see the results of masking the selected fields.
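
The field-by-field comparison can be done programmatically as well. The sketch below parses the first data row of each file with Python's csv module and lists the columns whose values differ. It uses the sample rows shown above as inline text. The names SOURCE, TARGET, and changed_columns are hypothetical, and reading the first row only is a simplification.

```python
import csv
import io

# First two lines of the source file and of the masked result, as shown above.
SOURCE = '''first_name,last_name,dob,company_name,address,city,county,state,zip,phone1,phone2,email,web
James,Butt,4/1/1997,"Benton, John B Jr",6649 N Blue Gum St,New Orleans,Orleans,LA,70116,504-621-8927,504-845-1427,jbutt@gmail.com,http://www.bentonjohnbjr.com
'''

TARGET = '''first_name,last_name,dob,company_name,address,city,county,state,zip,phone1,phone2,email,web
James,Ketchersid,5/1/1997,"Benton, John B Jr","6649 N Blue Gum St","New Orleans",Orleans,LA,70116,504-555-7562,504-845-1427,freddy6791@example.com,"http://www.bentonjohnbjr.com"
'''


def changed_columns(source_csv, target_csv):
    """Return the column names whose first-row values differ between two CSV texts."""
    source_row = next(csv.DictReader(io.StringIO(source_csv)))
    target_row = next(csv.DictReader(io.StringIO(target_csv)))
    return [name for name in source_row if source_row[name] != target_row.get(name)]


if __name__ == "__main__":
    print(changed_columns(SOURCE, TARGET))
    # prints ['last_name', 'dob', 'phone1', 'email']
```

Parsing with the csv module matters here: the result file quotes some fields that the source file does not, and a plain text diff would report those unmasked fields as changed.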
