Saturday 6 July 2024

Getting Off The “Happy Path”: A Cool Approach To Synthetic Data.


As developers, most of us first code our solution against data that we know will work. Some people call this the “happy path” – so I will never use that term again. The better developers write their test cases first and watch them fail, then write the code that makes the tests pass. I am not one of these better developers, so I have always been looking for a way to generate synthetic data with good spread (data with a lot of variance) to make up for my lack of behavior-driven development.

Take phone numbers, for example. Anyone can create data with a random 7 or 10 digits and insert it into a lead, but when I say spread I am talking about phone numbers that sometimes have dashes and sometimes have parentheses, maybe even some that have extensions or international formats. Hell, a phone number may even be “Klondike 5”.
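To make that concrete, here is a rough sketch of what phone-number spread could look like in plain node. Everything here (randomInt, digits, phoneFormats, messyPhone) is a name I made up for illustration; none of it comes from Faker.js or nforce, and the list of formats is just a sample.

```javascript
// Hypothetical helpers for illustration only; not part of Faker.js or nforce.

// Random integer in [min, max]; rng defaults to Math.random so tests can inject their own.
function randomInt(min, max, rng = Math.random) {
  return min + Math.floor(rng() * (max - min + 1));
}

// String of `count` random digits.
function digits(count, rng = Math.random) {
  let out = '';
  for (let i = 0; i < count; i++) out += randomInt(0, 9, rng);
  return out;
}

// Each formatter renders the same three parts in a different style.
const phoneFormats = [
  (a, e, l) => `${a}${e}${l}`,                             // bare 10 digits
  (a, e, l) => `${a}-${e}-${l}`,                           // dashes
  (a, e, l) => `(${a}) ${e}-${l}`,                         // parentheses
  (a, e, l) => `${a}.${e}.${l}`,                           // dots
  (a, e, l) => `+1 ${a} ${e} ${l}`,                        // international prefix
  (a, e, l, rng) => `(${a}) ${e}-${l} x${digits(3, rng)}`, // with extension
  (a, e, l) => `${e}-${l}`,                                // 7 digits, no area code
];

// Pick random parts, then pick a random way to write them down.
function messyPhone(rng = Math.random) {
  const area = String(randomInt(200, 999, rng));
  const exchange = '555';
  const line = digits(4, rng);
  const format = phoneFormats[randomInt(0, phoneFormats.length - 1, rng)];
  return format(area, exchange, line, rng);
}

// Print a handful to eyeball the variety.
for (let i = 0; i < 5; i++) console.log(messyPhone());
```

The design choice is to separate the random parts from the formatting, so adding another style of mess is one more line in the array.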

Most of us know we should have this type of spread when we create synthetic data for Salesforce, but frankly that is too much work. There are tools out there like generatedata.com that will create synthetic data, and even some of our better ETL tools like Talend do a pretty good job at that, but I am too lazy to create .csv files and then use Data Loader to insert them into my Salesforce org. I want to generate 2,000 phony leads, with good spread, and insert them into my dev org by only putting in my credentials and specifying the count. To that end, let me introduce nforce with FakerJS.

Those of you who know me understand my love affair with node.js (more about how node will be the last utility language you will ever need in my next blog post) and will appreciate how I came upon this solution and even contributed it to the examples included in nforce. But let’s start with a little background on nforce and FakerJS. Nforce is a node.js package written by Kevin O’Hara that simply wraps the Salesforce API in the totally awesome (yes, I work with Jeff Douglas) language that is node. I don’t know how I stumbled on Faker.js, but it is a package, written in Node, that creates synthetic data. So yeah, I think these two are like peanut butter and chocolate, Batman and Robin or maybe even nn and pine.

FakerJS solved my spread problem using the brute force method, which is fine as long as you are not the brute. For example, the Faker.js lib has an array called Faker.definitions.first_name which contains 3000 first names. Likewise, it has arrays for last names, US states, UK counties, catch phrases, company names, buzzwords, and many other elements, including my favorite: bs_nouns like “synergies” and “web-readiness”. In addition to these arrays it has several dozen methods like Faker.random.street_suffix which will return something like Street, Drive, or Avenue. I modified this library to create my own types like Faker.definitions.tickers, which contains all the NYSE tickers, and Faker.definitions.tv_channels, which returns the call sign of a Comcast channel in Downers Grove, Illinois, where I reside.
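The brute force pattern is simple enough to sketch in a few lines. This is a stand-in I wrote to show the shape of the idea, not the actual Faker.js source: the `definitions` and `random` objects below are hypothetical, and the arrays are tiny samples rather than the real lists.

```javascript
// Illustrative stand-in for the Faker.definitions arrays; the real lists are far longer.
const definitions = {
  first_name: ['Julie', 'Marcus', 'Priya', 'Tom'],
  last_name: ['Brown', 'Nguyen', 'Okafor', 'Schmidt'],
  street_suffix: ['Street', 'Drive', 'Avenue'],
  // A custom list added in the same style (a small sample, not every NYSE ticker).
  tickers: ['IBM', 'GE', 'KO', 'F'],
};

// Pick one element of a list uniformly at random.
function randomElement(list, rng = Math.random) {
  return list[Math.floor(rng() * list.length)];
}

// One small accessor per list, mirroring the "array plus random method" pattern.
const random = {
  first_name: (rng) => randomElement(definitions.first_name, rng),
  last_name: (rng) => randomElement(definitions.last_name, rng),
  street_suffix: (rng) => randomElement(definitions.street_suffix, rng),
  ticker: (rng) => randomElement(definitions.tickers, rng),
};

console.log(random.first_name(), random.last_name(), random.street_suffix(), random.ticker());
```

Adding a new data type is then just a matter of adding one array and one accessor.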

In addition, I made a helper function called Faker.Helpers.sfLead that maps these common data types to a Salesforce Lead. This way I can call this helper function from my node script (included in the example directory of nforce as Faker2lead.js) to create a JSON object that will eventually get inserted as a lead using nforce.

I made what I consider a few additional improvements to the helper function that should be mentioned. The original Faker.js lib has firstName, lastName, email and username methods. The email and username methods both call the firstName and lastName methods (amongst others) to create legitimate-looking emails and usernames. For example, email takes a part of a random first name and a part of a last name (with some variance) and adds a domain, so you might expect something like [email protected]. However, it does not re-use the already created random first name and last name, so you might get something like this:

 { Firstname: Julie,
   Lastname: Brown,
   email: [email protected] }

I did not like this for obvious reasons, so I reused the already set first name and last name so my email would resemble them. So in the above example I might expect to see [email protected] instead of [email protected].
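Here is a minimal sketch of that fix, assuming a much smaller helper than the real sfLead. The names pickOne, emailFor and fakeLead are mine, the name and domain lists are placeholders, and the only Salesforce-specific bits are the standard Lead field names (FirstName, LastName, Company, Email).

```javascript
// Hypothetical sketch of the "reuse the names" idea; not the actual sfLead helper.
const sampleFirstNames = ['Julie', 'Marcus', 'Priya'];
const sampleLastNames = ['Brown', 'Nguyen', 'Okafor'];
const sampleCompanies = ['Acme Corp', 'Globex', 'Initech'];
const sampleDomains = ['example.com', 'example.org'];

// Pick one element of a list uniformly at random.
function pickOne(list, rng = Math.random) {
  return list[Math.floor(rng() * list.length)];
}

// Build the email from names that were already chosen, instead of drawing new ones.
function emailFor(firstName, lastName, rng = Math.random) {
  const separator = pickOne(['.', '_', ''], rng);
  const domain = pickOne(sampleDomains, rng);
  return `${firstName}${separator}${lastName}@${domain}`.toLowerCase();
}

// Return a plain object keyed by standard Lead field names.
function fakeLead(rng = Math.random) {
  const FirstName = pickOne(sampleFirstNames, rng);
  const LastName = pickOne(sampleLastNames, rng);
  return {
    FirstName,
    LastName,
    Company: pickOne(sampleCompanies, rng),
    Email: emailFor(FirstName, LastName, rng),
  };
}

console.log(fakeLead());
```

The names are drawn once and passed along, so the email can only ever be built from the same person it belongs to.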

The potential for using these tools for synthetic data is vast. The Faker.js lib includes the sfLead helper, but it was just intended to be an example. You could create your own custom helper as well. You might even modify it to create accounts instead of leads, which should be pretty easy to do. But if you then wanted to attach contacts to these accounts, you would have a little more work to do. You might have nforce randomly get an account id to insert your new synthetic contact against, or you could take the lazy route, which I would probably do. This would entail inserting all your accounts first, then fetching all the ids. You could then add these account ids as a new array called Faker.definitions.myAccountIds and randomly select one, following the same patterns you have already seen.
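The lazy route might look something like the sketch below. It assumes the account ids have already been fetched from the org; the ids shown are made-up placeholders, and pickFrom and fakeContact are hypothetical names. AccountId is the standard lookup field on Contact.

```javascript
// Placeholder ids for illustration; in practice these would be fetched
// from the org after the synthetic accounts are inserted.
const myAccountIds = [
  '001000000000001AAA',
  '001000000000002AAA',
  '001000000000003AAA',
];

const contactFirstNames = ['Julie', 'Marcus', 'Priya'];
const contactLastNames = ['Brown', 'Nguyen', 'Okafor'];

// Pick one element of a list uniformly at random.
function pickFrom(list, rng = Math.random) {
  return list[Math.floor(rng() * list.length)];
}

// Build a contact-shaped object attached to a randomly chosen account.
function fakeContact(accountIds, rng = Math.random) {
  return {
    FirstName: pickFrom(contactFirstNames, rng),
    LastName: pickFrom(contactLastNames, rng),
    AccountId: pickFrom(accountIds, rng),
  };
}

// Generate a small batch ready to be handed to whatever does the insert.
const contacts = Array.from({ length: 5 }, () => fakeContact(myAccountIds));
console.log(contacts);
```

Because the account id is just one more array to draw from, the contacts end up spread across accounts without any extra queries per record.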

If you were really ambitious, and we have already determined I am lazy, you would include the Faker.js lib as a static resource and create a VF page and utility class that would create your synthetic records right in Salesforce and bypass nforce altogether. That would be a really cool AppExchange package, if you ask me. You could also create a little VF page where you could map object.fields to the methods in the Faker.js static resource and presto.

If you are a Salesforce developer and would like to play with the latest in cool languages, I would urge you to pick up node.js. Since you are most likely already using JavaScript for your VF development, and node.js uses JavaScript as its syntax, it should be pretty easy to learn. It also has the best package manager I have seen, called npm. Even if you are weak at JavaScript, Kevin O’Hara’s documentation is really straightforward and you should be able to query some records in just an hour or two if you are mildly proficient at the command line.

Keep calm and SOQL on!



