
Generating Sample Data with Bogus

January 29, 2023

Most of my job these days is creating tutorials or examples. This means I often start from scratch with a new type of project. I usually just cruft up some sample data for my project to start. While investigating some alternatives to AutoMapper (video and blog coming soon), I wanted to be able to create a bunch of sample data. Luckily, I ran into a tool that I was surprised I’d never heard of: Bogus. Let’s talk about it.

I also made a Coding Short video that covers this same topic, if you’d rather watch than read:

What is Bogus?

Bogus is a library for C#, F#, and VB.NET that creates repeatable, fake data for applications. It is essentially a port of the JavaScript library faker.js. It accomplishes this with generators (called Fakers) that hold a set of rules for generating one or more fake objects. Built into Bogus is a set of generalized rules for common data categories (e.g. Addresses, Companies, People, Phone Numbers). Enough talk, let’s see how it works. The full repo is at:

Bogus

To install Bogus, you can use the Package Manager or just the dotnet CLI:

> dotnet add package Bogus

Creating a Faker

You start out by creating an instance of a class called Faker<T>. On that instance, you use a fluent syntax to set up rules for creating sample data. But let’s start with our POCO for a Customer:

public class Customer
{
  public int Id { get; set; }
  public string? CompanyName { get; set; }
  public string? Phone { get; set; }
  public string? ContactName { get; set; }
  public int AddressId { get; set; }
  public Address? Address { get; set; }
  public IEnumerable<Order>? Orders { get; set; }
}

Notice that, aside from the simple properties, we have a one-to-one relationship to an Address and a one-to-many relationship with Orders. Let’s start by creating a faker for the Customer object and the simple properties:

var customerFaker = new Faker<Customer>();

We can then use the RuleFor method to specify a rule for the Company Name:

var customerFaker = new Faker<Customer>()
  .RuleFor(c => c.CompanyName, f => f.Company.CompanyName());

The first parameter of the RuleFor method is a lambda that picks the property on Customer that I want to fake. The second parameter is another lambda that says how to generate the value for that property. While we could write any code we need here, the most common case is to use the Faker object that is passed in to reach the built-in semantics. In this case we are using the Company category to generate a company name.
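
To illustrate the "any code we need" case, here is a minimal sketch of a rule that mixes custom logic with a built-in category. The stems array and the trimmed-down Customer class are made up for this example, and it assumes the PickRandom and Company.CompanySuffix() helpers on the Faker object:

```csharp
using System;
using System.Linq;
using Bogus;

// Hypothetical list of name stems, used only for this illustration.
var stems = new[] { "Acme", "Globex", "Initech" };

var customerFaker = new Faker<Customer>()
  // Custom code in the rule: pick a stem, then append a built-in suffix.
  .RuleFor(c => c.CompanyName,
           f => $"{f.PickRandom(stems)} {f.Company.CompanySuffix()}");

var customer = customerFaker.Generate();
Console.WriteLine(customer.CompanyName);

// Trimmed-down version of the Customer POCO for this sketch.
public class Customer
{
  public string? CompanyName { get; set; }
}
```

The lambda is ordinary C#, so anything that returns a value of the property’s type should work here.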

If we continue this, we can fake more simple properties like so:

var customerFaker = new Faker<Customer>()
  .RuleFor(c => c.CompanyName, f => f.Company.CompanyName())
  .RuleFor(c => c.ContactName, f => f.Name.FullName())
  .RuleFor(c => c.Phone, f => f.Phone.PhoneNumberFormat());

You can see here that we’re using the Name category and the Phone category. The Bogus library has a large set of these built-in semantics. Sometimes we’ll need to use custom code to generate data we need. For example, we’ll want to generate IDs for the generated customers. One strategy is to just create a local integer and assign it with simple code:

var id = 1;

var customerFaker = new Faker<Customer>()
  .RuleFor(c => c.Id, _ => id++)
  .RuleFor(c => c.CompanyName, f => f.Company.CompanyName())
  .RuleFor(c => c.ContactName, f => f.Name.FullName())
  .RuleFor(c => c.Phone, f => f.Phone.PhoneNumberFormat());

Here we can see that we just have an integer (which the rule’s lambda captures as a closure) and we increment it every time a new customer is created.

To use the Faker, we can just call Generate() with the number of objects we want:

var customers = customerFaker.Generate(1000);

This will create a thousand fake customers.
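
If you want to see what comes out, a small console sketch like the one below can help. It assumes a trimmed-down Customer class (no Address or Orders) and only prints a handful of customers:

```csharp
using System;
using Bogus;

var id = 1;

var customerFaker = new Faker<Customer>()
  .RuleFor(c => c.Id, _ => id++)
  .RuleFor(c => c.CompanyName, f => f.Company.CompanyName())
  .RuleFor(c => c.ContactName, f => f.Name.FullName());

// Generate(n) returns a List<Customer>; Generate() returns a single Customer.
var customers = customerFaker.Generate(5);
var oneMore = customerFaker.Generate();

foreach (var customer in customers)
{
  Console.WriteLine($"{customer.Id}: {customer.CompanyName} ({customer.ContactName})");
}

// Trimmed-down version of the Customer POCO for this sketch.
public class Customer
{
  public int Id { get; set; }
  public string? CompanyName { get; set; }
  public string? ContactName { get; set; }
}
```

The names change on every run because no seed is set, but the Ids count up from 1 since the closure is shared across all calls on this faker.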

Repeatable Fake Data

By default, the generation of customers is random, so every time you create an instance of the Faker object (e.g. new Faker<Customer>), you get different customers. When you want a consistent set of fake data, you can use a seed to ensure that you get the same data every time. To do this, you just need to set the seed value to the same number:

public class CustomerFaker : Faker<Customer>
{
  public CustomerFaker()
  {
    var id = 1;

    UseSeed(1969) // Use any number
      .RuleFor(c => c.Id, _ => id++)
      .RuleFor(c => c.CompanyName, f => f.Company.CompanyName())
      .RuleFor(c => c.ContactName, f => f.Name.FullName())
      .RuleFor(c => c.Phone, f => f.Phone.PhoneNumberFormat());
  }
}

var customers = new CustomerFaker().Generate(1000);

When you do this, you are guaranteed to get the same customers. But the seed applies to the faker instance as a whole: every call to Generate continues the sequence and produces the next set of fake data. For example:

var customerFaker = new CustomerFaker();

var customers = customerFaker.Generate(1);
var companyName = customers.First().CompanyName;

var newCustomers = customerFaker.Generate(1);

Assert.IsTrue(companyName == newCustomers.First().CompanyName); // FAILS

This is because the repeatable sequence is per-instance. The first call to Generate gives you the first repeatable object, and the second call to Generate gives you the second object.

But if you create a new instance, the names are guaranteed to match:

var customerFaker = new CustomerFaker();

var customers = customerFaker.Generate(1);
var companyName = customers.First().CompanyName;

var newFaker = new CustomerFaker();
var newCustomers = newFaker.Generate(1);

Assert.IsTrue(companyName == newCustomers.First().CompanyName); // TRUE

This supports the idea of repeatable sample data!
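
The same check can be done over a whole list. This is a minimal sketch, assuming a cut-down CustomerFaker that only fakes the company name:

```csharp
using System;
using System.Linq;
using Bogus;

// Two separate instances with the same seed should walk the same sequence.
var first = new CustomerFaker().Generate(3);
var second = new CustomerFaker().Generate(3);

var sameNames = first.Select(c => c.CompanyName)
                     .SequenceEqual(second.Select(c => c.CompanyName));
Console.WriteLine($"Same names from two instances: {sameNames}");

// A single instance keeps advancing, so 1 + 2 objects from one faker
// should line up with the 3 objects generated above.
var faker = new CustomerFaker();
var split = faker.Generate(1).Concat(faker.Generate(2)).ToList();
var sameAsSplit = first.Select(c => c.CompanyName)
                       .SequenceEqual(split.Select(c => c.CompanyName));
Console.WriteLine($"Same names from split calls: {sameAsSplit}");

// Trimmed-down versions of the post's classes for this sketch.
public class Customer
{
  public string? CompanyName { get; set; }
}

public class CustomerFaker : Faker<Customer>
{
  public CustomerFaker()
  {
    UseSeed(1969) // Same seed as in the post; any number works
      .RuleFor(c => c.CompanyName, f => f.Company.CompanyName());
  }
}
```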

In our Customer class, we have a property for an Address. We can create a Faker for the Address too:

public class AddressFaker : Faker<Address>
{
  public AddressFaker()
  {
    var id = 0;
    UseSeed(1969)
      .RuleFor(c => c.Id, f => ++id)
      .RuleFor(c => c.Address1, f => f.Address.StreetAddress())
      .RuleFor(c => c.Address2, f => f.Address.SecondaryAddress())
      .RuleFor(c => c.City, f => f.Address.City())
      .RuleFor(c => c.StateProvince, f => f.Address.State())
      .RuleFor(c => c.PostalCode, f => f.Address.ZipCode());
  }
}

Again, there is a category for the type of data we need, and we can decide how to generate sample addresses. One thing you might want is to leave out certain parts of the fake data some of the time. For example, I want some of the Address2 properties to be null, so that the data has a mix of addresses with apartment/suite numbers and addresses without them. To do this, you can use the OrNull() method:

.RuleFor(c => c.Address2, f => f.Address.SecondaryAddress()
                                        .OrNull(f, .5f))

The OrNull method takes the faker object and a value between 0 and 1 to determine how often to generate a null value. In this example, we’re specifying that we want half (or 50%) of the Addresses to have a null for its secondary address.
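
A quick way to see the effect is to count the nulls. The sketch below assumes a trimmed-down Address class and imports Bogus.Extensions alongside Bogus, in case the OrNull extension lives there in your version of the library:

```csharp
using System;
using System.Linq;
using Bogus;
using Bogus.Extensions; // Assumption: harmless if OrNull already resolves via Bogus

var addressFaker = new Faker<Address>()
  .UseSeed(1969)
  .RuleFor(a => a.Address1, f => f.Address.StreetAddress())
  // Roughly half of the generated values should come back as null.
  .RuleFor(a => a.Address2, f => f.Address.SecondaryAddress().OrNull(f, .5f));

var addresses = addressFaker.Generate(1000);
var nullCount = addresses.Count(a => a.Address2 is null);

Console.WriteLine($"{nullCount} of {addresses.Count} addresses have no Address2");

// Trimmed-down version of the Address POCO for this sketch.
public class Address
{
  public string? Address1 { get; set; }
  public string? Address2 { get; set; }
}
```

The count will not be exactly 500, since each value is decided independently, but it should land close to it.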

Now that we have a faker that does what we want, let’s use it to generate addresses too!

public class CustomerFaker : Faker<Customer>
{
  AddressFaker _addrFaker = new AddressFaker();

  public CustomerFaker()
  {
    var id = 1;

    UseSeed(1969) // Use any number
      .RuleFor(c => c.Id, _ => id++)
      .RuleFor(c => c.CompanyName, f => f.Company.CompanyName())
      .RuleFor(c => c.ContactName, f => f.Name.FullName())
      .RuleFor(c => c.Phone, f => f.Phone.PhoneNumberFormat())
      .RuleFor(c => c.Address, f => _addrFaker.Generate(1)
                                              .First()
                                              .OrNull(f, .1f));
  }
}

Notice that we’re creating an instance of the AddressFaker and then using it when we specify the rule for the Customer’s Address property. We can even use OrNull to generate Addresses for only 90% of the customers.
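
The one-to-many Orders property can be handled the same way. This is only a sketch: the post never defines Order, so the Order class, its Total property, and the OrderFaker below are hypothetical, and the "one to three orders" range is an arbitrary choice:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Bogus;

var customers = new CustomerFaker().Generate(10);

foreach (var customer in customers)
{
  Console.WriteLine($"{customer.CompanyName}: {customer.Orders!.Count()} order(s)");
}

// Hypothetical Order shape; the post does not define one.
public class Order
{
  public int Id { get; set; }
  public decimal Total { get; set; }
}

// Trimmed-down version of the Customer POCO for this sketch.
public class Customer
{
  public string? CompanyName { get; set; }
  public IEnumerable<Order>? Orders { get; set; }
}

public class OrderFaker : Faker<Order>
{
  public OrderFaker()
  {
    var id = 1;

    UseSeed(1969)
      .RuleFor(o => o.Id, _ => id++)
      .RuleFor(o => o.Total, f => f.Finance.Amount(5, 500));
  }
}

public class CustomerFaker : Faker<Customer>
{
  private readonly OrderFaker _orderFaker = new OrderFaker();

  public CustomerFaker()
  {
    UseSeed(1969)
      .RuleFor(c => c.CompanyName, f => f.Company.CompanyName())
      // Let the customer's own randomizer decide how many orders to create.
      .RuleFor(c => c.Orders, f => _orderFaker.Generate(f.Random.Int(1, 3)));
  }
}
```

Because the order count comes from the seeded customer faker, the number of orders per customer should be repeatable too.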

There is a lot more to the Bogus library, but hopefully this will get you started. To get the example code from the video and this blog post, see the GitHub repo:

FixingIt Code