
CSV file splitting

This page explains in detail how we split CSV variables that are configured as shared. More information on CSV variables is available on the dedicated CSV variables page.

Uniqueness

The first thing to understand about CSV splitting is that each load generator must receive its own unique set of values. This is the only way we can guarantee that each user gets a unique value, because the underlying JMeter engine distributes values at the load generator level.
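To see why a per-load-generator split is needed, here is a simplified model. It assumes each load generator reads its file sequentially from the first line and each user takes one line. `hand_out` and `all_unique` are hypothetical helpers for illustration only. Sending the full file to two load generators hands out the same values twice, while giving each one a disjoint subset keeps every value unique.

```python
def hand_out(files_per_lg, users_per_lg):
    """Simplified model: each load generator reads its own file from the top,
    one line per user (no wrap-around)."""
    return [lines[:users] for lines, users in zip(files_per_lg, users_per_lg)]


def all_unique(assignments):
    """True if no value is handed out more than once across all load generators."""
    values = [value for per_lg in assignments for value in per_lg]
    return len(values) == len(set(values))


values = ["a", "b", "c", "d"]
# Same full file on both load generators: both start at "a".
full_copy = hand_out([values, values], [2, 2])
# Disjoint subsets: each load generator gets its own values.
split = hand_out([values[0::2], values[1::2]], [2, 2])
```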

Another important point is that each running user picks a new value every time it starts a new iteration. You can work around this with Flow Control actions, but if you need each user to have a unique value, the last user must start before the first users begin their next iteration, unless of course your CSV contains more values than the number of running users.
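The effect of iteration timing on which line each user reads can be illustrated with a small simulation. It is a minimal sketch, assuming that each iteration start reads the next line of the shared file in time order, that users start at a constant interval, that every iteration lasts the same fixed duration, and that simultaneous starts are served by user index. `first_line_per_user` is a hypothetical helper, not part of JMeter or of our platform. It returns the 0-based line each user reads on its first iteration, without wrap-around, so any index at or beyond the file size means a value was recycled.

```python
def first_line_per_user(num_users, user_delay_s, iteration_s):
    """Line index (0-based, no wrap-around) each user reads on its first iteration.

    Assumes user u starts at u * user_delay_s, every iteration lasts iteration_s,
    and each iteration start reads the next line of the shared file in time order.
    """
    last_start = (num_users - 1) * user_delay_s
    # Enough iterations per user to cover every read up to the last user's start.
    iterations = int(last_start // iteration_s) + 1
    # Every iteration start is a read; ties in time are broken by user index.
    reads = sorted(
        (user * user_delay_s + i * iteration_s, user)
        for user in range(num_users)
        for i in range(iterations)
    )
    first = {}
    for line, (_, user) in enumerate(reads):
        first.setdefault(user, line)  # keep only each user's first read
    return [first[user] for user in range(num_users)]
```

With 4 users started 1 s apart and 10 s iterations, `first_line_per_user(4, 1, 10)` gives lines 0, 1, 2, 3. With 2 s iterations, users 0 and 1 begin their second iteration before users 2 and 3 start, so the result becomes 0, 1, 3, 5; with a 4-line file, line 5 would wrap around to a value already handed out.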

Load generators

So the first step is to determine how many load generators a test requires. We base this on several criteria, mainly the number of concurrent users. If you use your own on-premise agents, check the provider configuration to understand how this works.

For instance, when running a test with the following configuration:

  • Login with 500 concurrent users from Paris,
  • Login with 2,000 concurrent users from California,
  • Create account with 500 concurrent users from Sydney.

We will most likely end up using 4 load generators:

[Figure: CSV Runtime]
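The load generator count above can be roughly reproduced with a simple rule. This sketch assumes, purely for illustration, that one load generator handles up to 1,000 concurrent users and that each population gets its own load generators. The actual sizing takes other criteria into account, so `estimate_load_generators` and its `users_per_lg` default are hypothetical.

```python
import math


def estimate_load_generators(populations, users_per_lg=1000):
    """Hypothetical sizing: ceil(users / users_per_lg) load generators per population."""
    return {
        population: math.ceil(users / users_per_lg)
        for population, users in populations.items()
    }


example = {
    ("Login", "Paris"): 500,
    ("Login", "California"): 2000,
    ("Create account", "Sydney"): 500,
}
counts = estimate_load_generators(example)
total = sum(counts.values())  # 1 + 2 + 1 = 4 load generators
```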

CSV Splitting

Now let's say we have a login.csv file and want each user to pick a unique value during the test. To do that, we configure it as shared on the CSV configuration page.

You might think we can simply split the file into 4, send each load generator one share and proceed with the test. This is where we run into a second issue: out of the 4 load generators, only 3 run the login virtual user.

Generally speaking, at this stage we need to determine how many of the load generators will use a particular file. We do this by scanning each VU for the column names configured in the CSV; when we find one, each load generator running that VU gets a share of the CSV. This means that if a CSV is used by several different VU profiles, we can still guarantee unique values to each of them. Of course, if you don't want that, you can simply duplicate your CSV variables instead.
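The scanning step can be sketched as follows. This is a minimal model, not our actual implementation: it assumes each VU can be represented as plain text and that columns are referenced with the JMeter `${column}` syntax. `load_generators_for_csv` and the load generator ids are hypothetical.

```python
def load_generators_for_csv(csv_columns, vu_contents, lg_to_vu):
    """Return the load generators that need a share of a CSV, sorted by id.

    csv_columns: column names configured for the CSV variable.
    vu_contents: VU name -> text of the VU (requests, parameters, ...).
    lg_to_vu: load generator id -> name of the VU it runs.
    """
    # A VU uses the CSV if it references any of its columns as ${column}.
    using_vus = {
        vu
        for vu, content in vu_contents.items()
        if any("${" + column + "}" in content for column in csv_columns)
    }
    return sorted(lg for lg, vu in lg_to_vu.items() if vu in using_vus)


vus = {
    "Login": "POST /login user=${login}&password=${password}",
    "Create account": "POST /accounts name=${name}",
}
lgs = {
    "paris-1": "Login",
    "california-1": "Login",
    "california-2": "Login",
    "sydney-1": "Create account",
}
needing = load_generators_for_csv(["login", "password"], vus, lgs)
# 3 shares: california-1, california-2, paris-1
```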

In our earlier example, we need to split the CSV file into 3 shares. We apply a modulo to the file's lines, so each subset picks 1 line out of 3:

[Figure: CSV Splitter]
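The modulo split described above can be sketched in a few lines, assuming the file has no header line and its lines are already loaded into a list. `split_modulo` is a hypothetical name. Line `i` goes to subset `i % shares`, so the subsets are disjoint and together contain every line. They are equal in size when the line count is a multiple of the number of shares, and otherwise differ by at most one line.

```python
def split_modulo(lines, shares):
    """Round-robin split: line i goes to subset i % shares."""
    return [lines[k::shares] for k in range(shares)]


values = [f"user{i}" for i in range(1, 8)]  # user1 .. user7
subsets = split_modulo(values, 3)
# [['user1', 'user4', 'user7'], ['user2', 'user5'], ['user3', 'user6']]
```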

Warning

Each subset has the same size. Since we cannot accurately predict how many users will end up on each load generator, they all receive the same number of values. This can be misleading when the number of users differs widely between load generators: for example, a load generator running 10 users and another running 1,000 will each get half of the file. You can compensate for this by providing more values.
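How many values are enough can be worked out with simple arithmetic. This sketch assumes each user needs one value and that shares are split evenly; `min_csv_lines` is a hypothetical helper. With a modulo split, the smallest share holds `lines // shares` values, so it covers the busiest load generator only when `lines >= shares * max(users_per_lg)`.

```python
def min_csv_lines(users_per_lg):
    """Smallest file size so that every equal share covers the busiest load generator.

    Assumes one value per user and one share per load generator.
    """
    return len(users_per_lg) * max(users_per_lg)


# The warning's example: 10 users on one side, 1,000 on the other.
needed = min_csv_lines([10, 1000])  # 2 shares * 1000 = 2000 lines
```

For the login population in our example, if the 2,000 California users were split evenly over two load generators (an assumption), `min_csv_lines([500, 1000, 1000])` would ask for 3,000 lines.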

CSV file matching

The only step remaining is to send each load generator its share of the file:

[Figure: CSV Matching]

In the background, we also edit each CSV variable so that it reads the new split file instead of the original CSV.
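The matching step can be sketched as writing one share per load generator and repointing the variable at it. This is an illustration under stated assumptions, not how the platform stores files: the `<stem>_<load generator>.csv` naming and the dictionary with a `file` key standing in for a CSV variable are both hypothetical, as are `distribute` and `point_variable_at_share`.

```python
import os
import tempfile


def distribute(csv_name, lines, load_generators, out_dir=None):
    """Write one modulo share per load generator; return lg -> path of its share.

    The <stem>_<lg><ext> naming is a hypothetical convention for this sketch.
    """
    out_dir = out_dir or tempfile.mkdtemp()
    shares = len(load_generators)
    stem, ext = os.path.splitext(csv_name)
    paths = {}
    for k, lg in enumerate(load_generators):
        path = os.path.join(out_dir, f"{stem}_{lg}{ext}")
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(line + "\n" for line in lines[k::shares])
        paths[lg] = path
    return paths


def point_variable_at_share(variable, share_path):
    """Return a copy of a (hypothetical) CSV variable config that reads the share."""
    return {**variable, "file": share_path}


paths = distribute("login.csv", ["u1", "u2", "u3", "u4"],
                   ["paris-1", "california-1", "california-2"])
paris_variable = point_variable_at_share({"name": "login", "file": "login.csv"},
                                         paths["paris-1"])
```

Returning a copy rather than mutating the variable keeps the original configuration untouched, which mirrors the idea that only the load generators see the split files.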