PHP Garbage Collection and Performance

By Robert Dominy – Engineering Director – ADP Cobalt Display Advertising Platform

Normally garbage collection is not a big issue in PHP, because garbage only accumulates under special circumstances and most PHP applications are short-lived.  However, when the conditions are right, PHP garbage collection can cause seemingly random delays of half a second to a full second in an application.

Garbage Accumulation

PHP uses fairly standard memory management techniques.  As you create new objects and assign them to variables, PHP increments reference counters.  As those variables go out of scope, PHP decrements the reference counters and when the counter goes to zero, the object is freed from memory.
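As a rough illustration of the counting described above, the sketch below uses a destructor to observe the moment an object is freed. The Tracked class and its static log are hypothetical names introduced only for this example.

```php
<?php
// Hypothetical class used only to observe when PHP frees an object.
class Tracked
{
    public static $freed = array();
    private $name;

    public function __construct($name)
    {
        $this->name = $name;
    }

    public function __destruct()
    {
        // Runs as soon as the last reference to the object goes away.
        self::$freed[] = $this->name;
    }
}

$a = new Tracked('a');   // one reference: $a
$b = $a;                 // two references: $a and $b

$a = null;               // one reference left, so the object is still alive
$stillAlive = count(Tracked::$freed) === 0;

$b = null;               // the counter reaches zero and the object is freed
$freedNow = Tracked::$freed === array('a');

print($stillAlive && $freedNow ? "freed on last release\n" : "unexpected\n");
```

No garbage collector is involved here: the object is released the instant its last reference disappears.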

The exception, and the cause of garbage accumulation, is objects that contain circular references.  The example below demonstrates a circular reference.  The Owner class has an array of Item objects and each Item has a reference to its owner.

class Item
{
    public $name = '';
    protected $owner = null;

    public function __construct($name, $owner)
    {
        $this->name = $name;
        $this->owner = $owner;
    }
}

class Owner
{
    protected $items = array();
    protected $last = null;

    public function add($itemName)
    {
        $this->items[$itemName] = new Item($itemName, $this);
    }
}

Create a series of Items and later free up the Owner.

gc_disable();

$o = new Owner();
$o->add('first');
$o->add('second');
$o->add('third');

$o = null; 

$count = gc_collect_cycles();
print("$count items collected\n"); 

>> 8 items collected

Reference counting alone cannot free these objects when $o is set to null: each Item still holds a reference to the Owner, so the Owner's counter never reaches zero.  Instead, PHP records the Owner as a possible root of garbage and analyzes it during the next garbage collection cycle.  By turning off automatic garbage collection at the start of the script and calling gc_collect_cycles manually, you can see that the script left garbage behind for the collector to reclaim.
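To watch candidates pile up before a collection, something like the following sketch can help. It assumes PHP 7.3 or later, where gc_status() is available, and the Node class is a hypothetical stand-in for any pair of objects that point at each other.

```php
<?php
// Assumes PHP 7.3+ for gc_status(). Node is a hypothetical class.
class Node
{
    public $peer = null;
}

gc_disable();

// Build 100 two-object cycles and drop every outside reference to them.
for ($i = 0; $i < 100; $i++) {
    $a = new Node();
    $b = new Node();
    $a->peer = $b;
    $b->peer = $a;
}
$a = null;
$b = null;

$before = gc_status();             // snapshot while the garbage is waiting
$collected = gc_collect_cycles();  // run the collector by hand
$after = gc_status();

print("roots waiting before collection: {$before['roots']}\n");
print("collected: $collected\n");
print("roots waiting after collection: {$after['roots']}\n");
```

The exact counts depend on the PHP version, so treat the numbers as indicative only.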

Short Lived Applications vs. PHP Daemons

Most PHP applications are either running under Apache or executed from the command line as a short-lived script.  In these cases garbage collection is likely to have zero impact on performance.  The script will probably exit before garbage collection ever kicks in.

However, if you are operating a PHP daemon, executing a long-running PHP script, or your script generates a lot of garbage, then garbage collection can have a severe impact on performance.  In one of our high performance applications we were seeing hundreds of random 500ms-1000ms delays per hour.  After much hunting, the cause was traced to garbage collection.  An “optimization” that slipped into the code had introduced a circular reference.

Mitigation

There are two ways you can mitigate the performance effects of garbage collection.  The first is to simply avoid circular references.  If you have no circular references, reference counting frees everything immediately and the garbage collector has nothing to reclaim.
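One possible way to avoid the cycle from the earlier Owner and Item example is to have each item hold its owner weakly. This is only a sketch and assumes PHP 7.4 or later for WeakReference; the WeakItem and WeakOwner names and the getOwner method are hypothetical.

```php
<?php
// Assumes PHP 7.4+ for WeakReference. Class names are hypothetical.
class WeakItem
{
    public $name = '';
    protected $owner = null;

    public function __construct($name, $owner)
    {
        $this->name = $name;
        // Hold the owner weakly so the item does not keep it alive.
        $this->owner = WeakReference::create($owner);
    }

    public function getOwner()
    {
        // Returns null once the owner has been freed.
        return $this->owner->get();
    }
}

class WeakOwner
{
    protected $items = array();

    public function add($itemName)
    {
        $this->items[$itemName] = new WeakItem($itemName, $this);
    }
}

gc_disable();

$o = new WeakOwner();
$o->add('first');
$o->add('second');
$o->add('third');

$o = null; // no cycle, so reference counting can free everything here

$count = gc_collect_cycles();
print("$count items collected\n");
```

On older PHP versions, an explicit teardown method on the owner that empties its items array before the owner is released achieves a similar effect.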

When you cannot avoid circular references or you want to put some defensive code in place to protect critical sections of code, you can at least choose when garbage collection occurs.   At the start of a critical code section disable garbage collection (gc_disable) and when you’ve finished all your critical work (for example handling a user request and returning the results), then either manually collect the garbage yourself (gc_collect_cycles) or re-enable garbage collection (gc_enable).

If you are not expecting the code to generate garbage, calling gc_collect_cycles and logging any non-zero return values might help you catch garbage-producing code that sneaks into the project at a later date.
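The two ideas above, deferring collection around critical work and logging unexpected garbage, could be combined in a small wrapper. This is a minimal sketch assuming PHP 5.5 or later (for finally); the runWithDeferredGc function and the labels are hypothetical.

```php
<?php
// Hypothetical helper: runs $work with the collector disabled, then collects
// at a moment of our choosing and logs how much cycle garbage was found.
function runWithDeferredGc(callable $work, $label)
{
    gc_disable();
    try {
        $result = $work();
    } finally {
        // Re-enable so a failure in $work cannot leave collection switched off.
        gc_enable();
    }

    // The critical work is done, so a pause here is acceptable.
    $collected = gc_collect_cycles();
    if ($collected > 0) {
        error_log("$label: gc_collect_cycles() reclaimed $collected item(s)");
    }

    return array($result, $collected);
}

// Work that creates no cycles.
list($sum, $cleanCount) = runWithDeferredGc(function () {
    return array_sum(range(1, 10));
}, 'clean-request');

// Work that leaves a self-referencing object behind.
list($status, $leakyCount) = runWithDeferredGc(function () {
    $node = new stdClass();
    $node->self = $node;
    return 'done';
}, 'leaky-request');
```

In a daemon, the wrapper would typically surround the handling of one request, with the log line serving as an early warning that a circular reference has crept in.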

Effects of Linux Context Switching on High Performance Web Applications

By Robert Dominy – Engineering Director – ADP Cobalt Display Advertising Platform

If you are writing a high performance web application and attempting to evaluate performance, one thing you should be aware of is the effect of Linux context switching.  Under Linux, the system scheduler allocates time slices to running processes and as processes exceed their allocated slice, they can be interrupted to give time to other processes.  These context switches can be expensive when you are measuring performance in the tens of milliseconds or less.

Example
On a moderately busy production server (handling about 100 HTTP requests/sec), a series of timing tests was conducted.  The server is a virtualized system running CentOS 6.2 with 4 CPUs and 32GB of RAM allocated to the virtual machine.  It runs at load averages ranging from about 0.5 to 1.5.

A simple loop was implemented in PHP where the timed task was basically this:

$limit = 7000;
$iterations = 0;
while ($iterations < $limit)
{
    $iterations++;
    $last=microtime(true);
}

Here is a histogram of the timed results (in milliseconds):
0-10 ms : 0
11-18 ms : 1294
19-24 ms : 373
25-50 ms : 152
51-75 ms : 6
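
A histogram like this could be gathered with a harness along the following lines. This is a sketch, not the original test code: the timedTask and bucketFor functions and the sample count are assumptions, and only the bucket boundaries come from the results above.

```php
<?php
// The task being timed, matching the loop shown earlier.
function timedTask($limit = 7000)
{
    $iterations = 0;
    $last = 0.0;
    while ($iterations < $limit) {
        $iterations++;
        $last = microtime(true);
    }
    return $last;
}

// Hypothetical helper: map an elapsed time in whole ms to a bucket label.
function bucketFor($ms)
{
    // Inclusive upper bounds taken from the ranges in the histogram.
    $bounds = array(
        10 => '0-10 ms',
        18 => '11-18 ms',
        24 => '19-24 ms',
        50 => '25-50 ms',
        75 => '51-75 ms',
    );
    foreach ($bounds as $upper => $label) {
        if ($ms <= $upper) {
            return $label;
        }
    }
    return '76+ ms';
}

$runs = 200; // assumption: chosen for this sketch only
$histogram = array();

for ($i = 0; $i < $runs; $i++) {
    $start = microtime(true);
    timedTask();
    $elapsedMs = (microtime(true) - $start) * 1000.0;

    $label = bucketFor(round($elapsedMs));
    $histogram[$label] = isset($histogram[$label]) ? $histogram[$label] + 1 : 1;
}

foreach ($histogram as $label => $count) {
    print("$label : $count\n");
}
```

Absolute timings will differ by hardware and PHP version; the shape of the distribution is what matters.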

Nominally the test completes in about 14ms, so why are there cases as high as 65ms?

Measuring Context Switches
The Linux function getrusage returns various metrics about process resource usage.  The items we are interested in are user CPU time (ru_utime), system CPU time (ru_stime) and involuntary context switches (ru_nivcsw).

Here are the stats for a sample that took 65ms:
[elapsed] => 65ms
[userCPUTime] => 8.999ms
[systemCPUTime] => 9.999ms
[switches] => 7

Combining the user and system CPU time, the process consumed about 19ms.  During this time there were 7 context switches where the process was interrupted and other processes were allowed to run.  Those interruptions added another 46ms to the completion of the test.

Another important stat to monitor is voluntary context switches (ru_nvcsw).  Voluntary context switches can occur when your code calls various operating system functions, such as the sleep function or I/O functions.
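
As a rough sketch of how these numbers can be collected from PHP, the built-in getrusage() function can be sampled before and after the timed work. The usageSnapshot helper is hypothetical and is not the class from the gist; the result keys simply mirror the sample stats above.

```php
<?php
// Hypothetical helper: snapshot the getrusage() fields discussed above.
// All times are converted to milliseconds.
function usageSnapshot()
{
    $ru = getrusage();
    return array(
        'wall'        => microtime(true) * 1000.0,
        'user'        => $ru['ru_utime.tv_sec'] * 1000.0 + $ru['ru_utime.tv_usec'] / 1000.0,
        'system'      => $ru['ru_stime.tv_sec'] * 1000.0 + $ru['ru_stime.tv_usec'] / 1000.0,
        // The counters may be missing on non-Linux platforms, so default to 0.
        'involuntary' => isset($ru['ru_nivcsw']) ? $ru['ru_nivcsw'] : 0,
        'voluntary'   => isset($ru['ru_nvcsw']) ? $ru['ru_nvcsw'] : 0,
    );
}

$before = usageSnapshot();

// The timed work: the same loop used in the example.
$iterations = 0;
while ($iterations < 7000) {
    $iterations++;
    $last = microtime(true);
}

$after = usageSnapshot();

$stats = array(
    'elapsed'       => $after['wall'] - $before['wall'],
    'userCPUTime'   => $after['user'] - $before['user'],
    'systemCPUTime' => $after['system'] - $before['system'],
    'switches'      => $after['involuntary'] - $before['involuntary'],
    'voluntary'     => $after['voluntary'] - $before['voluntary'],
);

// Wall-clock time not explained by CPU use, e.g. time spent waiting while
// other processes ran. CPU times are coarse, so small values are noise.
$stats['waiting'] = $stats['elapsed'] - $stats['userCPUTime'] - $stats['systemCPUTime'];

print_r($stats);
```

For the 65ms sample above, the same subtraction gives 65 - (8.999 + 9.999), or roughly 46ms spent waiting.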

A basic PHP class that runs as a command line script can be found here: https://gist.github.com/rdominy/6557280

Server Activity
Running on a machine that has less activity will result in significantly fewer context switches.  This is often the case when code is run and tested in a development environment and then later tested in production under real load.  As an example, compare the results of a production server vs. a development laptop:

Time Range     Development    Production
11 – 18 ms     0.0%           90.6%
19 – 24 ms     0.4%           7.6%
25 – 50 ms     99.6%          1.8%

Although the production server is significantly faster, its times vary much more because of context switching.  The CPU load on the two systems is similar, but the production server is handling hundreds of requests per second, which causes NIC interrupts and other CPU activity.

Memory Footprint
The amount of memory a process uses can also affect the cost of context switching.  Here are three tests running with footprints of 1MB, 500MB and 900MB:

Time Range     1MB       500MB     900MB
11 – 18 ms     84.6%     0.0%      0.0%
19 – 24 ms     12.1%     68.5%     70.2%
25 – 50 ms     3.2%      30.5%     29.3%
51 – 75 ms     0.1%      0.9%      0.4%
76 – 100 ms    0.0%      0.2%      0.1%

Simply having a larger process size increased the typical time of the same test almost twofold.
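
The post does not say how the different footprints were produced. One simple way to approximate the setup, offered purely as an assumption, is to hold a large string in memory while the timed loop runs. The allocateBallast helper and the 16MB size are hypothetical.

```php
<?php
// Hypothetical helper: grow the process by roughly $megabytes of memory.
function allocateBallast($megabytes)
{
    // str_repeat writes every byte, so the memory is actually touched.
    return str_repeat('x', $megabytes * 1024 * 1024);
}

// Assumption: the configured memory limit is too low for large footprints.
ini_set('memory_limit', '-1');

$before = memory_get_usage();
$ballast = allocateBallast(16); // the tests above used 1MB, 500MB and 900MB
$grownBy = memory_get_usage() - $before;

// ... run the timed loop here while $ballast is still referenced ...

printf("footprint grew by about %d MB\n", round($grownBy / (1024 * 1024)));
```

Keeping $ballast in scope for the whole measurement matters, since PHP would free the string as soon as its last reference went away.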

Caveats
The elapsed time for these tests uses the PHP microtime function.  It is subject to limitations of floating point accuracy, clock drift caused by NTP corrections and probably several other things I am not considering.  I did monitor the NTP logs and saw on average about one NTP correction per hour, so NTP corrections were not likely to have influenced the test results.

Conclusion
Additional tests of varying complexity, typically running a real application algorithm, yielded similar results.  Ultimately, the slower your application is and the more memory it uses, the more context switching will cost it.  Super-fast sub-millisecond operations in processes with a small footprint will see far fewer interruptions than functions that take tens of milliseconds to complete.

Future posts will look at ways of mitigating these costs.
