Processor (CPU) | | |
Processor\% Processor Time | The percentage of time that the processor spends active, that is, the percentage of processing capacity being used. Note that this is the same counter as Processor Information > Processor Time. | Less than 85% on average. Note that this is a general measurement of how busy the system is, and it is expected for the CPU to remain busy; however, if it is pegged at almost 100% utilization while all other metrics are low, then you might be CPU bound and should consider investing in a more performant system. |
Processor\% User Time | This counter reflects what the CPU is doing on behalf of applications, such as looping through an array or running functions within the application itself that don’t involve the system. (System operations, such as writing a file to disk, fall under privileged time.) | Less than 85% on average. User Time (UT) and Privileged Time (PT) should be looked at as a unit. If PT is consistently higher than UT and the application is performing poorly, then it is possible that the CPU is tied up handling privileged requests that may or may not be related to the specific application being monitored. |
Processor\% Privileged Time | This counter measures the percentage of CPU utilization dedicated to handling system-oriented tasks that are of higher “privilege” than user-oriented (or application-oriented) tasks. Generally, the combination of privileged and user time will equal the total processor time. | Less than 85% on average. User Time (UT) and Privileged Time (PT) should be looked at as a unit. If PT is consistently higher than UT and the application is performing poorly, then it is possible that the CPU is tied up handling privileged requests that may or may not be related to the specific application being monitored. |
Process (the app) | | |
Process (csftps.exe)\ % Privileged, Processor, User Time | These are the same as the three measurements above, except that they isolate CPU utilization so that it is strictly associated with the EFT server service executable. As such, each will be a subset of the corresponding system-wide measurement. | Less than 85% on average. Keep in mind these are a subset of the three measurements taken for the entire system. They are helpful when you want to determine whether EFT or some other application, such as an AV tool running in the background, is consuming the majority of resources. |
Process (csftps.exe)\ Handle Count, Thread Count | These two values are distinct but related. A thread is a separate, sequential set of instructions executed by the CPU on behalf of the application. A handle is a logical association with a resource, such as a file, memory location, or dialog. A thread is typically used to open or obtain a handle to said resource. | Steady values. Thread counts increasing with utilization is normal, as is an increase in handles; however, if handles or threads are increasing in an unbounded fashion over time, then EFT could be experiencing a memory leak. Note that a large number of threads or handles (even in the tens of thousands) is OK. It is a constant increase with no decrease over time, even when server utilization fluctuates or drops, that should raise a red flag. |
Process (csftps.exe)\ Private Bytes | This is generally (with many exceptions) a value that can be associated with how much memory an application is consuming. | Less than 2GB. Note that many factors go into determining memory consumption and memory leaks. An increase in memory as utilization increases is to be expected; however, an unbounded increase, or memory utilization associated with csftps.exe exceeding 2GB, should be looked at. |
System | | |
System\ Processor Queue Length | Shows the number of threads waiting to be serviced by the processor. Waiting threads translate directly into slower performance. | No greater than 5 times the number of processors running, on average. Take the number shown and divide it by the number of logical processors. If the result is greater than 5, then more processing power might be needed. Search the web for “Processor Queue Length” for in-depth analysis of this metric. |
Disk | | |
Physical Disk\ % Idle Time | The percentage of time your disks are idle, that is, not performing any action. You can also use % Disk Write Time and % Disk Read Time, or just % Disk Time, to assess the opposite of idle time. Generally, you don’t need all four. IMPORTANT: While _Total is a valid instance, you should select the actual physical disk that is being utilized, e.g., “c:\”. | Greater than 85%, on average. If % Idle Time falls below 20% and stays there, then the disk is in constant read or write mode. Couple this measurement with others, such as disk queue length and reads/writes per second (measured against the disk’s operational specs), to determine if the disk is a bottleneck. |
Physical Disk\ Disk Reads/sec and Disk Writes/sec | Overall rate of read and/or write operations on the disk. (Can be used to determine IOPS to evaluate hardware needs and as a benchmark for hardware upgrades.) | Less than 80%. This value is typically the opposite of % Idle Time. Keep in mind that I/O will be high during high load situations. |
Physical Disk\ Current and Average Disk Queue Length | Current Disk Queue Length is a snapshot of queued read or write requests at the time a measurement is taken. The result can be a bit misleading, which is why you also want to look at Average Disk Queue Length, which derives an average of values between measurement intervals. | Calculating a disk bottleneck from these numbers is difficult. If back-to-back measurements of Current Disk Queue Length are the same, then Average Disk Queue Length can be used to measure outstanding I/O requests (otherwise it cannot). It is best to have someone with expertise evaluate these results. |
Physical Disk\ Avg. Disk sec/Read and Avg. Disk sec/Write | This is a measurement of the average time it takes, in seconds, to read from (or write to) disk. Note that the latency measured is the time from when the partition manager receives the I/O request to the time the request completes. | Less than 20 (milliseconds). This value is calculated with millisecond precision (the default multiplier is 1000). A value of “5” shown in the log is .005 of a second. If the value increases under load to where tens of milliseconds of latency are detected, on average, it could signify slowness beneath the partition manager (class driver, port driver, device miniport driver, or disk subsystem). |
Physical Disk\ Disk Bytes/sec | Measures disk I/O throughput, both read and write. | Less than system specs for that disk’s max throughput. There is no specific number to look for; instead, compare the average bytes per second of I/O with what the disk subsystem is actually capable of. |
Others: | Split IO/Sec can be useful for detecting a heavily fragmented disk. % Free Space is useful in case you didn’t realize you were running out of space (especially when measured over time). | |
Memory | | |
Memory\ Available MBytes | The amount of free memory. | Less than 80% utilization. If higher and sustained, then look into increasing the system’s memory. |
Memory\ % Committed Bytes in Use | This is the ratio of Committed Bytes to the Commit Limit. | Less than 80% utilization. If higher and sustained, then look into increasing the system’s memory. |
Network | | |
Network Interface\Bytes Total/Sec | This counter simply measures the overall (inbound and outbound) bytes transferred over the wire at the moment in time the snapshot was taken. When adding this counter, be sure to specify the correct network interface, or just specify all if you aren’t sure which one is being utilized. | Less than 70% utilization, on average. To determine utilization, you must first determine what your available bandwidth and NIC are capable of. Also, the total bytes should be multiplied by 8 to get bits per second, as most measurements for throughput will be in bps, not Bps. To determine utilization, use this formula: Utilization = ((Bytes Total/Sec * 8) / current bandwidth in bps) * 100. During high loads this number may reach saturation thresholds if all other resources are not maxed out. If it does, then bandwidth could be your bottleneck. |
EFT Server Counters | | |
ARM Queue Size | Measures the database inserts currently queued up waiting for SQL (or Oracle). | Less than 10,000 on average on a high-load server. An occasional spike in queue size is not necessarily a problem; however, sustained high numbers in the hundreds of thousands or a growing queue size could indicate that the database server does not have the resources to handle the volume of traffic EFT is throwing its way. Note: If the number is pegged at 1,000, then you may need to apply the advanced property in EFT to override the default max allowed queue size (1,000). Change that number to 500,000 or similar to get a better reading from Perfmon. |
Connected Admin Count | Shows the count of currently connected admins. | Less than 10 per server node. A large number of concurrently connected admins could result in performance slowdowns as EFT fights to keep configuration changes from stepping all over each other. Ideally you would have no more than a half-dozen privileged admins, or a larger set of admins that are allocated specific (lesser) admin roles, to avoid conflict. |
Workspaces Licenses Used | Measures the number of Workspaces currently allocated and not expired. This can be useful for determining whether Workspaces are growing at an unbounded rate due to heavy use by users. | Less than 100,000 per server node. Once this number grows into the tens or hundreds of thousands, EFT can get bogged down as it attempts to manage these resources, such as routinely checking which ones are expired. |
EFT Site Counters | | |
All | Each counter measures something that can be useful depending on the troubleshooting situation. | No expected values to measure; however, keep an eye on the AWE actions queue size, as a growing queue could indicate that your max allowed AWE objects and threads (a set of advanced properties) is set too low, resulting in backed-up AWE workflows that could slow down EFT if that queue grows too large. |
SQL Counters | | |
Various | Search the web for which counters to measure. Links are provided below. | If you are troubleshooting your SQL server (for example, you are trying to determine why EFT’s ARM queue size is growing too large), then there are a number of counters you can run that are specific to the SQL application. Those fall outside the scope of this doc. |
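
Two of the rows in the table describe a calculation rather than a value you can read directly: network utilization (Network Interface\Bytes Total/Sec) and queue length per logical processor (System\Processor Queue Length). The following is a minimal Python sketch of both calculations, assuming you have already exported the counter values from Perfmon. The function names and sample numbers are hypothetical and are not part of EFT or Perfmon.

```python
def network_utilization_percent(bytes_total_per_sec: float, bandwidth_bps: float) -> float:
    """Utilization = ((Bytes Total/Sec * 8) / current bandwidth in bps) * 100."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth_bps must be positive")
    # Multiply by 8 to convert bytes per second to bits per second.
    return (bytes_total_per_sec * 8) / bandwidth_bps * 100


def queue_length_per_processor(queue_length: float, logical_processors: int) -> float:
    """Divide Processor Queue Length by the number of logical processors."""
    if logical_processors <= 0:
        raise ValueError("logical_processors must be positive")
    return queue_length / logical_processors


if __name__ == "__main__":
    # Hypothetical sample values, not taken from a real server.
    util = network_utilization_percent(87_500_000, 1_000_000_000)  # 1 Gbps link
    print(f"Network utilization: {util:.1f}% (guideline: less than 70% on average)")

    per_cpu = queue_length_per_processor(24, 4)
    print(f"Queue length per logical processor: {per_cpu:.1f} (guideline: no greater than 5)")
```

Compare the results against the averages in the table, not against single samples, since an occasional spike is expected under load.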