Troubleshooting build

Last Update: 5/1/2017

Team Services | TFS 2017 | TFS 2015 | Previous versions: XAML Build, Release

Run commands locally at the command prompt

It helps to determine whether a build failure results from a TFS/VSTS product issue (agent or tasks) or from an external command that the build runs.

Check the build log for the exact command line executed by the failing step. Running that command locally from the command line may reproduce the issue. It can be helpful to run the command on your own machine, or to log in to the build machine and run the command as the service account, or both.

For example, is the problem happening during the MSBuild part of your build process (for example, are you using either the MSBuild or Visual Studio Build step)? If so, then try running the same MSBuild command on a local machine using the same arguments. If you can reproduce the problem on a local machine, then your next steps are to investigate the MSBuild problem.

Differences between local command prompt and agent

Keep in mind that running a command from a local command prompt differs from running it in a build on an agent. If the agent is configured to run as a service on Windows/Linux, then it is not running within an interactive logged-on session. Without an interactive logged-on session, UI interaction is not available and other limitations apply.

Get logs to diagnose problems

Build logs

Start by looking at the logs in your completed build. If they don't provide enough detail, you can make them more verbose:

  1. On the Variables tab, add system.debug and set it to true. Select the option to allow it at queue time.

  2. Queue the build.

  3. In the explorer tab, view your completed build and click the build step to view its output.

  4. If you need a copy of all the logs, click Download all logs as zip.

Diagnostic logs

  1. Log on to the agent machine.

  2. Go to the _diag subfolder in the directory where the build agent is installed. For example: c:\agent\_diag

Worker diagnostic logs

You can get the diagnostic log of the completed build that was generated by the worker process on the build agent. Look for the worker log file that has the date and time stamp of your completed build. For example, worker_20160623-192022-utc_6172.log.
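
As a rough illustration of that lookup, the Python sketch below parses the time stamp embedded in worker log file names and picks the one nearest to a given build time. It assumes that file names follow the pattern of the example above (worker_YYYYMMDD-HHMMSS-utc_<number>.log). The function names and the "nearest time stamp" rule are illustrative choices, not part of the agent.

```python
import re
from datetime import datetime, timezone

# Assumed pattern, inferred from the example worker_20160623-192022-utc_6172.log.
WORKER_LOG = re.compile(r"^worker_(\d{8}-\d{6})-utc(?:_\d+)?\.log$")


def worker_log_time(file_name):
    """Return the UTC time stamp embedded in a worker log name, or None."""
    match = WORKER_LOG.match(file_name)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y%m%d-%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)


def closest_worker_log(file_names, build_time):
    """Pick the worker log whose time stamp is nearest to build_time (UTC)."""
    candidates = [(worker_log_time(name), name) for name in file_names]
    candidates = [(when, name) for when, name in candidates if when is not None]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda c: abs((c[0] - build_time).total_seconds()))
    return nearest[1]
```

To use it, pass the file names from the _diag folder (for example, the result of os.listdir on that folder) together with a timezone-aware UTC datetime taken from your build summary.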

Agent diagnostic logs

Agent diagnostic logs provide a record of how the agent was configured and what happened when it ran. Look for the agent log files. For example, agent_20160624-144630-utc.log. There are two kinds of agent log files:

  • The log file generated when you ran config.cmd. This log:

    • Includes this line near the top: Adding Command: configure

    • Shows the configuration choices made.

  • The log file generated when you ran run.cmd. This log:

    • Cannot be opened until the process is terminated.

    • Attempts to connect to your Team Foundation Server or Team Services account.

    • Shows when each job was run, and how it completed.

Both logs show how the agent capabilities were detected and set.
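
If the _diag folder holds many agent logs, a small script can separate the two kinds. The sketch below is only an illustration: it assumes the marker line quoted above appears verbatim within the first 50 lines of a configuration log (the text says only "near the top"), and the function names are hypothetical.

```python
from pathlib import Path

# Marker line quoted in the text above; assumed to appear verbatim in the log.
CONFIGURE_MARKER = "Adding Command: configure"


def is_configure_log(text, head_lines=50):
    """True if the marker appears within the first head_lines lines."""
    head = text.splitlines()[:head_lines]
    return any(CONFIGURE_MARKER in line for line in head)


def classify_agent_logs(diag_dir):
    """Map each agent_*.log in diag_dir to 'configure' or 'run'."""
    result = {}
    for path in sorted(Path(diag_dir).glob("agent_*.log")):
        text = path.read_text(encoding="utf-8", errors="replace")
        result[path.name] = "configure" if is_configure_log(text) else "run"
    return result
```

Because the run log cannot be opened until the agent process is terminated, stop the agent before you call classify_agent_logs on a live _diag folder.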

HTTP trace logs

Important: HTTP traces and trace files can contain passwords and other secrets. Do not post them on public sites.

Use built-in HTTP tracing

If your agent is version 2.114.0 or newer, you can trace the HTTP traffic headers and write them into the diagnostic log. Set the VSTS_AGENT_HTTPTRACE environment variable before you launch the agent.listener.

Windows:
    set VSTS_AGENT_HTTPTRACE=true

OSX/Linux:
    export VSTS_AGENT_HTTPTRACE=true

Use full HTTP tracing

Windows
  1. Start Fiddler.

  2. We recommend that you listen only to agent traffic. To do so, turn off File > Capture Traffic (F12).

  3. Enable HTTPS decryption. Go to Tools > Fiddler Options > HTTPS tab and select Decrypt HTTPS traffic.

  4. Let the agent know to use the proxy:

    set VSTS_HTTP_PROXY=http://127.0.0.1:8888
    
  5. Run the agent interactively. If the agent runs as a service, you can instead set the environment variable in Control Panel for the account that the service runs as.

  6. Restart the agent.

OSX and Linux

Use Charles Proxy (similar to Fiddler on Windows) to capture the HTTP trace of the agent.

  1. Start Charles Proxy.

  2. Charles: Proxy > Proxy Settings > SSL Tab. Enable. Add URL.

  3. Charles: Proxy > Mac OSX Proxy. We recommend disabling it so that you see only agent traffic. Then let the agent know to use the proxy:

    export VSTS_HTTP_PROXY=http://127.0.0.1:8888
    
  4. Run the agent interactively. If the agent runs as a service, you can set the variable in the .env file. See nix service.

  5. Restart the agent.

File- and folder-in-use errors

File- or folder-in-use errors are often indicated by error messages such as:

Access to the path [...] is denied.
The process cannot access the file [...] because it is being used by another process.
Access is denied.
Can't move [...] to [...]
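
To spot these messages in a downloaded build log, a simple scan is often enough. The following Python sketch assumes the messages appear in the log with the wording listed above, where "[...]" stands for a path. The pattern list and the function name are illustrative and may need adjusting for your tools.

```python
import re

# Patterns derived from the messages listed above; ".*" replaces "[...]".
IN_USE_PATTERNS = [
    re.compile(r"Access to the path .* is denied\."),
    re.compile(
        r"The process cannot access the file .* "
        r"because it is being used by another process\."
    ),
    re.compile(r"Access is denied\."),
    re.compile(r"Can't move .* to .*"),
]


def find_in_use_errors(log_text):
    """Return (line_number, line) pairs for lines that match any pattern."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in IN_USE_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

The line numbers in the result point back into the log, so you can check which step and which path were involved.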

Detect files and folders in use

On Windows, tools like Process Monitor can be used to capture a trace of file events under a specific directory. Or, for a snapshot in time, tools like Process Explorer or Handle can be used.

Anti-virus exclusion

Anti-virus software scanning your files can cause file or folder in use errors during a build. Adding an anti-virus exclusion for your agent directory and configured "work folder" may help to identify anti-virus software as the interfering process.

MSBuild and /nodeReuse:false

If you invoke MSBuild during your build, make sure to pass the argument /nodeReuse:false (short form /nr:false). Otherwise MSBuild process(es) will remain running after the build completes. The process(es) remain for some time in anticipation of a potential subsequent build.

This feature of MSBuild can interfere with attempts to delete or move a directory - due to a conflict with the working directory of the MSBuild process(es).

The MSBuild and Visual Studio Build tasks already add /nr:false to the arguments passed to MSBuild. However, if you invoke MSBuild from your own script, then you would need to specify the argument.
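
If your own script assembles the MSBuild command line, one way to make sure the switch is always present is to normalize the argument list before launching the process. The helper below is a minimal sketch with a hypothetical name. It assumes arguments are passed as a list, and it drops any existing node-reuse switch (in either the "/" or "-" form) before appending /nr:false.

```python
def with_node_reuse_disabled(msbuild_args):
    """Return a copy of msbuild_args that is guaranteed to disable node reuse."""
    prefixes = ("/nodereuse:", "/nr:", "-nodereuse:", "-nr:")
    # Remove any existing node-reuse switch, whatever its value or spelling.
    kept = [arg for arg in msbuild_args if not arg.lower().startswith(prefixes)]
    return kept + ["/nr:false"]
```

A script could then run, for example, subprocess.run(["msbuild"] + with_node_reuse_disabled(args), check=True), assuming msbuild is on the PATH of the account that runs the build.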

MSBuild and /maxcpucount:[n]

By default, build steps such as MSBuild and Visual Studio Build run MSBuild with the /m switch. In some cases this can cause problems such as file access conflicts between multiple processes.

Try adding the /m:1 argument to your build steps to force MSBuild to run only one process at a time.

File-in-use issues may result from using the concurrent-process feature of MSBuild. When the argument /maxcpucount:[n] (short form /m:[n]) is omitted, MSBuild uses a single process only. If you are using the MSBuild or Visual Studio Build tasks, you may need to specify "/m:1" to override the "/m" argument that is added by default.

Intermittent or inconsistent MSBuild failures

If you are experiencing intermittent or inconsistent MSBuild failures, try instructing MSBuild to use a single process only. Intermittent or inconsistent errors may indicate that your target configuration is incompatible with the concurrent-process feature of MSBuild. See MSBuild and /maxcpucount:[n].

Process hang

Waiting for Input

A process hang may indicate that a process is waiting for input.

Running the agent from the command line of an interactive logged-on session may help to identify whether a process is prompting with a dialog for input.

Running the agent as a service may help to keep programs from prompting for input. For example, in .NET, programs may rely on the System.Environment.UserInteractive Boolean to determine whether to prompt. When running as a Windows service, the value is false.

Process dump

Analyzing a dump of the process can help to identify what a deadlocked process is waiting on.

WiX project

Building a WiX project when custom MSBuild loggers are enabled can cause WiX to deadlock waiting on the output stream. Adding the MSBuild argument /p:RunWixToolsOutofProc=true works around the issue.

Agent connection issues

Config fails while testing agent connection (on-premises TFS only)

Testing agent connection.
VS30063: You are not authorized to access http://<SERVER>:8080/tfs

If the above error is received while configuring the agent, log on to your TFS machine. Start the Internet Information Services (IIS) manager. Make sure Anonymous Authentication is enabled.


Agent lost communication

This issue is characterized by the error message:

The job has been abandoned because agent did not renew the lock. Ensure agent is running, not sleeping, and has not lost communication with the service.

This error may indicate the agent lost communication with the server for a span of several minutes. Check the following to rule out network or other interruptions on the agent machine:

  • Verify automatic updates are turned off. A machine reboot from an update will cause a build to fail with the above error. Apply updates in a controlled fashion to avoid this type of interruption. Before rebooting the agent machine, first mark the agent as disabled on the pool administration page and let any running build finish.
  • Verify the sleep settings are turned off.
  • If the agent is running on a virtual machine, avoid any live migration or other VM maintenance operation that may severely impact the health of the machine for multiple minutes.
  • If the agent is running on a virtual machine, the same operating-system-update and sleep-setting recommendations apply to the host machine, as do the recommendations about any other maintenance operations that severely impact the host machine.
  • Performance monitor logging or other health metric logging can help to correlate this type of error to constrained resource availability on the agent machine (disk, memory, page file, processor, network).
  • Another way to correlate the error with network problems is to ping a server indefinitely and dump the output to a file, along with timestamps. Use a healthy interval, for example 20 or 30 seconds. If you are using VSTS, then you would want to ping an internet domain, for example bing.com. If you are using an on-premises TFS server, then you would want to ping a server on the same network.
  • Verify the network throughput of the machine is adequate. You can perform an online speed test to check the throughput.
  • If you use a proxy, verify the agent is configured to use your proxy. Refer to the agent deployment topic.
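
The ping suggestion in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not a supported tool: it assumes a ping executable is on the PATH, uses the -n (Windows) or -c (Linux and OSX) flag to send one echo request at a time, and records only success or failure. The function names, the log format, and the 30-second default are choices made for this example.

```python
import platform
import subprocess
import time
from datetime import datetime, timezone


def ping_command(host):
    """Build a single-echo ping command for the current operating system."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", count_flag, "1", host]


def format_entry(when, host, succeeded):
    """Format one timestamped log line."""
    status = "ok" if succeeded else "FAILED"
    return f"{when.isoformat()} {host} {status}"


def log_pings(host, log_path, interval_seconds=30, max_pings=None):
    """Ping host every interval_seconds and append results to log_path.

    Runs until interrupted unless max_pings is given.
    """
    sent = 0
    with open(log_path, "a", encoding="utf-8") as log:
        while max_pings is None or sent < max_pings:
            result = subprocess.run(ping_command(host), capture_output=True)
            now = datetime.now(timezone.utc)
            log.write(format_entry(now, host, result.returncode == 0) + "\n")
            log.flush()  # keep the file current in case the machine reboots
            sent += 1
            if max_pings is None or sent < max_pings:
                time.sleep(interval_seconds)
```

For example, log_pings("bing.com", "ping.log") runs until you interrupt it. Compare the UTC time stamps of any FAILED lines with the time at which the build was abandoned.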

Builds not starting

TFS Job Agent not started

This may be characterized by a message in the web console "Waiting for an agent to be requested". Verify the TFSJobAgent (display name: Visual Studio Team Foundation Background Job Agent) Windows service is started.

Misconfigured notification URL (1.x agent version)

This may be characterized by a message in the web console "Waiting for console output from an agent", and the build eventually times out.

A mismatched notification URL may cause the worker process to fail to connect to the server. See Team Foundation Administration Console, Application Tier. The 1.x agent listens to the message queue using the URL that it was configured with. However, when a job message is pulled from the queue, the worker process uses the notification URL to communicate back to the server.

Git

Get sources fails with SSL certificate problem (on-premises TFS and Windows agent only)

The Windows agent ships with its own copy of git.exe and uses it for all Git-related operations. If your on-premises TFS server uses a self-signed SSL certificate, make sure to configure that git.exe to allow the certificate. The most reliable way is to set the following Git config at the global level, as the agent's run-as user.

git config --global http."https://tfs.com/".sslCAInfo certificate.pem

Setting system-level Git config is not reliable on Windows, because the system-level .gitconfig file is stored with the packaged copy of git.exe, which is replaced whenever the agent is upgraded to a new version.

Team Foundation Version Control (TFVC)

Get sources not downloading some files

This may be characterized by a message in the build log "All files up to date" from the tf get command. Verify the built-in build service identity has permission to download the sources. Either the Project Collection Build Service identity or the Project Build Service identity needs that permission, depending on the authorization scope selected on the General tab of the build definition. In the version control web UI, you can browse the project files at any level of the folder hierarchy and check the security settings.

I need more help. I found a bug. I've got a suggestion. Where do I go?

Get subscription, billing, and technical support

Please submit bugs through Connect.

We welcome your suggestions: