The easiest way to check the content of a web page is to download it locally (to a file) and search through it. To accomplish this, we look at the URLDownloadToFile function. The article then looks at the architecture of IntelliLink.



Have you ever owned a website? Have you done sysadmin work for somebody else? Have you taken part in a link exchange? If so, you probably want to monitor the links to and from your site.


The easiest way to check the contents of a web page is to download it locally (to a file) and search through it. To accomplish this, we use the URLDownloadToFile function, which has the following syntax:

HRESULT URLDownloadToFile(
	LPUNKNOWN pCaller,
	LPCTSTR szURL,
	LPCTSTR szFileName,
	_Reserved_  DWORD dwReserved,
	LPBINDSTATUSCALLBACK lpfnCB
);

Parameters

  • pCaller: A pointer to the controlling IUnknown interface of the calling ActiveX component, if the caller is an ActiveX component. If the calling application is not an ActiveX component, this value can be set to NULL. Otherwise, the caller is a COM object that is contained in another component, such as an ActiveX control in the context of an HTML page. This parameter represents the outermost IUnknown of the calling component. The function attempts the download in the context of the ActiveX client framework, and allows the caller container to receive callbacks on the progress of the download.
  • szURL: A pointer to a string value that contains the URL to download. Cannot be set to NULL. If the URL is invalid, INET_E_DOWNLOAD_FAILURE is returned.
  • szFileName: A pointer to a string value containing the name or full path of the file to create for the download. If szFileName includes a path, the target directory must already exist.
  • dwReserved: Reserved. Must be set to 0.
  • lpfnCB: A pointer to the IBindStatusCallback interface of the caller. By using IBindStatusCallback::OnProgress, a caller can receive download status. URLDownloadToFile calls the IBindStatusCallback::OnProgress and IBindStatusCallback::OnDataAvailable methods as data is received. The download operation can be canceled by returning E_ABORT from any callback. This parameter can be set to NULL if status is not required.

Return Value

This function can return one of these values:

  • S_OK: The download started successfully.
  • E_OUTOFMEMORY: The buffer length is invalid, or there is insufficient memory to complete the operation.
  • INET_E_DOWNLOAD_FAILURE: The specified resource or callback interface was invalid.

So, our implementation using the above function will be:

BOOL ProcessHTML(CString strFileName, CString strSourceURL, 
                 CString strTargetURL, CString strURLName)
{
   CString strURL;
   CString strFileLine;
   CString strLineMark;
   BOOL bRetVal = FALSE;

   try
   {
      // The CStdioFile constructor throws CFileException if the file cannot be opened.
      CStdioFile pInputFile(strFileName, CFile::modeRead | CFile::typeText);
      while (pInputFile.ReadString(strFileLine))
      {
         int nIndex = strFileLine.Find(_T("href="), 0);
         while (nIndex >= 0)
         {
            // The URL is the text between the next pair of double quotes.
            const int nFirst = strFileLine.Find(_T('\"'), nIndex);
            if (nFirst >= 0)
            {
               const int nLast = strFileLine.Find(_T('\"'), nFirst + 1);
               if (nLast >= 0)
               {
                  strURL = strFileLine.Mid(nFirst + 1, nLast - nFirst - 1);
                  if (strURL.CompareNoCase(strTargetURL) == 0)
                  {
                     TRACE(_T("URL found - %s\n"), (LPCTSTR)strTargetURL);
                     // The link text must follow on the same line as ">name<".
                     strLineMark.Format(_T(">%s<"), (LPCTSTR)strURLName);
                     if (strFileLine.Find(strLineMark, nLast + 1) >= 0)
                     {
                        TRACE(_T("Name found - %s\n"), (LPCTSTR)strURLName);
                        bRetVal = TRUE;
                     }
                  }
               }
            }
            // Continue with the next href on this line, if any.
            nIndex = (nFirst == -1) ? -1 : strFileLine.Find(_T("href="), nFirst + 1);
         }
      }
   }
   catch (CFileException* pFileException)
   {
      TCHAR lpszError[MAX_STR_LENGTH] = { 0 };
      pFileException->GetErrorMessage(lpszError, MAX_STR_LENGTH);
      TRACE(_T("%s\n"), lpszError);
      pFileException->Delete(); // MFC exceptions caught by pointer must be deleted
      bRetVal = FALSE;
   }
   return bRetVal;
}

BOOL CLinkData::IsValidLink()
{
   BOOL bRetVal = TRUE;
   TCHAR lpszTempPath[MAX_STR_LENGTH] = { 0 };
   TCHAR lpszTempFile[MAX_STR_LENGTH] = { 0 };

   const DWORD dwTempPath = GetTempPath(MAX_STR_LENGTH, lpszTempPath);
   lpszTempPath[dwTempPath] = '\0';
   // Create a uniquely named temporary file to receive the downloaded page.
   if (GetTempFileName(lpszTempPath, _T("html"), 0, lpszTempFile) != 0)
   {
      TRACE(_T("URLDownloadToFile(%s)...\n"), (LPCTSTR)GetSourceURL());
      if (URLDownloadToFile(NULL, GetSourceURL(), lpszTempFile, 0, NULL) == S_OK)
      {
         // The link is valid only if the target URL and its name are both found.
         if (!ProcessHTML(lpszTempFile, GetSourceURL(), GetTargetURL(), GetURLName()))
         {
            TRACE(_T("ProcessHTML(%s) has failed\n"), lpszTempFile);
            bRetVal = FALSE;
         }
      }
      else
      {
         TRACE(_T("URLDownloadToFile has failed\n"));
         bRetVal = FALSE;
      }
   }
   else
   {
      TRACE(_T("GetTempFileName has failed\n"));
      bRetVal = FALSE;
   }
   return bRetVal;
}
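
To make the scanning logic easier to try outside MFC, here is a minimal sketch of the same per-line search written against the C++ standard library. The names LineHasLink and EqualsNoCase are hypothetical helpers and are not part of IntelliLink; std::string stands in for CString. Like the code above, it only recognizes double-quoted href values and expects the link text on the same line as the href.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: case-insensitive comparison, standing in for
// CString::CompareNoCase(...) == 0.
bool EqualsNoCase(const std::string& a, const std::string& b)
{
   return a.size() == b.size() &&
          std::equal(a.begin(), a.end(), b.begin(),
                     [](unsigned char x, unsigned char y)
                     { return std::tolower(x) == std::tolower(y); });
}

// Hypothetical helper: returns true when `line` contains an href whose
// double-quoted value equals `targetUrl` and, later on the same line,
// the text ">linkName<".
bool LineHasLink(const std::string& line,
                 const std::string& targetUrl,
                 const std::string& linkName)
{
   const std::string marker = ">" + linkName + "<";
   std::size_t index = line.find("href=");
   while (index != std::string::npos)
   {
      // Opening quote of the attribute value.
      const std::size_t first = line.find('"', index);
      if (first == std::string::npos)
         break;
      // Closing quote of the attribute value.
      const std::size_t last = line.find('"', first + 1);
      if (last == std::string::npos)
         break;
      const std::string url = line.substr(first + 1, last - first - 1);
      if (EqualsNoCase(url, targetUrl) &&
          line.find(marker, last + 1) != std::string::npos)
         return true;
      // Move on to the next href on this line, if any.
      index = line.find("href=", first + 1);
   }
   return false;
}
```

Calling LineHasLink once per line read from the downloaded file mirrors the loop in ProcessHTML. Single-quoted or unquoted attributes, and links split across several lines, would need a real HTML parser.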

The Architecture

What do Source URL, Target URL, and URL Name mean in the above piece of code?

  • Source URL = the web page to check
  • Target URL = the link that should appear on that web page
  • URL Name = the link text expected for that link

Each URL definition is contained in a CLinkData class, with the following interface:

  • DWORD GetLinkID(); – Gets ID of the current URL definition
  • void SetLinkID(DWORD dwLinkID); – Sets ID for the current URL definition
  • CString GetSourceURL(); – Gets Source URL for current URL definition
  • void SetSourceURL(CString strSourceURL); – Sets Source URL for current URL definition
  • CString GetTargetURL(); – Gets Target URL for current URL definition
  • void SetTargetURL(CString strTargetURL); – Sets Target URL for current URL definition
  • CString GetURLName(); – Gets URL Name for current URL definition
  • void SetURLName(CString strURLName); – Sets URL Name for current URL definition
  • int GetPageRank(); – Gets PageRank for current URL definition (currently not implemented)
  • void SetPageRank(int nPageRank); – Sets PageRank for current URL definition (currently not implemented)
  • BOOL GetStatus(); – Gets status for current URL definition
  • void SetStatus(BOOL bStatus); – Sets status for current URL definition
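
As an illustration of the interface above, here is a minimal, portable sketch of such a class. It is an assumption about the shape, not the author's actual CLinkData: the name LinkDataSketch is hypothetical, std::string replaces CString, std::uint32_t replaces DWORD, and the PageRank accessors are omitted because they are not implemented.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for CLinkData: one URL definition with
// simple get/set accessors for each field.
class LinkDataSketch
{
public:
   std::uint32_t GetLinkID() const { return m_linkId; }
   void SetLinkID(std::uint32_t linkId) { m_linkId = linkId; }

   const std::string& GetSourceURL() const { return m_sourceUrl; }
   void SetSourceURL(std::string sourceUrl) { m_sourceUrl = std::move(sourceUrl); }

   const std::string& GetTargetURL() const { return m_targetUrl; }
   void SetTargetURL(std::string targetUrl) { m_targetUrl = std::move(targetUrl); }

   const std::string& GetURLName() const { return m_urlName; }
   void SetURLName(std::string urlName) { m_urlName = std::move(urlName); }

   bool GetStatus() const { return m_status; }
   void SetStatus(bool status) { m_status = status; }

private:
   std::uint32_t m_linkId = 0;   // ID of this URL definition
   std::string m_sourceUrl;      // web page to check
   std::string m_targetUrl;      // link expected on that page
   std::string m_urlName;        // link text expected for that link
   bool m_status = false;        // result of the last validity check
};
```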

Then, we define CLinkList as typedef CArray<CLinkData*> CLinkList;.

This list is managed inside the CLinkSnapshot class, with the following interface:

  • BOOL RemoveAll(); – Removes all URL definitions from the list
  • int GetSize(); – Gets the size of the URL definition list
  • CLinkData* GetAt(int nIndex); – Gets a URL definition from the list by its index
  • BOOL Refresh(); – Updates the status of each URL definition in the list
  • CLinkData* SelectLink(DWORD dwLinkID); – Searches for a URL definition by its ID
  • DWORD InsertLink(CString strSourceURL, CString strTargetURL, CString strURLName, int nPageRank, BOOL bStatus); – Inserts a URL definition into the list
  • BOOL DeleteLink(DWORD dwLinkID); – Removes a URL definition from the list
  • BOOL LoadConfig(); – Loads the URL definition list from an XML file
  • BOOL SaveConfig(); – Saves the URL definition list to an XML file
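
The list-management part of this interface can be sketched in portable C++ as follows. This is an illustrative assumption rather than the actual CLinkSnapshot: LinkRecord and LinkSnapshotSketch are hypothetical names, std::vector of std::unique_ptr replaces CArray<CLinkData*>, IDs are assumed to come from an incrementing counter, and Refresh, LoadConfig, SaveConfig, and the PageRank argument are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified URL definition.
struct LinkRecord
{
   std::uint32_t id = 0;
   std::string sourceUrl;
   std::string targetUrl;
   std::string urlName;
   bool status = false;
};

// Hypothetical stand-in for CLinkSnapshot: owns the list of URL definitions.
class LinkSnapshotSketch
{
public:
   // Inserts a definition and returns its new ID (assumed: incrementing counter).
   std::uint32_t InsertLink(std::string sourceUrl, std::string targetUrl,
                            std::string urlName, bool status)
   {
      auto link = std::make_unique<LinkRecord>();
      link->id = m_nextId++;
      link->sourceUrl = std::move(sourceUrl);
      link->targetUrl = std::move(targetUrl);
      link->urlName = std::move(urlName);
      link->status = status;
      m_links.push_back(std::move(link));
      return m_links.back()->id;
   }

   // Returns the definition with the given ID, or nullptr if there is none.
   LinkRecord* SelectLink(std::uint32_t id) const
   {
      for (const auto& link : m_links)
      {
         if (link->id == id)
            return link.get();
      }
      return nullptr;
   }

   // Removes the definition with the given ID; returns false if it was not found.
   bool DeleteLink(std::uint32_t id)
   {
      const auto it = std::find_if(m_links.begin(), m_links.end(),
         [id](const std::unique_ptr<LinkRecord>& link) { return link->id == id; });
      if (it == m_links.end())
         return false;
      m_links.erase(it);
      return true;
   }

   std::size_t GetSize() const { return m_links.size(); }
   void RemoveAll() { m_links.clear(); }

private:
   std::uint32_t m_nextId = 1;
   std::vector<std::unique_ptr<LinkRecord>> m_links;
};
```

Owning the records through std::unique_ptr means RemoveAll and DeleteLink free them automatically, whereas a CArray of raw pointers has to delete each element by hand.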

The Good, the Bad, and the Ugly

The good thing is that I learned to use Windows ribbons. The bad thing is that I still don’t know how to get a web page’s PageRank value. The ugly thing is that the processing (i.e., checking link validity) should be done in a separate worker thread rather than on the UI thread; I am planning this change for the next release. Stay tuned!

Final Words

The IntelliLink application uses many components that have been published on Code Project. Many thanks to:

  • My CMFCListView form view (see source code)
  • Jerry Wang for his CXml, CXmlNode, and CXmlNodes classes
  • PJ Naughter for his CTrayNotifyIcon class
  • PJ Naughter for his CVersionInfo class

Further plans: I would like to add support for Google’s PageRank as soon as possible.

History


  • Version 1.04 (November 9th, 2014) – Initial release.
  • Moved source code from CodeProject to GitLab (April 10th, 2020).
  • Moved source code from GitLab to GitHub (February 23rd, 2022).
  • Version 1.05 (April 28th, 2022) – Added setup project.
  • Version 1.06 (May 23rd, 2022) – Added program to Startup Apps.
  • Version 1.07 (August 20th, 2022) – Updated font size of About dialog.
  • Version 1.08 (August 26th, 2022) – Removed program from Startup Apps.

By Admin