Managing external URLs in HtmlRenderer

Jan 12, 2013 at 9:22 AM

Hello,
It seems HtmlRenderer does not render real web sites yet.
I've modified some parts of it to allow parsing external web sites:

In "private static object DetectSource(string path, object bridge)" of "CssValueParser.cs":

            else if (Uri.IsWellFormedUriString(path, UriKind.RelativeOrAbsolute))
            {
                if (bridge != null) // using the bridge as the root web site's url
                {
                    var urlRoot = new Uri(bridge.ToString());
                    var uri = new Uri(path, UriKind.RelativeOrAbsolute);
                    // Make it absolute if it's relative
                    if (!uri.IsAbsoluteUri)
                    {
                        uri = new Uri(urlRoot, uri);
                    }
                    return uri;
                }
                // No root url available: new Uri(path) alone would throw for a relative path
                return new Uri(path, UriKind.RelativeOrAbsolute);
            }
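
The resolution step above can be tried in isolation. The following is a minimal sketch, assuming the root url is available as a string; the class name `UriResolutionSketch`, the method `Resolve` and the example.com urls are made up for illustration and are not part of HtmlRenderer.

```csharp
using System;

// Hypothetical helper for illustration only; not part of HtmlRenderer.
public static class UriResolutionSketch
{
    // Resolves a path found in HTML or CSS against the url of the page that referenced it.
    public static Uri Resolve(string baseUrl, string path)
    {
        var uri = new Uri(path, UriKind.RelativeOrAbsolute);
        if (uri.IsAbsoluteUri)
        {
            return uri; // already absolute, the base url is not needed
        }

        // The Uri(Uri, Uri) constructor combines the base with the relative reference
        return new Uri(new Uri(baseUrl), uri);
    }
}
```

With a base of "http://example.com/blog/post.html", the relative path "css/site.css" resolves to "http://example.com/blog/css/site.css", and "../images/a.png" resolves to "http://example.com/images/a.png". An absolute url is returned unchanged.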


Next, modify "public static Image GetImage(string path, object bridge)" in "CssValueParser.cs" to read external image files from urls:

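                else if (uri != null)
                {
                    // The MemoryStream is left open on purpose: Image.FromStream needs it for the lifetime of the Image
                    using (var client = new System.Net.WebClient())
                    {
                        return Image.FromStream(new MemoryStream(client.DownloadData(uri)));
                    }
                }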


And also modify the "public static string GetStyleSheet(string path, object bridge)" method in "CssValueParser.cs" to read external css files from urls:

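                else if (source is Uri)
                {
                    // Dispose the WebClient once the stylesheet text has been downloaded
                    using (var client = new System.Net.WebClient())
                    {
                        return client.DownloadString((Uri)source);
                    }
                }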

Developer
Jan 14, 2013 at 10:00 AM

Indeed, Html Renderer doesn't support real web sites, and it will never fully support them, because real web sites use redirects, JavaScript, iframes, Flash, etc. The goal of Html Renderer is to provide managed rendering of static HTML.

But you are correct that it can have better support for images and stylesheets. The way you added this support is problematic, though, because rendering now depends on the network.

Html Renderer v1.3 supports images referenced by URI, which are downloaded asynchronously (I/O completion ports).

Adding support for stylesheet download is a bit more problematic; I may add it in the future.

Jan 14, 2013 at 10:31 AM

- Some of the URLs in real web sites are relative (css/images), so using the bridge as the root web site's URL to build absolute URIs is necessary.

- An idea about handling dependencies: 

iTextSharp has a new HTML to PDF converter. It lets end users implement their own "CssResolverPipeline" or custom "ImageProvider"s. HTML Renderer could use the same approach (at least accepting a Func<string, string> as a custom CSS resolver or a Func<string, Image> as a custom image resolver).
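
To make the Func idea concrete, here is a rough sketch of the CSS half, assuming the renderer would ask a user-supplied delegate for stylesheet text. `ResourceResolvers`, `CssResolver`, `GetStyleSheet` and `ResolverDemo` are hypothetical names and do not exist in HTML Renderer or iTextSharp. An image resolver would look the same with Func<string, Image>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration only; HTML Renderer has no such class.
public sealed class ResourceResolvers
{
    // Called with the href of a stylesheet; returns the CSS text, or null if it is unavailable.
    public Func<string, string> CssResolver { get; set; }

    public string GetStyleSheet(string href)
    {
        var resolver = CssResolver;
        var css = resolver != null ? resolver(href) : null;
        return css ?? string.Empty; // fall back to no styles when nothing was resolved
    }
}

public static class ResolverDemo
{
    // Example resolver backed by an in-memory table, so rendering needs no network access.
    public static ResourceResolvers CreateInMemory(IDictionary<string, string> sheets)
    {
        return new ResourceResolvers
        {
            CssResolver = href =>
            {
                string css;
                return sheets.TryGetValue(href, out css) ? css : null;
            }
        };
    }
}
```

The point of the delegate is that the caller decides where the CSS comes from (disk, cache, network), so the rendering code itself carries no network dependency.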

Developer
Jan 14, 2013 at 11:29 AM

I think having to provide classes for custom handling is too complicated for users of the code, so I don't like this approach; I prefer events.

In v1.3 the "ImageLoad" event allows the user to provide the image as an Image object, a URI or a file path, so in the case of relative URIs the user can just prepend the base URI and it will do the trick. The general event solution is more extensive, as you can read here: http://theartofdev.wordpress.com/2013/01/13/html-renderer-1-3-0-0/.
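
As an illustration of handling relative URIs in an event handler, here is a self-contained sketch. It does not use the real v1.3 types: `SketchImageLoadEventArgs`, `SketchImageLoader`, `ResolvedUri` and `ImageLoadDemo` are invented stand-ins, and the actual event arguments may look different (see the linked post for the real API).

```csharp
using System;

// Hypothetical event arguments for illustration; the actual v1.3 types may differ.
public sealed class SketchImageLoadEventArgs : EventArgs
{
    public SketchImageLoadEventArgs(string src) { Src = src; }

    public string Src { get; private set; } // the src value as written in the HTML
    public Uri ResolvedUri { get; set; }    // set by a handler to override the source
}

// Hypothetical stand-in for the renderer side that raises the event.
public sealed class SketchImageLoader
{
    public event EventHandler<SketchImageLoadEventArgs> ImageLoad;

    public Uri GetImageUri(string src)
    {
        var args = new SketchImageLoadEventArgs(src);
        var handler = ImageLoad;
        if (handler != null) handler(this, args);
        // Use the handler's answer if there is one, otherwise interpret src as-is
        return args.ResolvedUri ?? new Uri(src, UriKind.RelativeOrAbsolute);
    }
}

public static class ImageLoadDemo
{
    // Subscribes a handler that turns relative sources into absolute ones using baseUri.
    public static SketchImageLoader CreateWithBase(Uri baseUri)
    {
        var loader = new SketchImageLoader();
        loader.ImageLoad += (sender, e) =>
        {
            var uri = new Uri(e.Src, UriKind.RelativeOrAbsolute);
            if (!uri.IsAbsoluteUri)
            {
                e.ResolvedUri = new Uri(baseUri, uri);
            }
        };
        return loader;
    }
}
```

With a base of "http://example.com/docs/index.html", a source of "img/logo.png" comes back as "http://example.com/docs/img/logo.png", while an absolute source is passed through untouched.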

Providing custom CSS can currently be done using the bridge; the demo project uses it, so you can see it there.

I believe I will add a CSS resolving mechanism similar to the image one in future versions, so it will be more powerful and consistent.

What do you think?

Jan 14, 2013 at 12:04 PM

Thanks. "ImageLoad" or "CssLoad" events are good ideas.