Unicode Hindi text not showing in PDF

Nov 27, 2014 at 4:07 AM
Hi,

First - thanks a lot for this wonderful library.

I have run into a peculiar problem. I have developed an ASP.NET application that needs to generate a PDF from HTML. This is where your HTML Renderer comes into the picture.

The HTML contains Unicode text (Hindi text). For this I am using the Arial Unicode MS font. My application runs fine on a Win8 desktop (64-bit) and generates a PDF that shows the Hindi text properly. When I deploy the same application on a Windows 2012 server (std) (64-bit), the generated PDF shows rectangular boxes in place of the Hindi text. I have double-checked: the server does have the Arial Unicode MS font installed.

In both scenarios I am not saving the PDF to a disk file; instead, I am streaming it to the response.

Here is the code fragment I am using to convert HTML to PDF:
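using System.IO;
using TheArtOfDev.HtmlRenderer.Core.Entities;
using TheArtOfDev.HtmlRenderer.PdfSharp;

// A4 pages with a margin of 20 on every side.
PdfGenerateConfig config = new PdfGenerateConfig();
config.PageSize = PdfSharp.PageSize.A4;
config.SetMargins(20);

// Render the HTML into an in-memory PdfSharp document.
PdfSharp.Pdf.PdfDocument doc = PdfGenerator.GeneratePdf(html, config, null, null, null);

// Serialize the document to a byte array instead of a file on disk.
using (MemoryStream stream = new MemoryStream())
{
   // false: leave the stream open; the using block disposes it.
   doc.Save(stream, false);
   byte[] pdfBytes = stream.ToArray();
   return pdfBytes;
}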
Here is the code fragment I am using to stream the PDF:
string html = "some html text";  // GetReportHtml();
byte[] pdfBytes = PdfGenerator.GetPdfBytes(html);

string fileName = "test.pdf";
HttpResponseMessage response = Request.CreateResponse(HttpStatusCode.OK);

MemoryStream pdfStream = new MemoryStream(pdfBytes);
response.Content = new StreamContent(pdfStream);

response.Content.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");
response.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("inline");
response.Content.Headers.ContentDisposition.FileName = fileName;

return response;
Any suggestions/ideas regarding what could be causing this?

Regards,
Vikram
Nov 27, 2014 at 5:11 AM
This appears to be an issue with PDF generation alone.

When I tried the demo application on the Win 2012 server, it displayed the Hindi text correctly. When I converted it to PDF, I again got the rectangular boxes.

[Image: screenshot of the demo application]

[Image: screenshot of the PDF generated from the demo application]

Unfortunately, converting an image to PDF is not an option that would work for my requirements.

Any suggestions?

Thanks,

Vikram
Developer
Nov 27, 2014 at 5:13 PM
Looks like a PdfSharp issue. Please try using PdfSharp directly, and if the problem reproduces there, post it to the PdfSharp page.
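
A minimal sketch of such a direct PdfSharp check might look like the following. It assumes PdfSharp 1.3x, where XPdfFontOptions accepts an encoding and an embedding mode, and it assumes the font is registered under the family name "Arial Unicode MS". The class name, sample string, and output file name are made up for illustration. If the saved file shows boxes as well, the problem reproduces without HTML Renderer involved.

```csharp
using PdfSharp.Drawing;
using PdfSharp.Pdf;

// Hypothetical test harness; not part of HTML Renderer or PdfSharp.
class HindiPdfSharpCheck
{
    static void Main()
    {
        PdfDocument doc = new PdfDocument();
        PdfPage page = doc.AddPage();

        using (XGraphics gfx = XGraphics.FromPdfPage(page))
        {
            // Ask for Unicode encoding and an embedded font; with the default
            // encoding, non-Latin characters cannot be represented at all.
            XPdfFontOptions options =
                new XPdfFontOptions(PdfFontEncoding.Unicode, PdfFontEmbedding.Always);

            // Assumption: the font family name as it appears on the server.
            XFont font = new XFont("Arial Unicode MS", 14, XFontStyle.Regular, options);

            // Sample Hindi text drawn at a fixed position on the page.
            gfx.DrawString("नमस्ते", font, XBrushes.Black, new XPoint(40, 60));
        }

        // Hypothetical output path; open the file and inspect the glyphs.
        doc.Save("hindi-check.pdf");
    }
}
```

Running the same program on the Win8 desktop and on the Windows 2012 server would show whether the difference comes from the machine or from the rendering pipeline.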
Dec 3, 2014 at 2:14 PM
Thanks Arthur. Yes, it does appear to be a PDFSharp issue.

It appears that PDFSharp cannot handle the Hindi font :-(

Now, if I want to go with HTMLRenderer->image->PDF, how do I achieve pagination? Any suggestions?

Thanks,

Vikram
Developer
Dec 5, 2014 at 10:34 AM
Check the code for pagination in PdfGenerator.cs. It should be easy to convert it into image generation per page (basically a game of setting ScrollOffset).
Note that the quality of the PDF is lower when the content is an image.
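
The "game of setting ScrollOffset" can be sketched as plain arithmetic: lay out the whole document once, then paint it once per page, each time shifted up by one page height. The helper below only computes those shifts; PageOffsets and Compute are hypothetical names, and the negative sign assumes that a negative vertical ScrollOffset moves the content upward. Check both against PdfGenerator.cs before relying on them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of HTML Renderer.
static class PageOffsets
{
    // Returns one vertical scroll offset per page for content of the given
    // height, split into pages of the given height (same units for both).
    public static List<double> Compute(double contentHeight, double pageHeight)
    {
        if (pageHeight <= 0)
            throw new ArgumentOutOfRangeException("pageHeight");

        // Always emit at least one page, even for empty content.
        int pageCount = Math.Max(1, (int)Math.Ceiling(contentHeight / pageHeight));

        var offsets = new List<double>(pageCount);
        for (int i = 0; i < pageCount; i++)
        {
            // Page i starts i page-heights down, so the content is shifted
            // up by that amount (assumed sign convention).
            offsets.Add(-i * pageHeight);
        }
        return offsets;
    }
}
```

For each returned offset, the idea would be to assign it to the container's ScrollOffset, paint into a page-sized bitmap, and add that bitmap to the PDF as one page. Content 2500 units tall with a page height of 1000 yields three offsets: 0, -1000, and -2000.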

Also, if you have the time and it is important enough for you, you could look at iTextSharp; using it instead of PdfSharp shouldn't be too hard to achieve.

Best of luck.
Dec 7, 2014 at 9:41 AM
Edited Dec 7, 2014 at 9:42 AM
Thanks Arthur.

I will try that.

I also see this post, "Thinking of Page Support?", which could be useful.

Vikram