
Convert HTML to PDF using AWS Lambda and Wkhtmltopdf

Wkhtmltopdf AWS Lambda Typescript

Almost every system I've ever worked with had to generate PDF files for reporting, statements etc. Finding a good (and free) library can be tricky, especially in the .NET Core space. Many options out there lack documentation and can have unexpected behaviour. Generating larger PDFs can put a lot of pressure on your system, so moving it into a separate service would make sense.

AWS Lambda fits the bill quite well: it scales out by default and doesn't cost you a dime when sitting idle. In this blog post, we'll explore how to use AWS Lambda and Wkhtmltopdf to generate PDF files from HTML.

What is Wkhtmltopdf?

Wkhtmltopdf is an open-source command-line tool that converts HTML pages to PDF files. It renders pages with the Qt WebKit engine, which means it can handle complex layouts, CSS stylesheets, and JavaScript. Wkhtmltopdf is available for Linux, Windows, and macOS and can be used as a standalone command-line tool or as a library in other applications.

Show me the code

I've created an example AWS Lambda service here: https://github.com/hkarask/html-to-pdf-lambda

To deploy the service, run terraform init and adjust the function and bucket names in variables.tf. After that you're ready to deploy:

npm run build
terraform apply

This will provision an S3 bucket with a sample HTML file and a Lambda function, which you can invoke using the function URL:

curl -H 'Content-Type: application/json' \
  -d '{"uri": "https://www.google.com", "fileName": "sample.pdf"}' \
  -X POST $(terraform output -raw function_url) -i

Here uri is either a URL or an S3 file key.
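
One way to tell the two cases apart is to check for an http(s) scheme. The sketch below is a minimal illustration, not the linked repository's code: the `Source` type and the `resolveSource` function are hypothetical names, and it assumes that anything without an http(s) scheme is meant as an S3 key.

```typescript
// Hypothetical names for illustration; the example repository may resolve `uri` differently.
type Source =
  | { kind: "url"; url: string }
  | { kind: "s3"; key: string };

function resolveSource(uri: string): Source {
  // Anything with an explicit http or https scheme is treated as a remote page.
  if (/^https?:\/\//i.test(uri)) {
    return { kind: "url", url: uri };
  }
  // Assumption: every other value is an object key in the input bucket.
  return { kind: "s3", key: uri };
}
```

With this split, `resolveSource("https://www.google.com")` is handled as a URL, while a value such as `reports/sample.html` would be looked up in S3.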

Available parameters are:

{
  "uri": "https://example.com", // URL or input S3 key
  "fileName": "converted.pdf", // Name of the converted file
  "orientation": "Landscape", // Optional: Landscape or Portrait
  "marginTop": "number", // Optional: top margin
  "marginRight": "number", // Optional: right margin
  "marginBottom": "number", // Optional: bottom margin
  "marginLeft": "number" // Optional: left margin
}

The request triggers the Lambda, which passes the options to Wkhtmltopdf and saves the converted PDF to the configured S3 bucket. A successful call returns:

HTTP/1.1 200 OK
Date: Sat, 11 Mar 2023 04:51:55 GMT
Content-Type: application/json

{"message":"File saved to lambda-html-to-pdf-files/sample.pdf"}

Or you might get a validation error if you missed something:

HTTP/1.1 400 Bad Request
Date: Sat, 11 Mar 2023 04:53:33 GMT
Content-Type: application/json

{"message":"fileName not set"}
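
To make the request handling more concrete, here is a rough sketch of how the payload could be validated and turned into Wkhtmltopdf command-line arguments. It is not taken from the example repository: `ConvertRequest`, `validate`, and `buildArgs` are hypothetical names, and the "uri not set" message is an assumption modelled on the "fileName not set" response above. The flags themselves (`--orientation`, `--margin-top`, and so on) are standard wkhtmltopdf options.

```typescript
// Field names follow the request parameters listed earlier; everything else is hypothetical.
interface ConvertRequest {
  uri?: string;
  fileName?: string;
  orientation?: "Landscape" | "Portrait";
  marginTop?: number;
  marginRight?: number;
  marginBottom?: number;
  marginLeft?: number;
}

// Returns a message for the first missing required field, or null if the request is usable.
function validate(req: ConvertRequest): string | null {
  if (!req.uri) return "uri not set"; // assumed wording
  if (!req.fileName) return "fileName not set";
  return null;
}

// Translates the optional settings into wkhtmltopdf flags.
function buildArgs(req: ConvertRequest, input: string, output: string): string[] {
  const args: string[] = [];
  if (req.orientation) {
    args.push("--orientation", req.orientation);
  }
  const margins: Array<[string, number | undefined]> = [
    ["--margin-top", req.marginTop],
    ["--margin-right", req.marginRight],
    ["--margin-bottom", req.marginBottom],
    ["--margin-left", req.marginLeft],
  ];
  for (const [flag, value] of margins) {
    // Only pass margins the caller actually set, so wkhtmltopdf defaults apply otherwise.
    if (value !== undefined) {
      args.push(flag, String(value));
    }
  }
  // wkhtmltopdf expects the input page followed by the output file.
  args.push(input, output);
  return args;
}
```

Building the arguments as an array, rather than one shell string, means they can be handed to a process-spawning call without shell quoting issues.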

To monitor the Lambda logs, you can run:

aws logs tail "$(terraform output -raw lambda_log_group)" --follow

To easily copy the newly generated PDF to your local machine:

aws s3 cp "s3://$(terraform output -raw s3_bucket)/sample.pdf" .

Conclusion

AWS Lambda and Wkhtmltopdf are powerful tools that can be used to generate PDF files from HTML content quickly and efficiently. With the ability to handle complex HTML layouts, CSS stylesheets, and JavaScript, Wkhtmltopdf is a flexible tool that can handle a wide range of use cases. And with the scalability and cost-effectiveness of AWS Lambda, you can generate PDF files from HTML content without having to worry about managing servers or paying for idle time.

However, AWS Lambda has some limitations when it comes to generating PDF files. Lambda functions have a maximum execution time of 15 minutes, so very large documents may time out, and concurrency limits can throttle a high volume of requests. If rendering needs a lot of CPU or memory, you can allocate more memory to the function (Lambda scales CPU with the memory setting) or consider a different service, such as Amazon EC2.
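
One way to stay inside the execution limit is to give the renderer a time budget based on what the invocation has left. The helper below is only a sketch under stated assumptions: `renderBudgetMs` is a hypothetical name, `remainingMs` would come from the Lambda context's `getRemainingTimeInMillis()`, and the 5-second reserve is an arbitrary value.

```typescript
// Hypothetical helper: how long the renderer may run before it should be stopped.
function renderBudgetMs(remainingMs: number, reserveMs: number = 5000): number {
  // Keep a reserve for uploading the PDF and returning a response;
  // never return a negative budget.
  return Math.max(0, remainingMs - reserveMs);
}
```

The result could be passed as the `timeout` option of Node's `child_process.execFile` when starting Wkhtmltopdf, so a slow render fails with a clear error instead of running into the function timeout.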

Overall, AWS Lambda and Wkhtmltopdf are a great combination for generating PDF files from HTML content, particularly for small to medium-sized PDF files and low to medium volumes of requests.