Python PDF Generation Made Easy with WeasyPrint

By michael

Published at: 7/25/2023, 9:44:37 PM

I recently set out to build a Resume Maker application using Python, Streamlit, and WeasyPrint. The application's core function was to generate a resume in PDF format from user-supplied data. I ran into trouble, though, when the application started crashing because it ran out of memory.

Everything was going swimmingly, with the CSV version of the resume generating without any problems. The moment I incorporated WeasyPrint to create the PDF version, however, my application started crashing. I was baffled at first, since I was only generating a simple PDF document and not doing anything particularly memory-intensive.

In my quest to solve this issue, I dove deep into the inner workings of my code, combing through every detail. It was then that I noticed something peculiar. The application would only crash when generating the PDF, and everything would work fine if I stuck to CSVs.

I turned to Google, and after searching for similar issues faced by other developers, I was led to the understanding that the problem wasn't with my code per se, but with how WeasyPrint was using memory. The memory used by WeasyPrint was not being properly released, leading to an increase in memory usage with each PDF generated, eventually causing the application to crash.
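
One way to check whether memory really grows with each PDF is to sample it after every generation. The sketch below is only a rough illustration using the standard library's tracemalloc module. It tracks allocations made through Python's own allocator, so memory held by native libraries may not show up. The names python_memory_growth and task are hypothetical; task stands in for whatever function generates one PDF.

```python
import tracemalloc


def python_memory_growth(task, iterations=5):
    """Call `task` repeatedly and return the traced memory (in bytes)
    that is still allocated after each call."""
    tracemalloc.start()
    try:
        samples = []
        for _ in range(iterations):
            task()
            # get_traced_memory() returns (current, peak) in bytes.
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
        return samples
    finally:
        tracemalloc.stop()
```

If the numbers keep climbing from one call to the next, something is holding on to memory between generations.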

From what I read, WeasyPrint was holding on to the memory used for every PDF it generated and never releasing it, which was causing my application to run out of memory. However, I was not ready to give up on WeasyPrint just yet. I knew there must be a way to overcome this issue.

My breakthrough came when I stumbled upon an ingenious solution on GitHub, where a user had suggested using the 'subprocess' module to generate the PDF as a separate process. This approach would isolate the memory used by WeasyPrint, ensuring that it wouldn't impact the main application.

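I quickly incorporated the solution into my code. I used 'subprocess.Popen' to run the WeasyPrint command-line tool in a child process, and it worked like a charm! Because the operating system reclaims a process's memory when it exits, whatever WeasyPrint used was now gone as soon as each PDF was written, and my application stopped crashing.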

Here's the code snippet for the updated PDF generation function:

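import subprocess


def generate_resume_pdf(data, filename, template):
    # Build the resume HTML from `data` and `template` here.
    html_string = f"""
    <!-- Your HTML string here -->
    """
    # Run the WeasyPrint CLI in a child process so its memory is freed
    # when that process exits. "-" tells it to read the HTML from stdin.
    # Passing an argument list (no shell) avoids quoting problems with
    # the HTML or the filename.
    process = subprocess.Popen(
        ["weasyprint", "-e", "utf-8", "-", filename],
        stdin=subprocess.PIPE,
    )
    process.communicate(input=html_string.encode("utf-8"))
    if process.returncode != 0:
        raise RuntimeError(f"weasyprint exited with code {process.returncode}")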
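
In a Streamlit app it can be handy to get the PDF back as bytes instead of writing a file. The variation below is a minimal sketch of the same child-process idea, not the code from my app. It assumes the weasyprint command is on the PATH and that passing "-" as the output makes it write the PDF to stdout. The names render_pdf_bytes and WEASYPRINT_CMD are hypothetical.

```python
import subprocess

# Assumed command: WeasyPrint CLI reading HTML from stdin ("-")
# and writing the PDF to stdout ("-").
WEASYPRINT_CMD = ["weasyprint", "-e", "utf-8", "-", "-"]


def render_pdf_bytes(html_string, command=WEASYPRINT_CMD, timeout=60):
    """Run `command` in a child process, feed it the HTML on stdin,
    and return whatever it writes to stdout as bytes."""
    result = subprocess.run(
        command,
        input=html_string.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )
    if result.returncode != 0:
        # Surface the child's error output instead of failing silently.
        raise RuntimeError(result.stderr.decode("utf-8", errors="replace"))
    return result.stdout
```

The returned bytes could then be handed to something like Streamlit's st.download_button. The timeout keeps a stuck render from hanging the whole app.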

This solution not only fixed my memory issue but also taught me a valuable lesson about understanding and managing memory usage in Python applications.

So, if you're facing a similar issue with memory while using WeasyPrint, I hope this post can provide you with a solution. Remember, sometimes the issue isn't with your code, but with how the libraries you're using handle resources.

It's important to understand the tools you're using and how they work under the hood.

Happy coding!

https://github.com/RobertAKARobin/weasyprint-mem-py/commit/355aab3d3b090ca37f78f13ff89b2518c0b59031