Automatic Reporting in Python – Part 3: Packaging It Up

Automatic Reporting in Python (3 Part Series)

1 Automatic Reporting in Python – Part 1: From Planning to Hello World
2 Automatic Reporting in Python – Part 2: From Hello World to Real Insights
3 Automatic Reporting in Python – Part 3: Packaging It Up

As outlined in my previous posts (Part I and Part II available on this fine website), the goal of this project is to make an automatic reporting tool.

In this series of guides, the outcome I’m shooting for is a single HTML page that allows me to interrogate and compare the output of machine learning models.

At the conclusion of the previous tutorial, the reporting tool was actually showing some genuine use! It could accept a number of .csv files summarising machine learning results and output a single .html page that presented the information in a form that was… functional.

There are three main features that I’d like to add to the tool for now:

  1. The report looks incredibly dull. Readability counts! We need to improve the a e s t h e t i c s of the report.
  2. It’s hard to dig into the tables – currently it’s just 100 rows presented with no tools to search. Some basic search functionality would be grouse.
  3. The datasets that we want to run the report with are currently hard-coded into the script. This needs to be split out to make the tool slightly more flexible.

A brief comment

Compared to the previous posts, this post is much more of an exploratory, learning experience – it represents a neophyte’s attempt to build a working tool, rather than a beautiful depiction of all that is possible. If you can see a much better approach for anything listed in this post, please feel free to share it with me and all the other readers!

But without further ado, let’s take a crack at improving the aesthetics.


Step Seven – Improving the Aesthetics

Those of you with an understanding of HTML pages might know what comes next: Cascading Style Sheets, known more commonly as CSS.

Taking your first steps down a path

As indicated above, the challenge in the context of this tutorial is that this is an enormous field that I personally am not actually particularly well-versed in, having only dabbled and hacked in this space. However, I am familiar with learning new things.

So, if this is your first real introduction to CSS, let me take you down the same path I would recommend in learning any new tech:

  • Do some background reading on the fundamentals of CSS. Hit up Wikipedia. If you’re keen, hit up the CSS standard! Never be afraid to Google “simple introduction to [topic].”
  • As you read and explore, make note of potential good resources of future information. I would encourage everyone to take a look for an “Awesome List” relevant to your topic – and in this case, the Awesome CSS List is here.
  • With a basic understanding of what the hell is going on under your belt, play around to your heart’s content with local files and implement as much as you can in local scratch files. In this case, create small (or large?) HTML pages and figure out how to structure the CSS neatly. I find this really helps to get to know some of the practical realities and challenges of working with the language before I start diving into frameworks.

Being a bit more mercenary

For self improvement, I find nothing beats spending the time working up solutions from scratch. Of course, if you’re trying to hack together a solution for a business need, it may not behoove you to spend hours and days coming up with an elegant CSS framework from the beginning.

Instead, we may like to quickly jump off of someone else’s CSS framework. Fortunately, there are a number of these available, with a great number of them focusing on being ‘minimal’. A quick Google search should get you started down this path.

Integration

How is the CSS file to be integrated into our report? There are a few options that are typically at play:

  • Download the CSS file and keep it as an external file. The advantage is that we have our .html and .css files neatly separated, and we have full control over both; the far more significant disadvantage is that now if we want to move our report around, we have to drag a bunch of .css files around with it.
  • Use a Content Delivery Network (CDN) copy of the CSS file. Most frameworks will offer a CDN link for their file: this is essentially a link to an efficient, readily available copy of the data. The advantage is that you can get a CSS up and going in your page just by dropping a single link in the <head> section of your .html, no mussin’ and fussin’ with local files. The disadvantage is that you don’t have control of the file, and an internet connection is required.
  • A slightly more complex option is to have local copies of the CSS file, and then write them into the .html file. This could probably be done relatively easily and sustainably if we got clever with our templating. The advantage is that we’d have our report in a single file, and it wouldn’t require an internet connection to use; the disadvantage is that it is going to require a bit more effort to get set up. (This is a commonly used approach when creating standalone versions of interactive pages. Write a Jupyter Notebook to .html, inspect the file, and you’ll find all the CSS and JavaScript magic packaged up in the <head> section.)

At this early stage of prototyping, I prefer to use CDNs if possible. The advantage of being able to swap CSS frameworks just by changing a single line of code and not having to bother with local files is worth the cost of not being able to play with and edit the framework. Optimisation (in the form of being able to automatically integrate the CSS into the .html file) can come a little later.
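For reference, the third option (writing a local stylesheet straight into the page) can be sketched in a few lines. This is only an illustration, not the approach used in this tutorial: it leans on string.Template from the standard library rather than our report templates so that it has no dependencies, and the names PAGE and inline_css, along with the sample CSS, are made up for the example.

```python
from string import Template

# Hypothetical page skeleton; $css and $body are filled in at render time.
PAGE = Template("""<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <style>
$css
    </style>
</head>
<body>
$body
</body>
</html>
""")


def inline_css(css_text, body_html):
    """Return a standalone HTML page with the stylesheet embedded in <head>."""
    return PAGE.substitute(css=css_text, body=body_html)


# In practice css_text would be read from a local copy of the framework's
# .css file; a literal string keeps this sketch runnable on its own.
page = inline_css("body { font-family: sans-serif; }", "<h1>Model Report</h1>")
```

The resulting page carries its styling with it, at the cost of having to fetch and manage the .css file yourself.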

To start with, I’m going to use Milligram, a lightweight little framework. To get started using the CDN method, I simply follow the provided instructions for linking to the CDN copies. In our templates/report.html file, I’ll add the requisite links into the <head> section:

report.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>{{ title }}</title>
    <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Roboto:300,300italic,700,700italic">
    <link rel="stylesheet" href="//cdn.rawgit.com/necolas/normalize.css/master/normalize.css">
    <link rel="stylesheet" href="//cdn.rawgit.com/milligram/milligram/master/dist/milligram.min.css">
</head>
<!-- body section continues below... -->


And all of a sudden, our plain, early-90s-looking webpage has been transformed into something a bit more pleasing to the eye.

But we note there’s something still not quite right here – primarily, why does the page (and the table in particular) always take up the whole width of the window? Why doesn’t this look right on mobile?

It turns out just adding a bunch of .css files isn’t quite enough. We need to make sure the layout of our .html pages matches what’s expected by the .css layout.

HTML layouts

Like most topics in this space, the layout of your HTML page is a reasonably intuitive concept, while simultaneously being a problem that you could spend years diving into. What makes it a bit more challenging is that despite there being a number of somewhat fragmentary explanations, I’ve struggled to find a simple and/or holistic introduction to the field (although this explanation is currently my favourite gentle introduction, and the Mozilla guide appears to be quite thorough).

For brevity, I’m going to leave most of the further reading to you, the reader (sorry!), and instead focus on what Milligram expects.

If we inspect the code of the Milligram page, we’ll see that within the <body>, we can see the HTML of the site is structured roughly as:

<body>
    <main class="wrapper">
        <header class="header">
            <section class="container">
        <section class="container">


Now, based on some of the readings in the above links, and assuming that this is the structure that Milligram is expecting, we can apply the same structure to our own report, giving us something like the following:

report.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <!-- Note the addition of the viewport! --> 
    <meta name="viewport" content="width=device-width, initial-scale=1.0, minimal-ui">
    <title>{{ title }}</title>
    <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Roboto:300,300italic,700,700italic">
    <link rel="stylesheet" href="//cdn.rawgit.com/necolas/normalize.css/master/normalize.css">
    <link rel="stylesheet" href="//cdn.rawgit.com/milligram/milligram/master/dist/milligram.min.css">
</head>
<body>
    <main class="wrapper">
        <header class="header">
            <section class="container">
                <h1>{{ title }}</h1>
                <p>This report was automatically generated.</p>
            </section>
        </header>
        {% for section in sections %}
        {{ section }}
        {% endfor %}
    </main>
</body>
</html>


summary_section.html

<section class="container" id="summary">
    <h2>Quick summary</h2>
    <h3>Accuracy</h3>
    {% for model_results in model_results_list %}
    <p><em>{{ model_results.model_name }}</em> analysed <em>{{ model_results.number_of_images }} image(s)</em>, achieving an
        accuracy of <em>{{ "{:.2%}".format(model_results.accuracy) }}.</em></p>
    {% endfor %}
    <h3>Trouble spots</h3>
    {% for model_results in model_results_list %}
    <p><em>{{ model_results.model_name }}</em> misidentified <em>{{ model_results.number_misidentified }} image(s)</em>.</p>
    {% endfor %}
    <p><em>{{ number_misidentified }}</em> misidentified image(s) were common to all models.</p>
</section>


table_section.html

<section class="container" id="{{ model }}">
    <h2>{{ model }} - Model Results</h2>
    <p>Results for each image as predicted by model <i>'{{ model }}'</i>, as captured in file <i>'{{ dataset }}'</i>.</p>
    {{ table }}
</section>


We had already structured this report to be a collection of largely independent sections – we were even using this terminology! – so it’s not a huge drama to add the <section> tags to the system.

Proof that this works

Run autoreporting.py and inspect the report – try it both at full-screen and simulating a mobile screen.

Progress! That’s the benefit of using a well-made responsive layout.

GitHub status

Oooft. That was a lot of background reading for not a huge amount of code. All the same, the project should look like this.


Step Eight – Making Our Tables Interactive

Okay. So now we have our tables, and they look pretty good – the challenge is that they’re not interactive. For instance, it would be wonderful to have the functionality to filter by a category, or drill down to a specific image.

Now, as with exploring CSS, we have a couple of options. Certainly, we can explore the option of creating all of this ourselves – there are plenty of examples around, and they’re not terribly difficult. But, if we’re being pragmatic (or pressed by business needs!) we can probably find a pre-built package of what we need.

After a bit of googling, I came across DataTables – this is a plugin for jQuery, a very common JavaScript library. DataTables looks like it covers most of the functionality we need, and provides a wealth of extensions and plugins for any of the functionality we don’t have. All in all, a promising candidate.

Implementing DataTables

Fortunately, it turns out that implementing DataTables is relatively straightforward. From the front page of the DataTables site, we can see that the general principles are that we need to:

  • Although it’s not spelled out explicitly, DataTables is a jQuery plugin, so first we need to load the jQuery .js file.
  • We’ll then load the DataTables .js and .css files.
  • Finally, we call the DataTables function, pointing it at the HTML id of the table we want to add the functionality to.

That’s all relatively straightforward, with just some very solvable wrinkles:

  1. Our tables don’t have id tags to refer to.
  2. We need a way to call the DataTables function and point it at the id of the table, in a way that fits with our templating system.

Let’s address these one by one.

Importing the relevant files

Before we get to our wrinkles, let’s hit our basics. As we imported our files from CDNs previously, we’ll do the same for jQuery and DataTables.

Let’s update the <head> section of templates/report.html and add the links:

report.html

<!-- More above! -->
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>{{ title }}</title>
    <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Roboto:300,300italic,700,700italic">
    <link rel="stylesheet" href="//cdn.rawgit.com/necolas/normalize.css/master/normalize.css">
    <link rel="stylesheet" href="//cdn.rawgit.com/milligram/milligram/master/dist/milligram.min.css">
    <link rel="stylesheet" href="//cdn.datatables.net/1.10.19/css/jquery.dataTables.min.css">
    <script src="https://code.jquery.com/jquery-3.3.1.slim.min.js"></script>
    <script src="//cdn.datatables.net/1.10.19/js/jquery.dataTables.min.js"></script>
</head>
<!-- More below! -->


Adding id tags to the tables

We need to add an id tag to the <table> objects in our report. How can we do this?

Well, let’s work backwards from the templates/table_section.html file. Within that file, we note that we insert our fully-formed HTML tables via the {{ table }} insert.

The {{ table }} insert is generated in autoreporting.py when we call the get_results_df_as_html method of the ModelResults class. This method takes the pandas DataFrame and converts it into a string of HTML using the DataFrame.to_html function.

If we inspect the docs for that function, we see that there’s an optional argument table_id. Ah yeah, cool man! If we pass the model name as that argument, the HTML table will be generated with the id that we want. The ModelResults class already has model_name as an attribute, so we can include that:

class ModelResults:

    # ... 
    def get_results_df_as_html(self):
        """
        Return the results DataFrame as an HTML object.
        :return: String of HTML.
        """
        html = self.df_results.to_html(table_id=self.model_name)
        return html


You can run autoreporting.py and inspect the tables that are generated to confirm that they do indeed have the model name as an id.

Easy! Just a matter of tracing it back from the end result of HTML to the actual source within the code.

Calling the DataTables function

We need to call the DataTables function listed above, pointing at the appropriate id as we’ve just generated. The actual code to call the function is pretty straightforward. The question is: where do we put it?

This challenge has a couple of bounding constraints on it.

  • We need to call the function to generate the DataTable using the model name – something like this:
$(document).ready( function () {
    $('#VGG19').DataTable();
} );


  • Traditionally, JavaScript is placed in the <head> section, although this is a somewhat controversial discussion. Note that this is tradition – it will actually run anywhere.

This is a challenge because of the structuring of our template. We want to place the JavaScript call for DataTables in <head>, which is in templates/report.html. Right now, when report.html is rendered under the main() function in autoreporting.py, it doesn’t know anything about the model names: the render() call only has arguments for title, the overall title of the report, and sections, a list of pre-rendered strings of HTML ready to be inserted into the document. We just need to modify autoreporting.py to pass in the model names and tweak report.html accordingly. We tweak our files thusly:

autoreporting.py

def main():

    # ... 
    # Produce and write the report to file
    f.write(base_template.render(
        title=title,
        sections=sections,
        model_results_list=[vgg19_results, mobilenet_results]
    ))


report.html

<head>
    <!-- Lots of calls above... -->
    <script>
        {% for model_results in model_results_list %}
            $(document).ready(function() {
                $('#{{ model_results.model_name }}').DataTable();
            } );
        {% endfor %}
    </script>
</head>


Bingo-bango: when we render report.html, the calls that add the DataTables functionality to each existing table are included. Nice!
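If it helps to see what that template loop expands to, here is a rough stand-in written in plain Python. It only mimics the output of the loop for a list of model names; the function name datatables_script is invented for this illustration, and the real report still relies on the template shown above.

```python
def datatables_script(model_names):
    """Build the <script> block that the template loop expands to."""
    calls = [
        "$(document).ready(function() {\n"
        f"    $('#{name}').DataTable();\n"
        "});"
        for name in model_names
    ]
    return "<script>\n" + "\n".join(calls) + "\n</script>"


# One DataTable() call is emitted per model, keyed on the table's id.
script = datatables_script(["VGG19", "MobileNet"])
print(script)
```

Each selector has to match the id that was passed to to_html, which is why both sides use the model name.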

And now, if we run autoreporting.py and inspect the output, we get something like this:

We can now order our tables, search for categories, filter by image names – we have some rich functionality available, with more available via extensions.

This is a huge advantage for a report! Imagine if you only wanted to check out the common themes of incorrect image categorisations, or rapidly narrow down on a specific image.

GitHub status

Your repo should look a little something like this.


Step Nine – Packaging it up

AKA Step the Last.

The good news: we have the functionality we want and need. We can take a .csv file or two and punch out an interactive report.

The bad news: we’ve hardcoded it to two files, VGG19_results.csv and MobileNet_results.csv, which limits the functionality.

The final step for this exploration is therefore to turn this hard-coded script into a tool that can be called from the command line. We want to be able to call the script with an arbitrary number of .csv files and have the report spat out. So if we called our script and specified the relevant .csv files, we’d get a report successfully written to the /outputs folder – looking a little like this on the command line:

$ python autoreporting.py VGG19_results.csv MobileNet_results.csv
Successfully wrote "report.html" to folder "outputs".


This can be accomplished by utilising command line arguments – or, to put it rather simply, the commands that follow the call to Python. (In the example above, the first argument is autoreporting.py, our script. The second and third command line arguments are VGG19_results.csv and MobileNet_results.csv, respectively.) We have a couple of main ways we can approach this:

  • We can crunch the arguments manually, using sys.argv. There’s absolutely nothing wrong with this approach; sys.argv is really quite simple to use.
  • We can use a parser like argparse, primarily to assist in generating help and error messages.

Because I’ve not used argparse before, I’m interested in giving it a go and testing it for these purposes.
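For comparison, the manual sys.argv route might look something like the sketch below. The helper name filepaths_from_argv and the usage message are my own inventions for the example, and the argument list is passed in explicitly so that the snippet runs without a real command line.

```python
import sys


def filepaths_from_argv(argv=None):
    """Return every argument after the script name, insisting on at least one."""
    if argv is None:
        argv = sys.argv  # fall back to the real command line
    filepaths = argv[1:]  # argv[0] is the script itself
    if not filepaths:
        raise SystemExit("usage: autoreporting.py RESULTS_CSV [RESULTS_CSV ...]")
    return filepaths


# Simulate: python autoreporting.py VGG19_results.csv MobileNet_results.csv
example_argv = ["autoreporting.py", "VGG19_results.csv", "MobileNet_results.csv"]
filepaths = filepaths_from_argv(example_argv)
```

It works, but the usage message and error handling are ours to write and maintain, which is exactly the part argparse takes off our hands.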

Implementing argparse

Most everything we want to work with in argparse is handled within the main() call within autoreporting.py. To make this work, we’re going to:

  • Define and parse the arguments we’re interested in (specifically, filepaths to the results .csv files), using argparse;
  • Convert these filepaths into ModelResults objects that we can use to generate our reports;
  • Adapt our existing code to output reports using these ModelResults objects.

So, first of all, we need to make sure that argparse is imported. It’s been part of the standard library since Python 3.2.

autoreporting.py

import argparse
# ... 


Next, within main(), we’ll define the parser – we’re saying how we want the command line arguments to be interpreted. This code is adapted pretty quick smart from the demo in the argparse docs. We actually only have one argument, by how argparse defines it – just the filepaths to our results .csv files. The key thing to note is that we set the nargs argument to "+", indicating that we can have an undefined number of arguments of this kind, but we do need at least one.

When we call parser.parse_args(), all the arguments are neatly returned as a Namespace object that makes the inputs very easy to access, as we’ll see in the following steps.


# ... 
def main():
    """
    Entry point for the script. Render a template and write it to file.
    :return:
    """
    # Define and parse our arguments
    parser = argparse.ArgumentParser(description="Convert results .csv files into an interactive report.")
    parser.add_argument(
        "results_filepaths",
        nargs="+",
        help="Path(s) to results file(s) with filename(s) '<model_name>_results.csv'."
    )
    args = parser.parse_args()


From args, the Namespace object, we can pull out the filepaths and use them to generate ModelResults objects.
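A quick way to get a feel for the Namespace object is to hand parse_args an explicit list in place of the real command line. The snippet below is a cut-down copy of the parser above, with the help text dropped for brevity.

```python
import argparse

parser = argparse.ArgumentParser(
    description="Convert results .csv files into an interactive report."
)
parser.add_argument("results_filepaths", nargs="+")

# Passing a list stands in for what would normally come from the terminal.
args = parser.parse_args(["VGG19_results.csv", "MobileNet_results.csv"])
print(args.results_filepaths)  # ['VGG19_results.csv', 'MobileNet_results.csv']
```

The attribute name on args is taken from the name given to add_argument, and because of nargs="+" its value is always a list, even when only one file is supplied.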

args.results_filepaths holds a list of our filepaths, which we have indicated should point at filenames in the format <model_name>_results.csv. We use the os.path module (so os needs to be imported alongside argparse) to manipulate each filepath, extract the model name, and generate the ModelResults object, adding it to a list called model_results as we go.

This filename manipulation can look a little tricky, but inspect the doccies of os.path and you’ll see it’s mostly clever string manipulation. os.path is full of very, very useful functions that can save you a lot of time with common path manipulations, and help your code to work cross-platform!

    # Create the model_results list, which holds the relevant information
    model_results = []
    for results_filepath in args.results_filepaths:
        results_root_name = os.path.splitext(os.path.basename(results_filepath))[0]
        model_name = results_root_name.split("_results")[0]
        model_results.append(
            ModelResults(model_name, results_filepath))
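To see the path manipulation on its own, here is the same chain pulled out into a small helper with each intermediate value noted. The function name model_name_from_path and the "data" folder are just for this illustration.

```python
import os


def model_name_from_path(results_filepath):
    """Extract '<model_name>' from a path ending in '<model_name>_results.csv'."""
    filename = os.path.basename(results_filepath)  # 'VGG19_results.csv'
    root_name = os.path.splitext(filename)[0]      # 'VGG19_results'
    return root_name.split("_results")[0]          # 'VGG19'


# os.path.join builds the path with the right separator for the platform.
name = model_name_from_path(os.path.join("data", "VGG19_results.csv"))
print(name)  # VGG19
```

Note that a file which doesn’t follow the naming convention simply yields its whole root name as the model name, so this sketch does no validation.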


The logic for the set intersection – how we figure out which images are common across all results files – has to be changed to account for the fact that we now have an arbitrary number of ModelResults objects in a list.

To make this work, we quickly extract the misidentified_images property of each object using a list comprehension, and then calculate the intersection of sets based on this resulting list. (Note that we have to use a leading asterisk (*) when we call set.intersection() so that each member of the list gets passed in as an individual argument.)

    # Create some more content to be published as part of this analysis
    title = "Model Report"
    misidentified_images = [set(results.misidentified_images) for results in model_results]
    number_misidentified = len(set.intersection(*misidentified_images))
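Here is the unpacking trick in isolation, using made-up image names for three imaginary models so the result is easy to check by eye.

```python
# Hypothetical misidentified image names for three models.
misidentified = [
    {"cat_01.jpg", "dog_07.jpg", "fox_03.jpg"},
    {"dog_07.jpg", "fox_03.jpg"},
    {"fox_03.jpg", "owl_11.jpg", "dog_07.jpg"},
]

# The asterisk unpacks the list, so this is equivalent to
# set.intersection(misidentified[0], misidentified[1], misidentified[2]).
common = set.intersection(*misidentified)
print(sorted(common))  # ['dog_07.jpg', 'fox_03.jpg']
```

Only the images present in every set survive, and the same call works for any number of sets as long as the list isn’t empty (which nargs="+" guarantees for us).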


Everything below this point is relatively consistent with our previous version, but now we’re taking advantage of the fact that we have our ModelResults objects already packed up into the model_results list.

    # Produce our section blocks
    sections = list()
    sections.append(summary_section_template.render(
        model_results_list=model_results,
        number_misidentified=number_misidentified
    ))
    for model_result in model_results:
        sections.append(table_section_template.render(
            model=model_result.model_name,
            dataset=model_result.dataset,
            table=model_result.get_results_df_as_html())
        )

    # Produce and write the report to file
    with open("outputs/report.html", "w") as f:
        f.write(base_template.render(
            title=title,
            sections=sections,
            model_results_list=model_results
        ))
    print('Successfully wrote "report.html" to folder "outputs".')


Oooft! With all of the explanations, this looks a little complex. However, when you compare this code to the previous commit, you’ll see there’s not a great deal that’s actually significantly different here – we’ve really kept the core principles the same and just played with the packaging a bit.

GitHub status

Your project should look a little like this.


The End?

At the very start of the first post, I indicated that the goal of this project was to create an automatic HTML reporting tool, where the outcome was a single stand-alone HTML file, with info and interactivity.

Well, it’s done! We’ve got a tool that can accept an arbitrary number of standard results files, and spit out a report that crunches them into an interactive format.

Take a breather, push your chair away from your desk, and pat yourself on the back. We’ve done what we set out to do!

Does that mean we’re done? That depends, really.

What comes next?

At this point, we have a tool that works for a very narrow use-case, and assumes perfect inputs and perfect operation from the user. Now, if you’re using a tool like this just for yourself, and the inputs to the tool are quite consistent, then that could in fact be perfectly satisfactory – so no more work to be done, you’ve got something that’s fit for purpose.

But of course, there are any number of ways we can work to extend and harden this tool. As I was writing this tutorial, I made notes on some of these. Running loosely from simpler to more complex, here are a few notes and ideas:

  • Could we add a default command-line argument with argparse that allows us to specify the title of the report?
  • Our .csv inputs need to be named in a perfectly consistent format. How can we restructure our inputs so that we can define the model name and not have it read from the filename?
  • How can we make this a command-line script that can be run anywhere on our machine – not just in the folder the script is in? If our data is generated and stored elsewhere, it would certainly be more useful to be able to call autoreport on the terminal, rather than trace back to where the script is stored, for instance.
  • In its current form, we’re analysing images – can we add functionality to show the images we’re analysing? This would be great for generating hypotheses for why a model failed.
  • Our reports need an internet connection each time they’re opened. As part of the templating process, could we pull down the JavaScript and CSS files and embed them into our files?

This is a tiny sliver of the possible extensions – and this is not even to mention that there’s plenty of refactoring and tidying to be done across the project. The job of improving your work is never done!
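As a nudge on the first of those ideas, one possible shape for a report title option is sketched below. The --title flag and the build_parser helper are assumptions of mine, not part of the project; the default simply mirrors the title that is currently hard-coded.

```python
import argparse


def build_parser():
    """Return a parser with the existing positional plus an optional title."""
    parser = argparse.ArgumentParser(
        description="Convert results .csv files into an interactive report."
    )
    parser.add_argument("results_filepaths", nargs="+")
    # Hypothetical optional flag; falls back to the current hard-coded title.
    parser.add_argument("--title", default="Model Report", help="Title of the report.")
    return parser


# Explicit lists stand in for the real command line.
args = build_parser().parse_args(["--title", "Weekly Check", "VGG19_results.csv"])
default_args = build_parser().parse_args(["VGG19_results.csv"])
print(args.title, "|", default_args.title)  # Weekly Check | Model Report
```

From there, main() would use args.title where it currently sets the title by hand.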

A thank you and a call to action

This has been the first tutorial of this scope I’ve ever written. I’ve had a lot of fun doing so, and to paraphrase Sigur Rós, this has been a good beginning.

But I’m really keen to hear what parts of this you, the tutorial-reader, enjoyed and what parts were challenging or obscure. Feel free to drop a comment or send me a message on what worked and what didn’t.

See you next time!
