[1]:
import os
import numpy as np
import pandas as pd
import transportation_tutorials as tt

Understanding Errors

A variety of things can go wrong when you run a piece of Python code: input files might be missing or corrupt, there might be typos or bugs in code you wrote or in code provided by others, and so on.

When Python encounters a problem it does not know how to manage on its own, it generally raises an exception. Exceptions can signal basic errors or more complicated problems, and the message that accompanies an exception usually carries a good deal of information. For example, consider this error:

[2]:
for i in 1 to 5:
    print(i)
  File "<ipython-input-2-7864abb46b1a>", line 1
    for i in 1 to 5:
                ^
SyntaxError: invalid syntax

The SyntaxError tells you that the indicated bit of code isn’t valid Python, and simply cannot be run. It helpfully also adds a caret marker pointing to the exact place where the problem was found. In this case, the problem is the “to” in the “for” loop, a construct found in many other languages, but not in Python.
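
For comparison, here is a minimal sketch of the same loop in valid Python. It assumes that “1 to 5” was meant to include both endpoints; the built-in range function stops before its second argument, so the stop value is 6.

```python
# Assumption: "1 to 5" was meant to include both 1 and 5.
# range(start, stop) stops before `stop`, so use 6 to reach 5.
values = list(range(1, 6))
for i in values:
    print(i)
```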

Even if the code is readable as valid Python code, there still may be errors.

[3]:
speeds = {
    'rural highway': 70,
    'urban highway': 55,
    'residential': 30,
}

for i in speed_limits:
    print(speed_limits[i])
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-a3dd58a8997a> in <module>
      5 }
      6
----> 7 for i in speed_limits:
      8     print(speed_limits[i])

NameError: name 'speed_limits' is not defined

Here, the code itself is valid, but a NameError occurs because there is an attempt to use a variable name that has not been defined previously. The error message itself is pretty self-explanatory. But consider this:

[4]:
road_types = ['rural highway', 'urban highway' 'residential']
[5]:
for i in road_types:
    print(speeds[i])
70
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-5-e2c143b87c0c> in <module>
      1 for i in road_types:
----> 2     print(speeds[i])

KeyError: 'urban highwayresidential'

A KeyError occurs when using a key to get a value from a mapping (i.e., a dictionary or a similar object), but the key cannot be found. Usually, the misbehaving key is also shown in the error message, as in this case, although the value of the key may be unexpected. Here, it appears to be the last two keys of the list mashed together. This happened because of a missing comma in the definition of the list earlier. When the line with the missing comma was read, it was interpreted as a valid Python instruction: a list with two items, the second item being two string values separated only by whitespace, which implies they are to be concatenated. It is only when this value is ultimately used in the loop that Python discovers anything is wrong.

To demonstrate a more complicated example, we can attempt to read a file that does not exist, which will raise an exception like this:
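
The effect of the missing comma can be seen in isolation with a short sketch. The speeds dictionary is repeated from above so the block is self-contained; the names broken, limits, and missing are introduced here only for illustration.

```python
speeds = {
    'rural highway': 70,
    'urban highway': 55,
    'residential': 30,
}

# Missing comma: adjacent string literals are joined into one string.
broken = ['rural highway', 'urban highway' 'residential']
print(len(broken))   # 2
print(broken[1])     # urban highwayresidential

# With the comma restored, every key is found.
road_types = ['rural highway', 'urban highway', 'residential']
limits = [speeds[name] for name in road_types]
print(limits)        # [70, 55, 30]

# Checking the keys up front surfaces the problem before the loop runs.
missing = [name for name in broken if name not in speeds]
print(missing)       # ['urban highwayresidential']
```

Checking for missing keys before looping is one possible safeguard, not the only one; it simply moves the discovery of the problem closer to where the list was defined.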

[6]:
pd.read_csv('path/to/non-existant/file.csv')
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-6-98079078e677> in <module>
----> 1 pd.read_csv('path/to/non-existant/file.csv')

~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    700                     skip_blank_lines=skip_blank_lines)
    701
--> 702         return _read(filepath_or_buffer, kwds)
    703
    704     parser_f.__name__ = name

~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    427
    428     # Create the parser.
--> 429     parser = TextFileReader(filepath_or_buffer, **kwds)
    430
    431     if chunksize or iterator:

~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    893             self.options['has_index_names'] = kwds['has_index_names']
    894
--> 895         self._make_engine(self.engine)
    896
    897     def close(self):

~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1120     def _make_engine(self, engine='c'):
   1121         if engine == 'c':
-> 1122             self._engine = CParserWrapper(self.f, **self.options)
   1123         else:
   1124             if engine == 'python':

~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1851         kwds['usecols'] = self.usecols
   1852
-> 1853         self._reader = parsers.TextReader(src, **kwds)
   1854         self.unnamed_cols = self._reader.unnamed_cols
   1855

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File b'path/to/non-existant/file.csv' does not exist: b'path/to/non-existant/file.csv'

There’s a lot of output here, but the last line of the output is pretty clear by itself: the file does not exist. As a general rule of thumb, when something you are running raises an exception, the message printed at the very bottom of the error output is the first place to look to try to find an explanation for what happened and how to fix it.
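
The final line of a traceback is built from the exception's type name and its message, and both are available in code. The sketch below uses the built-in open function rather than pandas so that it runs without extra packages; the exact wording of the message may differ from the pandas output shown above.

```python
try:
    open('path/to/non-existent/file.csv')
except FileNotFoundError as err:
    # Rebuild the "ErrorType: message" line shown at the bottom of a traceback.
    summary = f'{type(err).__name__}: {err}'

print(summary)
```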

Sometimes, however, the explanation for the error is not quite as self-explanatory as the FileNotFoundError.

[7]:
tt.problematic()
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._string_convert()

pandas/_libs/parsers.pyx in pandas._libs.parsers._string_box_utf8()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 15: invalid start byte

During handling of the above exception, another exception occurred:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-7-1b294bfd1ac2> in <module>
----> 1 tt.problematic()

~/Git/python-for-transport/code/transportation_tutorials/data/__init__.py in problematic()
     46         # When there are various lines of code intervening,
     47         # you might not get to see the relevant problem in the traceback
---> 48         result = pandas.read_csv(filename)
     49         return result
     50

~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    700                     skip_blank_lines=skip_blank_lines)
    701
--> 702         return _read(filepath_or_buffer, kwds)
    703
    704     parser_f.__name__ = name

~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    433
    434     try:
--> 435         data = parser.read(nrows)
    436     finally:
    437         parser.close()

~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in read(self, nrows)
   1137     def read(self, nrows=None):
   1138         nrows = _validate_integer('nrows', nrows)
-> 1139         ret = self._engine.read(nrows)
   1140
   1141         # May alter columns / col_dict

~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in read(self, nrows)
   1993     def read(self, nrows=None):
   1994         try:
-> 1995             data = self._reader.read(nrows)
   1996         except StopIteration:
   1997             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._string_convert()

pandas/_libs/parsers.pyx in pandas._libs.parsers._string_box_utf8()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 15: invalid start byte

In this case, the error report is less clear. The error type being raised is a UnicodeDecodeError, which gives us a hint of the problem: an attempt to read some kind of Unicode text data from somewhere has failed. But if you don’t know exactly what the problematic function is supposed to do, it might not be obvious what is wrong. It is in this situation that all the other data printed along with the error can be valuable. This other output is called a “traceback”, because it provides the entire path through the code, from the problematic function call, through every sub-function called, to the point where the error is encountered. Every function call is shown with both the name of the file and the name of the function.

For the most part, errors are unlikely to arise from bugs in major software packages, such as numpy and pandas. These packages are rigorously tested, and while it is possible to find a bug, it is generally unusual; it is much more likely that bugs or errors will arise from application-specific code. Thus, it can be helpful to scan through all of the various files and functions, and look for items that are related to application-specific files. In this case, we skip over all the lines referencing pandas files, and focus on the other lines, which are found in the transportation_tutorials package:

.../transportation_tutorials/data/__init__.py in problematic()
     46         # When there are various lines of code intervening,
     47         # you might not get to see the relevant problem in the traceback
---> 48         result = pandas.read_csv(filename)
     49         return result
     50

By default in a Jupyter notebook, when the source code is written in Python, the traceback printout includes the offending line of code plus two lines before and after, to give some context. Sometimes that little snippet is enough to reveal the problem itself, but in this case those lines include some comments, which don’t really help us solve the problem.

If you want to investigate further, you can open the file shown in a text editor such as Notepad++, and scroll to the indicated line number. In this file, if we did that we would see some more context that should help diagnose this problem:
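
The scan for application-specific frames described above can also be done in code. The following is a rough sketch, not part of the transportation_tutorials package: application_frames and load_settings are hypothetical names, and the standard-library json package stands in for a large library such as pandas.

```python
import json
import os
import traceback


def application_frames(exc, library_dirs):
    """Return the traceback frames of `exc` that lie outside `library_dirs`."""
    prefixes = tuple(
        os.path.normcase(os.path.abspath(d)) + os.sep for d in library_dirs
    )
    frames = traceback.extract_tb(exc.__traceback__)
    return [
        frame for frame in frames
        if not os.path.normcase(os.path.abspath(frame.filename)).startswith(prefixes)
    ]


def load_settings(text):
    # Hypothetical application function that calls into a library.
    return json.loads(text)


try:
    load_settings('{not valid json')
except ValueError as err:
    # Treat the standard-library json package as "library code" here.
    app_frames = application_frames(err, [os.path.dirname(json.__file__)])

for frame in app_frames:
    print(frame.filename, frame.lineno, frame.name)
```

Only the frames from the calling code and load_settings remain; the frames inside the json package are dropped, which mirrors skipping over the pandas lines by eye.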

.../transportation_tutorials/data/__init__.py in problematic()
     42
     43     def problematic():
     44         filename = data('THIS-FILE-IS-CORRUPT')
     45         import pandas
     46         # When there are various lines of code intervening,
     47         # you might not get to see the relevant problem in the traceback
---> 48         result = pandas.read_csv(filename)
     49         return result

Well, that’s helpful… it turns out we are loading a file that is intentionally corrupt, with junk data in part of the file, as might happen on a botched download from a remote server. If only diagnosing all errors were so easy! Unfortunately (or fortunately, depending on your perspective), in real-world applications, code probably won’t attempt to load a file that is intentionally corrupt and so clearly labelled as such.
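
The same kind of UnicodeDecodeError can be reproduced without any file at all. The bytes below are made up for illustration and are not the contents of the tutorial's corrupt file; they simply place a byte that cannot start a UTF-8 character at position 15.

```python
# Made-up bytes for illustration; not the contents of the tutorial's file.
raw = b'rural highway, \x97junk'

try:
    raw.decode('utf-8')
except UnicodeDecodeError as err:
    bad_position = err.start      # index of the first undecodable byte
    bad_byte = raw[err.start]     # the byte itself, as an integer
    print(f'byte {bad_byte:#x} at position {bad_position}: {err.reason}')
    # byte 0x97 at position 15: invalid start byte
```

The start and reason attributes of the exception carry the same details that appear in the error message, which can help locate the damaged part of a file.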

How to Report a Problem

If you are unable to diagnose or solve a problem yourself, it may make sense to enlist some help from a co-worker or outside professional. When doing so, it is usually valuable not only to report what you were trying to do when a problem occurred, but also to send the entire traceback output from the problem as well. This offers others the chance to follow along through the code, and often problems can be diagnosed easily by looking at the complete traceback, particularly if they also have access to the same source code.
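
If copying the traceback from the screen is awkward, it can be captured as text using the standard-library traceback module. This is a minimal sketch; risky is a hypothetical function standing in for whatever code failed.

```python
import traceback


def risky():
    # Hypothetical function standing in for whatever failed.
    return {'rural highway': 70}['residential']


try:
    risky()
except Exception:
    # format_exc() returns the full traceback of the current exception as a string.
    report = traceback.format_exc()

# `report` is an ordinary string that can be pasted into an email or issue.
print(report)
```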

For more complicated problems, it may also be beneficial to share additional system information. This is particularly common and generally expected when you report issues with major packages such as numpy or pandas, but it can be useful for debugging other more localized problems as well. You can access some basic information about your system and your Anaconda Python installation by using the conda info command in a console or with the Anaconda Prompt on Windows.

(tt) C:\Users\cfinley>conda info

     active environment : tt
    active env location : C:\Users\cfinley\AppData\Local\Continuum\anaconda3\envs\tt
            shell level : 2
       user config file : C:\Users\cfinley\.condarc
 populated config files : C:\Users\cfinley\AppData\Local\Continuum\anaconda3\.condarc
                          C:\Users\cfinley\.condarc
          conda version : 4.6.9
    conda-build version : 3.12.0
         python version : 3.6.5.final.0
       base environment : C:\Users\cfinley\AppData\Local\Continuum\anaconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/win-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
                          https://repo.anaconda.com/pkgs/msys2/noarch
                          https://conda.anaconda.org/conda-forge/win-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/jpn/win-64
                          https://conda.anaconda.org/jpn/noarch
          package cache : C:\Users\cfinley\AppData\Local\Continuum\anaconda3\pkgs
                          C:\Users\cfinley\.conda\pkgs
                          C:\Users\cfinley\AppData\Local\conda\conda\pkgs
       envs directories : C:\Users\cfinley\AppData\Local\Continuum\anaconda3\envs
                          C:\Users\cfinley\.conda\envs
                          C:\Users\cfinley\AppData\Local\conda\conda\envs
               platform : win-64
             user-agent : conda/4.6.9 requests/2.18.4 CPython/3.6.5 Windows/10 Windows/10.0.17763
          administrator : False
             netrc file : None
           offline mode : False
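
Some of the same information can be collected from inside Python, which is convenient when working in a notebook. The sketch below uses only the standard library; the dictionary keys are arbitrary labels chosen for this example.

```python
import platform
import sys

system_info = {
    'python': sys.version.split()[0],
    'implementation': platform.python_implementation(),
    'platform': platform.platform(),
    'executable': sys.executable,
}
for key, value in system_info.items():
    print(f'{key:>15} : {value}')

# With pandas installed, pd.show_versions() prints a similar report
# that also lists the versions of pandas and its dependencies.
```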

Handling Errors

In simple code or analysis projects, most of the time you’ll just want to avoid having errors in your Python code. However, if you are writing Python functions that are shared with others or will be re-used in multiple places, it may be desirable or necessary to handle errors, instead of just avoiding them. To do so, you can use a try...except statement.

[10]:
try:
    table = pd.read_csv('path/to/non-existant/file.csv')
except:
    table = pd.DataFrame() # set to blank dataframe
print(table)
Empty DataFrame
Columns: []
Index: []

The try...except works like this: first, the code in the try block is run. If an exception is raised while running this code, execution immediately jumps to the start of the except block and continues. If no errors are raised, the code in the except block is ignored.
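
The jump from the try block to the except block can be traced with a small sketch that records each step as it happens. The steps list is introduced here purely for illustration.

```python
steps = []

try:
    steps.append('try: before the error')
    int('seventy')                           # raises ValueError
    steps.append('try: after the error')     # never reached
except ValueError:
    steps.append('except: handling the error')

steps.append('after the try...except statement')
print(steps)
```

The second line of the try block never runs, because execution leaves the try block as soon as the exception is raised.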

As written, the first example above sets the table variable to a blank dataframe for any kind of error. It is also possible (and often preferable) to be more discriminating in error processing, only catching certain types of errors. For example, we may only want to recover like this when the file is missing; if it is corrupt or something else is wrong, we want to know about it. In that case, we can catch only FileNotFoundError, which will work as desired for the missing file:

[11]:
try:
    table = pd.read_csv('path/to/non-existant/file.csv')
except FileNotFoundError:
    table = pd.DataFrame() # set to blank dataframe
print(table)
Empty DataFrame
Columns: []
Index: []

The same handler does not catch the error for the corrupt file, so that error is still raised:

[12]:
try:
    table = pd.read_csv(tt.data('THIS-FILE-IS-CORRUPT'))
except FileNotFoundError:
    table = pd.DataFrame() # set to blank dataframe
print(table)
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._string_convert()

pandas/_libs/parsers.pyx in pandas._libs.parsers._string_box_utf8()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 15: invalid start byte

During handling of the above exception, another exception occurred:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-12-c39965b6a338> in <module>
      1 try:
----> 2     table = pd.read_csv(tt.data('THIS-FILE-IS-CORRUPT'))
      3 except FileNotFoundError:
      4     table = pd.DataFrame() # set to blank dataframe
      5 print(table)

~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    700                     skip_blank_lines=skip_blank_lines)
    701
--> 702         return _read(filepath_or_buffer, kwds)
    703
    704     parser_f.__name__ = name

~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    433
    434     try:
--> 435         data = parser.read(nrows)
    436     finally:
    437         parser.close()

~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in read(self, nrows)
   1137     def read(self, nrows=None):
   1138         nrows = _validate_integer('nrows', nrows)
-> 1139         ret = self._engine.read(nrows)
   1140
   1141         # May alter columns / col_dict

~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in read(self, nrows)
   1993     def read(self, nrows=None):
   1994         try:
-> 1995             data = self._reader.read(nrows)
   1996         except StopIteration:
   1997             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._string_convert()

pandas/_libs/parsers.pyx in pandas._libs.parsers._string_box_utf8()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 15: invalid start byte

Alternatively, we can write different error handlers for the different kinds of errors we expect to encounter:

[16]:
try:
    table = pd.read_csv(tt.data('THIS-FILE-IS-CORRUPT'))
except FileNotFoundError:
    table = pd.DataFrame() # set to blank dataframe
except UnicodeDecodeError:
    table = pd.DataFrame(['corrupt!'], columns=['data'])
print(table)
       data
0  corrupt!

There are a variety of other advanced techniques for error handling described in the official Python tutorial on this topic.
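
As a small taste of those techniques, the sketch below shows three features of the try statement: binding the exception to a name with as, an else block that runs only when no exception was raised, and a finally block that always runs. The parse_speed helper is hypothetical and exists only for this example.

```python
def parse_speed(text):
    """Hypothetical helper: convert text to an integer speed, or None."""
    log = []
    try:
        speed = int(text)
    except ValueError as err:
        # `err` is the exception object, so its message can be recorded.
        log.append(f'could not parse: {err}')
        speed = None
    else:
        # Runs only if the try block finished without an exception.
        log.append('parsed without error')
    finally:
        # Runs in every case, whether or not an exception occurred.
        log.append('done')
    return speed, log


print(parse_speed('55'))
print(parse_speed('fast'))
```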
