Python

Welcome to Python Application Development

by Piyush Kumar and Biswas Parajuli

Use spacebar and arrow keys ( and ) to navigate. Please report issues here.

Acknowledgements

Please let me know if I missed you.

Topics Covered in this course

Getting Started

Before we start: Version Control

Read Getting Started - Git Basics

  • Use it for any code you write.
  • Lets you maintain snapshots and navigate your code history locally.
  • git is fully distributed. Enables:
    • Remote replication: think about local disk crashes
    • Collaborative software development
  • Known online hosting services
  • Popular alternative is Mercurial: hg
    • Search for: A friendly introduction to the Mercurial DVCS by Joel Spolsky

Before we start: Code Editors

Getting/Installing Python

  • For Windows, download .msi from python.org/downloads
  • For Mac
  • For Linux or "from source" install, first prepare by installing dependencies (Ubuntu 18.04)
    $ sudo apt install build-essential checkinstall openssl \
         libreadline-gplv2-dev libncursesw5-dev libssl-dev  \
         libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev
    
  • Then, to build and install locally from source
    $ tar -zxf Python-X.X.XXX.tgz
    $ cd Python-X.X.XXX
    $ ./configure --prefix=~/python --with-ensurepip=install
    $ make
    $ make install           # Installs at ~/python. So, update PATH.
    $ python3 --version      # Check version
    Python 3.X.X
    $ pip3 install requests  # Install 'requests', a Python module
    

Virtual Environment

  • Virtual environment (venv) for using multiple versions of Python3 on a single machine
  • Create a venv at ˜/.venvs/py3 for a version of Python3 installed at ˜/python:
     $ ~/python/bin/python3 -m venv ~/.venvs/py3    
    
  • Activate and deactivate the virtual environment on the command line:
    piyush@piyush:~$ source ~/.venvs/py3/bin/activate
    (py3) piyush@piyush:~$ which python3
    /home/piyush/.venvs/py3/bin/python3
    (py3) piyush@piyush:~$ deactivate
    piyush@piyush:~$ which python3
    /usr/bin/python3
    
  • On activating the venv, the shell prompt is prefixed with
    (py3)
  • If you want to run different versions of Python: pyenv

History

  • History of Python
  • Born: Dec 1990 to speed up development of sys-admin utilities
  • Became popular in ~2000
  • Borrows from: ABC, C, Algol-60, Pascal, Modula 2+/3, Haskell, Java, Lisp.
  • Supports OOP, functional programming, structured programming and others
  • Easy to teach, easy to learn, easy to use.
  • Cross Platform
  • Major Implementations: Cpython, PyPy, IronPython, Jython
  • Website: python.org. History Blog.

Design Philosophy

  • For "The Zen of Python" by Tim Peters:
    • Enter
      import this
      on the Python interpreter
  • Check out: "21 Years of Python" by Guido van Rossum, May 2011
  • In essence:
    • User-centric design by borrowing ideas freely from other languages
    • Focus on simplicity, explicitness and readability
    • Don't fret about perfection or performance (fix it later)
    • Cutting corners is ok

Uses

Drawbacks

  • Edge cases. Need to remember immutable from mutable. Side Effects. Not trivial to Debug at times.
  • Threading is an issue due to the Global Interpreter Lock (GIL)
  • Not ideal for places where speed/resource usage is paramount:
    • OS, Experimental Algorithmics, GPU computing
    • Even in these places, one can write mostly C code and call them from Python. Still great for prototyping.

Quiz

Which of the following statements is false?

Python is great for web applications.
This is true. DropBox and Youtube are examples.
CPython is an implementation of Python.
This is true.
Python is easy to learn.
This is true.
Python is bad for prototyping.
Congratulations. You are right. This statement is indeed false. Python is great for prototyping.

Visual Anatomy

  • Note for C, C++, Java programmers: no curly braces and semi-colons
  • C's
    main()
    if __name__ == "__main__":
    → the program entry point in Python?
slide_scripts/anatomy_python.html

Running A Python Program

  • Python program → collection of Python statements.
  • To run a Python program → execute statements sequentially.
  • Two ways:
    1. Interactively by invoking the interpreter: one statement at a time.
    2. By passing the file containing a collection of Python statements (module) as a commandline arg to the interpreter:
      $ python3 anatomy_python.py
      Enter 'y' to greet the world! y
      Hello world!
      
  • You can avoid typing
    python3
    on Linux/Mac terminal:
    • Add
      #!/usr/bin/env python3
      as the first line to
      anatomy_python.py
    • Run after making it executable:
      $ chmod +x anatomy_python.py
      $ ./anatomy_python.py
      Enter 'y' to greet the world! y
      Hello world!
      

Invoking the Interpreter

  • Enter
    python
    on the Linux (Windows) console:
     $ python
    
    slide_scripts/00invoke.html
  • Sequence of three 'greater than' symbols is the prompt which expects a Python statement
  • Also known as the Python Shell. Different from Bash or Windows Power Shell
  • Exit Interpreter:
    • Windows, OS X or Linux: Ctrl+D
    • Or you can just type quit() or exit() in the interpreter
  • For options: run python -h. E.g. -c single command, -d debug, -O optimized modes.
  • Check out: v, -V, -m.

Interactive Mode

  • You can see Read-Eval-Print-Loop (REPL) in action on the interpreter
slide_scripts/02interactive.html

Interactive Mode

  • Type Python statements only. Not the shell commands for OS.
  • Runs one statement at a time
  • Since the Interpreter directly displays the Python object,
    print()
    is not needed.
  • Note the presence/absence of quotes when printing strings
  • Compound statements on the Interpreter:
    • Don't indent at the prompt yet
    • Watch out for prompt change from >>> to ... for compound statements
    • Terminate compound statements with a blank line
    slide_scripts/01spam.html

Getting Help

On the interpreter, enter

help()
:

slide_scripts/03gettinghelp.html

Try: help(len), help(list), help(print)

Why use an IDE?

  • code navigation features
  • intuitive autocompletion
  • refactoring support
  • revision control integration
  • automatic builds
  • polished debugging
  • integration with unit testing tools
  • language-aware
  • ticket system integration
  • Free?

Jupyter Notebook

  • Client-server application to run or modify a special type of document called notebook document on a web browser.
  • Can be run both locally and remotely
  • Notebook document:
    • All at one place: executable code, output of the code, mark ups, figures, equations, paragraphs
    • Interactive: Individual elements can be edited and executed on the browser separately.
    • File extension: ipynb

Jupyter Notebook: Install and Run

  • Use pip3 to install on Linux or Mac:
    • $ pip3 install notebook --user
    • Option
      user
      installs locally. Update PATH if needed.
  • Run the client-server app:
    • Navigate to a working directory and run:
      $ jupyter notebook
    • Server runs and prints the URL which can be copy-pasted into the browser.
    • May automatically launch the browser with the URL.

Jupyter Notebook Dashboard

  • Directory like user interface on browser to create or open notebook documents
  • Dashboard → Files → New → Python 3 → Opens Untitled.ipynb in a new tab/window
jupyter notebook home jupyter new document

Kernel vs Cell

  • Kernel == Computational engine
    • Determines which interpreter to use: Python 3, Python 2, R
    • Every notebook has its own kernel
  • Cell == Container for Markdown formatted text or editable code
    • Front end for the kernel
    • Cells can be split, merged or deleted. See: Dashboard → "Edit" or "Insert"
    • Cells can be executed in sequential groups. See: Dashboard → "Cell"
  • Kernel preserves state of the cells
    • Names defined in previous cells are available in the cells created later.

Edit First Notebook Document

  • Select cell to edit → Choose Code or Markdown from Menu → Enter code or text
  • Hit SHIFT-ENTER to execute
  • Auto-save by default. Force a save with "Save and Checkpoint". No version control.
jupyter intro

Rename and Export Notebook Document

  • Lets write a program to get the current weather status of a city
  • To rename, click on Untitled to rename the notebook to
    current_weather
  • To export, select File → Download as → .pdf, .py, .html, .md, .tex
jupyter current weather

Running Python Programs

python compilation
  • Assume you have a file stock.py containing Python statements.
  • If you execute python stock.py on the command line:
    • The Python interpreter translates your code into a language that your computer can understand.

Python Programs

  • Programs are composed of statements.
    • Python file input ::= (NEWLINE | statement)*
  • Statements are either simple or compound.
    • Simple Statements = expression | import | pass | ...
      • Expressions create and process objects
      • Typically produce at least one object/value
      • Similar to languages such as C, Java
      • e.g. Assignment:
        • foo = 1 + 2
          is a statement where the name
          foo
          is assigned to the value of the expression
          1 + 2
          .
    • Compound Statements = if | for | while | ...
      • Generally of the form:
        HeaderLine:
            Nested Statement Block
        

Compound Statements

Statement Role Example
if/elif/else Select an action
slide_scripts/compound_if_else.html
for Iterate Sequence
slide_scripts/compound_for.html
while Loop
slide_scripts/compound_while.html

Simple Statements

Statement Role Example
pass Empty Placeholder
slide_scripts/simple_statement_pass.html
break Terminate Loop
slide_scripts/simple_statement_break.html
continue Jump to Beginning of Loop
slide_scripts/simple_statement_continue.html
assert Debug Check
slide_scripts/simple_statement_assert.html
del Delete an existing name
slide_scripts/simple_statement_del.html

Python Comments

  • Comments encode information for humans that cannot be expressed in code
    • Explain important points to make code easier to understand
    • Inform about references, dependencies, bugs
  • They can also be used to mask out experimental code sections
  • Some "rules":
    • Do not restate code in comments.
    • If the comment can be expressed in code, refactor.
    • Update comments when updating code.

Python Comments

  • Doc Strings, which are multiline and are within triple quotes lie at the top of a Python module, a class or a function and explain the overall functionality.
  • Block comments explain one or more lines of code elaborately.
  • Inline comments explain about single lines of code.
slide_scripts/comments_type.html

Python Comments

  • x = x + 1 # increment x
  • Comment every line
  • Draw boxes, ascii art, lines, ...

Python Variables

  • Names that identify values
  • Assignment: The name on the left-hand side now refers to the result of evaluating the right-hand side.
  • Names in Python:
    • Similar to C++:
      [_A-Za-z][_A-Za-z0-9]*
    • Also, Unicode+ characters can be used as names
  • No variable declaration or variable initialization in Python.
  • Dynamically Typed: A name can refer to any value at any time.
    slide_scripts/dynamic_type_same_name.html
  • Strongly Typed: On an object, only operations that are valid for its type are allowed.

Python Variables

  • Names or variables refer to values (also known as objects)
  • In Python, everything is an object: functions, classes, modules
  • Multiple names can refer to the same object
    slide_scripts/multiple_names_same_value.html
  • Objects live till they have references (names)

Python Variables

  • Name variables to reflect what they refer to.
  • Make variable names part of your documentation.
  • Programs are for people to read, not just computers.
  • Names to avoid:
    • Single character names
    • Dashes (-) in any package / module name
    • __double_leading_and_trailing_underscore__ (reserved)
    • Keywords:
      import keyword; print(keyword.kwlist)

Python Variables

slide_scripts/04variables.html

Variables are just names for an object:

  • name → Joe Mitchell
  • area → 12.56

Core Data Types

  • Numbers: int (1234), float (3.14), complex (3+4j), decimal
  • Strings: "spam", "\u0394"
  • Bytes: b"\xce\x94", "\u0394".encode()
  • Lists: [1, "two", [3, 4]]
  • Tuples: (4, 8, True, "hello")
  • Sets: {4, 8, True, "hello"}
  • Dictionaries: {"pi": 3.14, "e": 2.71}
  • Files: open("myfile.txt", "r")
  • Why use them?
    • They make programs easier to write, are efficient compared to custom types and are portable since they are part of the language.

Type of Types

  • Mutable types change value in place.
    slide_scripts/mutable_assignment.html
  • Immutable types don't -- Python raises error if you try:
    slide_scripts/05assignmentimmutable.html
  • Classification of Python types:
    • Immutable: Numbers, Strings, Tuples, Frozensets
    • Mutable: Lists, Dictionaries, Sets

Reference Semantics

  • Assignment manipulates references
  • slide_scripts/assignment_reference.html
immutable reference

Reference Semantics

  • Beware of references to mutables during assignment
slide_scripts/06referencecopy.html
List insertion

Quiz

What is the value of q after the following execution?

q=[1]
q.append(q)
print(q)
[1,1]
Nope!
[1, [...]]
Yes. Do you understand what happened? Python protects from inifite recursion when converting core data types to strings for printing. (For advanced users : list.__str__ is protected against infinite recursion. grep Py_ReprEnter in Python sources. Note that you can easily construct things like this: {1: [[...]], 2: [[...]], 3: {...}} )

Numbers

The interpreter acts as an informal calculator.

slide_scripts/07calculator.html

Numbers

  • Along with ints and floats, Python has other number types: rational, decimal, complex
slide_scripts/exotic_numbers.html

printing

  • Use print() function to produce a line of text or to display the value of variables or both
    slide_scripts/16print.html
  • By default, the line is sent to stdout. Override it by defining argument file
    slide_scripts/print_file.html

printing

  • By default, a newline is printed at the end. Define
    end
    to override it.
    slide_scripts/print_end.html
  • For custom formatting, use format-strings → strings preceded by character
    f
    slide_scripts/print_format_strings.html

User Input

  • input()
    gets input from the keyboard
  • Returns a string. Convert with
    int()
    or
    float()
    for numbers.
  • Not used in most Python code!
slide_scripts/input_returns_string.html

Control Flow

If control flow

Conditional

if -- Relational Operators

Meaning Math Python
Less than < <
Greater than > >
Less than or equal <=
Greater than or equal >=
Equals = ==
Not equal !=

Conditional

  • a boolean expression composed of
    and, or, not
    + relational operators.
  • if-statement evaluates the conditional in its header. If the conditional is True, it executes the nested statement block.
    if Conditional:
        Nested Statement Block
    
  • Numbers with value zero, empty objects and None evaluate to False.
slide_scripts/10pointinbox.html

pass statement

slide_scripts/09pass.html
  • Indentation = code blocks
  • pass = empty code block

Quiz

Are the following conditions equivalent?

  • if b >= a and b <= c:
  • if not(b < a or b > c):
True
Congratulations! Those conditions are actually equivalent. They both test if b lies inside the interval [a,c].
False
Look carefully!

Indentation

  • Code blocks in Python are defined using indentation
  • White spaces are used to define code blocks. Do not use Tabs.
  • Rules:
    • Use 4 spaces per level
    • < 80 chars per line
    • Use Python aware editor
    • Use pycodestyle to check for problems in your code
      $ pip3 install pycodestyle
      $ pycodestyle --help
      $ pycodestyle --first myprog.py
      
    • Use black to automatically fix trivial style problems.
      $ pip3 install black
      $ black --diff myprog.py   # Output diff without overwriting
      

Indentation

slide_scripts/11indent.html

while loops

slide_scripts/12while.html
  • from math import log, floor
    • imports the names log and floor from the Python stdlib math
    • log, floor can now be used in the program as if they were defined locally

for loops

slide_scripts/13for.html

for loops

slide_scripts/14for2.html

for loops

for name in iterable:
    statements
  • iterable
    produces objects in a sequence one at a time
  • name
    is reused and gets assigned to the newly produced object
  • statements
    block is executed once per object
  • There are different types of iterables:
    slide_scripts/15for3.html
  • range(0, 100000)
    creates a sequence object that evaluates not at once, but, lazily.

in operator

  • list, set, dict, tuple, str objects are containers whose elements reside in memory
  • in
    operator for containment/membership tests:
    slide_scripts/containment_tests_container.html
  • Complexity: \(O(n)\) for list, tuple, str and \(O(1)\) for set, dict
  • in
    operator for iteration:
    slide_scripts/iteration_container.html
  • Advanced: For custom class objects, implement
    __iter__
    and
    __contains__
    .

Functions

  • Give a name to a group of Python statements with keyword
    def
    • This code block accomplishes an atomic task (ideally).
    • Possible to feed in data as a set of arguments and return a result as output.
    • slide_scripts/function_example_square.html
  • Code reuse → access or call a function at multiple places repetitively.
  • Makes a program modular by breaking it down into smaller chunks.
    • Test individual functions → Test the whole program with precision
    • Recommended function length in lines of code < 25

Functions

slide_scripts/18doublefunc.html

Library Functions

  • Python has a large standard library with many useful functions.
  • Simply
    import
    and use. For example:
    math
    ,
    random
    slide_scripts/stdlib_example_math_random.html
  • Try:
    help(math)
    ,
    help(math.ceil)
  • How to know what functions are available in
    math
    library?
    • dir(math)
      gives you the list of names/functions that you can import/access.
    • Returns all methods/names/attributes defined in an object except those that start with a single underscore, "_"

Quiz

What will the following snippet print?

print(0O234)
0O234
Sorry.
234
Sorry.
156
You got it! Why? Hint: Try printing 0O10, 0O11

Data Types and Operations

Numbers

  • Builtin Numbers
    • Integers - int : a = 32, b = -23, c = 0x7fa8
    • Floating point numbers - float
    • Complex numbers (x=1+2j)
    • Booleans (True = 1, False = 0)
  • long and int have been merged to int in Python > 3.0. sys.maxint has been removed > 3.0 because there is no max anymore.
  • Try this:
    import sys
    print(sys.float_info)

Integers

  • Reference count = number of places that have a reference to an object
  • Includes references created by you + Python internals.
  • Thus, it's implementation dependent and may not match your Python.
    slide_scripts/290integers.html
Integers refcount example

Integers

  • Lower integers are referenced by more number of names (variables)
ref count

Integers

  • CPython implementation details:
    • #include "Python.h"
      provides all function, type and macro definitions.
    • PyObject, PyVarObject, PyCFunction
      are examples of common object structures in C that are used to define object types for Python.
    • All Python object types are extensions of type
      PyObject
      .
    • Python/C API functions usually return value of type
      PyObject*
      .
  • int and long are integrated into one type. Check
    longintrepr.h
    :
    struct _longobject {
      /* Macro: Brings in refcount/size/a pointer to a type object */
      PyObject_VAR_HEAD
      digit ob_digit[1];
    };
    
  • PyObject_VAR_HEAD
    expands to
    PyVarObject ob_base;
  • PyVarObject
    extends
    PyObject
    by adding the field
    ob_size
    to indicate length.

Integers: Operations

  • +,-,/,*
    have usual meanings.
    //
    is floor division.
    slide_scripts/08division.html
  • More Operations:
    slide_scripts/279moreoperations.html

Integers

slide_scripts/280integers.html

Floats

slide_scripts/20floats1.html

Boolean

slide_scripts/21bool.html
  • Note:
    • isinstance(obj, cls) returns True if obj is an instance of class cls.

None

  • The only built-in constant of NoneType to represent the absence of a value
  • None has a constant identity: interpreter creates a single None object.
  • Comparing names with
    None
    :
    • Check identity with the
      is
      operator:
      is
      or
      is not
    • x is None
      is equivalent to
      id(x) == id(None)
    • Though
      None
      evaluates to False, write
      if x is not None
      instead of
      if x
slide_scripts/22none.html

Strings

  • A String is a series of characters.
  • To represent strings, use code like ASCII, Win-1252...
  • Problem: Too many characters from too many languages
Skull and Bones

Solution: Unicode

Unicode

  • Provide a unique code point-a number, not a glyph-for each character.
  • More than Million code-points. Keeps ASCII as a subset
Devanagari unicode

Unicode

  • Any str object in Python 3.0+ is stored as Unicode: single, double or triple quoted strings.
  • By default, Python source code has UTF-8 encoding.
  • Change source code encoding with the following comment in the first/second line
    • # -*- coding: encoding_name -*-
  • In Python 3.3+, with or without 'u'
slide_scripts/23unicodepa.html

Encoding

  • A string exists as a sequence of bytes in memory, disk or network.
  • Mapping between bytes and Unicode code points:
    • An integer (code point) can be written in many different ways into bytes.
    • Example: Big Endian (09 2A) / Little Endian (2A 09).
  • Many encodings: UTF-8, UTF-16, UCS-2, UTF-32, ...

Python 3.x: str bytes bytearray

  • Characters are transmitted/stored as bytes.
  • Both bytes and bytearray can be decoded into a string object.
  • A string object can be encoded into bytes or a bytearray.
Strings and Bytes

Files with Unicode

Piyush in Hindi
slide_scripts/24unicodefiles.html

Encoding/Decoding Errors

  • Encoding/Decoding can use error handling policy:
    .encode([encoding='utf-8'], [errors='strict'])
    .decode([encoding='utf-8'], [errors='strict'])
               
  • errors:
    • strict : Default
    • ignore : Ignore the errors
    • replace: Replace with ? or another character

Unicode

  • IO is always bytes
  • Know your correct encoding
  • Unicode is everywhere: Web pages, databases, XML,...
  • Always write unicode aware code
  • UTF-8 is the king of all encodings

Strings

  • Quotes + Special characters within strings
    slide_scripts/25stringeasy.html
  • Multiline strings:
    slide_scripts/25multilinestring.html

Strings: Indexing

  • A string works like a read-only array with index accessible chars.
  • Immutable: can't be overwritten via indices.
  • Usage: name[index] where index is an integer. -ve integers allowed
slide_scripts/26stringindex.html

Strings

Indexing and Slicing

Strings and Bytes

String Slicing

Two types of slicing

  • S[start:end]
    -- result does not include
    end
  • S[start:end:step]
    -- Extract all items in
    S
    from offset
    start
    through
    end-1
    by
    step
    .
slide_scripts/28string_slicing.html

String Methods

  • An example to convert jabber id to email address
  • When a XMPP chat server talks to a client, it uses a jid.
  • slide_scripts/29stringjid.html
  • s.find(t)
    - first occurrence of t in s.
  • s * n
    - replicate s, n times.
  • How would you test if a string is a palindrome using only string slicing?

String Methods

  • Familiar methods:
    str(), len(s), s.upper(), s.lower(), s.find(t), s * n,
    . Also, indexing and slicing.
  • Other methods:
    ord(), chr(s), s.strip(), s.replace(), s.find(t)
    . Also, membership tests.
slide_scripts/30strmethods.html

String Methods

  • help(str)
    - Lots of methods
  • help(str.find)
    - Try it now !!
  • S.center(width [, fillchar] )
    -- Return S centered in a string of length width. Padding is done using the specified fill character (default is a space)
  • S.ljust(width[, fillchar])
    -- Return S left-justified in a string of length width. Padding is done using the specified fill character (default is a space).
  • S.split([sep [,maxsplit]])
    -- Return a "list" of the words in the string S, using sep as the delimiter string.
  • S.join(iterable)
    -- Return a string which is the concatenation of the strings in the iterable. The separator between elements is S.

String Methods

slide_scripts/31strmethods.html

Other useful string methods

  • count(), find(), index(), replace()
    slide_scripts/str_methods_count_find_index_replace.html
  • zfill(), isnumeric(), isalpha()
    slide_scripts/str_methods_zfill_isnumeric_isalpha.html

Changing Strings

  • Strings are Immutable == read only
  • Changing string data implies creation of new strings.
slide_scripts/32changingstrings.html

Quiz

What is the complexity of the following code snippet

fruits = ['apple', 'banana', 'orange', 'guava', 'pinaple']

line = fruits[0]
for namef in fruits[1:]:
    line += ', ' + namef
print(line)
O(n)
Wrong !! Think. Why is this slow?
O(n^2)
Correct! Repeatedly invoking += on strings is a quadratic operation. The faster way to do this is
','.join(fruits)

Changing Strings

Some string brain teasers to try on your own

  • Write a function that returns the reverse of a string.
  • Write a function that, given a string, removes all white spaces from the string
  • Not for the faint heart: Find the longest palindrome in a given string efficiently

Changing Strings

  • Take a ticker: e.g. "GLD"
  • Print its current stock value parsed from google finance webpage. The url to query is :
slide_scripts/33stockurl.html

Joining Strings

  • join()
    • Glue together a sequence of strings present in a list
    • Filler that goes between the strings can be specified
    • Memory efficient compared to "+"
slide_scripts/57format.html

Binary data

  • Binary data can exist in network responses or in files:
    slide_scripts/network_response_as_bytes.html
  • Bytes object: an immutable sequence of bytes
    • Syntax similar to
      str()
      but with a prefix
      b
      and created with
      bytes()
    slide_scripts/bytes_object_vs_str.html

struct: Format characters

  • Fix the rules to pack (create) or unpack (read) a bytes object with module
    struct
  • Encode the structure with a sequence of Format Characters
    • The number of bytes and their type
    • 4si
      means 4 chars followed by an int
      slide_scripts/struct_calcsize.html
Format Character Python type Size
c byte 1
? Bool 1
i integer 4
f float 4
d double 8
s bytes

struct: Example

slide_scripts/struct_pack_unpack.html

Lists

  • Ordered collection of arbitrary objects
    slide_scripts/34list.html
  • Variable-length, heterogeneous, and arbitrarily nestable
    slide_scripts/35list.html

Lists

  • Mutable Sequence
    slide_scripts/36list.html
  • Array of object references: Think of it as an array of pointers.
    • They use a lot more space than C Arrays
    • If you want something closer to C Arrays:
      • from array import array
      • Only homogenous types allowed
    • If you want to do math with arrays, look at NumPy

Lists

  • Slicing same as strings, but now mutable.
slide_scripts/37list.html

Lists

Operations you already know:

slide_scripts/38list.html

Exercise 3

  • Not to be graded.
  • Write a function to convert a Jabber ID to an email id:
  • Jabber ids look like: piyush@gmail.com/CDEFGB
  • But sometimes xmpp servers go wild and send: piyush@gmail.com as a jabber id

List Methods

  • L.count(x)
    : Returns number of items of L that are equal to x.
  • L.index(x)
    : Returns the index of the first occurrence of an item in L that is equal to x, or raises an exception if L has no such item.
  • L.append(x)
    : Appends item x to the end of L ; e.g., L[len(L):]=[x].
  • L.insert(i, x)
    : Inserts item x in L before the item at index i, moving following items of L (if any) "rightward" to make space (increases len(L) by one, does not replace any item, does not raise exceptions: acts just like L[i:i]=[x]).
  • L.remove(x)
    : Removes from L the first occurrence of an item in L that is equal to x, or raises an exception if L has no such item.

List Methods

  • L.pop([i])
    : Returns the value of the item at index i and removes it from L; if i is omitted, removes and returns the last item; raises an exception if L is empty or i is an invalid index in L.
  • L.reverse()
    : Reverses, in place, the items of L.
  • L.sort(cmp=cmp, key=None, reverse=False)
    : Sorts, in-place, the items of L, comparing items pairwise via the function passed as cmp (by default, the built-in function cmp). When argument key is not None, what gets compared for each item x is key(x), not x itself.
slide_scripts/list_methods_ex1.html

List Methods

  • Iteration using
    for
    works with a string as well as any other sequence.
slide_scripts/39iteration.html

List: Advanced Details

  • Both, indexable and mutable
  • Implemented as an array of pointers. Check Python Source Directory for:
    • ./Objects/listobject.c
    • ./Include/listobject.h
  • Fixed size pointers to objects. Overallocates initially, then uses realloc() and memcpy()
  • Fast
    append()
    ,
    pop()
    only on the right end: O(1) amortized
  • Costly
    insert()
    : amortized can be O(n)
  • Less space than sets/dictionaries (which have O(1) lookups)

List: Advanced Details

  • So, lists are efficient on the right end. Same efficiency on both ends?
  • collections.deque:
    • Works as both stack and queue
    • thread-safe, doubly-linked list of block nodes: ./Modules/_collectionsmodule.c
    • O(1): pop() and append() on the right end
    • O(1): amortized
      appendleft()
      ,
      popleft()
    • Indexing and insertion, as in a list, are not efficient
slide_scripts/deque_example.html

Exercise

  • Implement a function with usage: parse_tag(s, "tag", int)
  • Given a string containting <tag>20</tag>, the function will return 20 in this case.
  • Use this function to implement another function to parse : http://api.wunderground.com/auto/wui/geo/WXCurrentObXML/index.xml?query=TLH
  • And return a list with the following data: current temperature, visibility (miles), wind_degrees, wind_mph.
  • Not for the faint heart: Given a sorted list of integers, and an input number x, find two indices such that L[i]+L[j] == x, in O(n) time and O(1) space.

Files

  • Non-volatile collections of (usually large) data in secondary memory
  • Referenced with names called filenames.
  • Textual content is usually stored as strings terminated with newlines
  • Basic questions when we want to handle text files:
    • How to open files?
    • What's the role of newlines and how to remove them?
    • How can we convert strings to numbers?
  • Assume we have the file test.csv
    $ cat test.csv
    VZ, 110
    NYSE:GLD, 20
    NYSE:BAC, 100
    

Files

  • Open the file test.csv in read-mode and fetch one line at a time
    slide_scripts/40files.html
  • Open a new file test.csv in write-mode and write to it
    slide_scripts/41files.html

Files

  • Each open file consumes a file descriptor. OS limits max number of files per process.
  • So, apply
    f.close()
    to release the descriptor. But can be tedious with large number of files.
  • Use context manager for efficient file resource handling:
    • with statement:
      with open(fname) as fp:
      binds
      fp
      to its following code block
    • On exiting the block, the file is closed automatically.
    slide_scripts/files_with_context_manager.html

Files

  • Close is optional. Files are seekable -
    f.seek(offset)
  • Files are buffered -
    f.flush(), os.fsync(f.fileno())
  • Reading files:
    f_text = f.read()
    [newlines not interpreted]
  • Reading a file line by line :
    f_line = f.readline()
  • Read all lines in a List :
    f_lines = f.readlines()
  • Another way of doing iterations :
    slide_scripts/42files.html
  • Which one do you think is memory efficient?

File Location

  • Which directory is Python looking for the file?
slide_scripts/43files.html

Exercise 5

  • Write Python program to read portfolio.csv file and print out the current value of the portfolio. Use the following template
slide_scripts/44files.html

Dictionaries

  • Unordered collection accessed by a key (a hash table / 'associative arrays')
  • Mutable, heterogeneous and nestable
  • Creating Python dictionary:
    slide_scripts/45dict.html
  • Getting help for dictionaries:
    slide_scripts/47dict.html

Dictionaries

  • Some operations:
    slide_scripts/46dict.html

More dictionary operations

  • Get the keys, the values or the key-value pairs directly:
    slide_scripts/48dict.html

Creating a Dictionary

slide_scripts/create_dict.html

Looking up a Dictionary

slide_scripts/lookup_dict.html

Dictionary details

  • Keys should be immutable
    • Numbers, strings, tuples
    • Reason: hashing (fast lookups)
    • Requirement: hash must be defined.
  • Not lists or other dictionaries
  • No restrictions on values
  • Python's C Source: Objects/dictobject.c

Dictionary looping

  • Examples:
    slide_scripts/50dict.html

Exercise 6

  • JSON API of openbook library example (not to be graded)
slide_scripts/51dict.html

Tuples

  • Ordered collection accessed by offset: position matters!
  • Syntax: comma separated names or values enclosed within
    ()
    • Example:
      data = ("VZ", 110, 26.75)
  • Immutable: but the references in a tuple can point to mutable objects
  • Heterogeneous, Nestable
  • Arrays of object references
  • To get help, use
    help(())
    or
    dir(())
    .

Tuples

Operation Interpretation
()
An empty tuple
T = (0, )
One-item tuple (not an expression)
T = (0, "VZ", 3.1, 2)
A four-item tuple
T = ("out", ("in1", "in2"))
Nested tuples
T = tuple("spam")
Create tuple using iterable
T[i], T[i][j], T[i:j], len(T)
Index, recursive index, slice, length
T1 + T2, T * 3
Concatenate, Repeat
if x in T: print(x)
Containment, membership.
for x in T: print(x)
Iteration

Tuples

  • Why use tuples when lists already exist?
    • Efficiency
    • Lists are optimized for
      append()
    • Lists use more memory
    • Integrity: tuples are immutable
    • Tuples can be used as dictionary keys but not lists

Tuple Unpacking

  • Parallel assignment:
    slide_scripts/84assignment.html
  • Nested sequence assignment:
    slide_scripts/tuple_nested_seq_assignment.html
  • Assignment with selective unpacking using * operator:
    slide_scripts/tuple_selective_unpacking.html

Type Classification

Object Type Mutable ?
Numbers (all) No
Strings No
Lists Yes
Dictionaries Yes
Tuples No
Files N/A
Sets Yes
Bytes No
Bytearray Yes

Named Tuple

  • Assign meaningful names to positions of a tuple → increases program readability
  • Ability to access tuple fields with both position indices and names
  • Part of the
    collections
    module
  • Syntax:
    namedtuple(typename, field_names)
    where
    typename
    is a string and
    field_names
    is a list.
slide_scripts/named_tuple.html

Sets

  • Unordered collections with no duplicates in them. Doesn't support indexing.
  • Can only contain hashable/comparable types. But, can contain different types.
  • Sets are mutable. Immutable version of set →
    frozenset
  • Construct a set from a sequence of hashable types with the built-in
    set()
    :
slide_scripts/54sets.html

Set Operations

slide_scripts/55sets.html

Set Operations

  • Usual operations: max(), min(), len(), sum(), help(set), help(set.add), for x in S: print x
slide_scripts/56sets.html

Exercise 7

  • Write a function called remove_duplicates()
  • Preserve the order of the input.
  • Return a tuple as output.
  • Assume: for loop can be used on the input sequence.

What is left in types ?

  • String Formatting
  • Sequence Iteration + More on iterations
    • Enumerate, zip,
  • Comprehension: List/Set/Dictionary
  • Assignment
  • Ref counting
  • Deep/shallow copying
  • TypeChecking
  • First Class objects

String Formatting

  • Strings are either printed to stdout or saved to a file
  • Formatting a string means:
    • Building a string on the fly with content from existing names
    • Inserting special characters for better readability
  • Helps in:
    • parsing of log files
    • program debugging with well-formatted error messages
    • maintaining file format standards
  • Done with either
    format()
    or
    f-strings

f strings

  • New in Python 3.6
  • Strings pre-fixed with an
    f
  • Allows usage of variable names or even Python expressions inside a string
  • Curly brackets contain the names or the expressions
slide_scripts/302_f_strings.html

String Formatting

  • Create a string template with positional/keyword placeholders.
  • Update it with
    format().
slide_scripts/58format.html

String Formatting

  • Directly use module, dictionary or list in formatting
slide_scripts/59format.html

String Formatting (printf in C)

slide_scripts/60format.html
slide_scripts/61format.html

Format Codes

Char Formatting
s String
r Uses __str__ instead of __repr__. For more, see Chapter OOP.
c Character
d Decimal
o Octal
x Hexadecimal
f Floating Point : [-]i.ddddd
e Floating Point : [-]i.ddddde+-XX
g Good use of "E" notation

Format Codes

  • 0:<10d : Left justified, decimal in a 10-char field.
  • 0:10d: right justified. decimal in a 10-char field. Default
  • 0:10s: left justified. Default
  • 0:6.2f: Right justified float, width of 6 characters, 2 decimal digits.
  • 0:^80s: Centered string in 80 character long field.
  • 0!s:^80s: Convert the value into a string using str(), then format as the last line.
  • 0!r:^80s: Convert the value into a string using repr(), then format as the last line.

Format Review

  • {fieldname!conversionflag:formatspec}
  • filednames are either Simple or Compound
  • Simple Fieldnames:
    • Number or keyword naming an argument
    • If numbers, they must be valid base 10 integer identifying a positional argument.
    • A name is used to identify a keyword argument
  • Compound Fieldnames:
    • '.' operator -- 'getattr'
    • '[]' operator - 'getitem'
    • Arbitrary expressions are not allowed
  • conversion flag = r or s ('repr' or 'str')
  • formatspec : [[fill]align][sign][width][.precision][type]

Format Review

  • formatspec : [[fill]align][sign][width][.precision][type]
  • [] - Optional arguments
  • fill - Character used to pad field to minimum width
  • align flag - can be one of the following
Symbol Meaning
< Left-Aligned within available space (default)
> Right-Aligned
= Places padding after the sign (before digits) +000020
^ Centered within available space

Format Review

  • formatspec : [[fill]align][sign][width][.precision][type]
  • sign flag - can be one of the following
  • Symbol Meaning
    + Sign Mandatory
    - Sign only for -ve numbers
    ' ' Use space for +ve numbers
  • width - minimum field width
  • .precision - how many digits to display after the decimal point. For non-numeric types: Maximum field size

Exercise 8

  • Gmail exercise: Pull out gmail emails and format them with added information.
  • (Use gmail.py and read gmail.html)
  • Problem 2: Print a large integer with comma's. For example: 1234567 should be printed as 1,234,567

Sequence Iteration

  • You've already seen sequence slicing (in strings?).
  • Python Sequences: Strings, Lists, Tuples.
  • Slicing works the same on all.
  • Extended Slicing:
    • Sequence[start:end:step]
    • Step indicates jump and direction [+ → Right, - → Left]
    • End - not included as in simple slicing

Sequence Iteration

  • Extended Slicing:
slide_scripts/62seq.html

Sequence Iteration

  • You've already seen some simple functions on sequences: len, *, +, slicing, extended slicing, min(), max(), sum()
  • Simple iteration
  • What is the value of x now ?
  • slide_scripts/63seq.html

range()

  • range([start,] stop[, step = 1]) → list of integers
  • Use when iterating large ranges
  • range() computes values in each iteration lazily.
slide_scripts/64range.html

Sequence Built-ins

  • sorted() - Returns actual sequence and is not lazy
  • zip() - Does lazy evaluation
  • reversed()
  • enumerate()
  • Why isn't sorted() lazy? Because sorting is hard to do lazily?
  • An interesting project: Implement isorted() in C which would sort lazily…Benchmark!
slide_scripts/65seq.html

Zip

  • Combine multiple, parallel sequences to generate tuples:
    slide_scripts/66zip.html
  • Combine keys and values to create a dict:
    slide_scripts/72zipdict.html

Zip

  • Can take multiple sequences as input.
  • Stops with the shorter of the sequences input.
slide_scripts/67zip.html

Zip

  • Sequences of different length (and equivalent code):
    slide_scripts/68zip.html

Zip + Unpacking Sequences

slide_scripts/69zip.html

reversed()

  • Returns Iterator :
    slide_scripts/70rev.html
  • List can also be reversed with slicing
    slide_scripts/list_reversed_slicing.html
  • Which one's more efficient?
    $ python3 -m timeit "l = list(range(10000)); rl = list(reversed(l))"
    1000 loops, best of 3: 470 usec per loop
    $ python3 -m timeit "l = list(range(10000)); rl = l[::-1]"
    1000 loops, best of 3: 501 usec per loop
    

Iterable Unpacking

  • Selective and symmetric unpacking (see unpacking tuples with * operator)
  • Combination of multiple iterables becomes easy
  • Can be used inside tuples, lists, sets, dictionaries
slide_scripts/iterable_unpacking.html

Dictionary Unpacking

  • ** operator
    • Merge multiple dictionaries
      slide_scripts/dictionary_unpacking.html
    • keyword arguments for functions. More on this later in Chapter Functions.

Enumerate

  • Given an iterable, we may need to access and process the values along with their positions.
  • enumerate()
    generates a tuple
    (index, value)
    for each value in the iterable
  • Syntax:
    enumerate(iterable[, start])
    • By default, index starts from 0. Change it by passing in the argument
      start
      .
    slide_scripts/71enumerate.html

Enumerate

  • Counting sort with
    enumerate()
    slide_scripts/enumerate_in_counting_sort.html
  • In the list
    counts
    , its indices are the numbers of list
    nums
    and the values are their respective counts.

Exercise 9

  • Gram printing: Input a string. Remove all spaces/commas. Convert to lower case. Convert to n-grams. Sort and find frequency of 3-grams. Output top 10.

Comprehensions

  • Supported by:
    • List
    • Dictionary
    • Set
  • Creates a new sequence, applying an expression on each item in the current sequence.
slide_scripts/72listcomp.html

List comprehensions

  • [ expression for v in S [if condition] ]
  • Motivation: { x | x in S and x > 0} -- Mathematical Sets.
  • Filtering sequences - Think of comprehensions
slide_scripts/73listcomp.html

List comprehensions

  • Mathematics applications
slide_scripts/74listcomp.html

Set and Dictionary comprehensions

  • Similar to list comprehension. Master one.
slide_scripts/75listcomp.html

Comprehensions

  • Collect value of a field -
    [ p[age] for p in people]
  • Perform queries/filters
  • Quick math
  • Small file manipulation:
    [line.rstrip() for line in open('script.py')]
  • Permutation? - A more complicated example.
[x + y for x in list('pk') for y in list('py')] # ['pp','py','kp','ky']

List comprehension

  • Multiple if-conditions:
    slide_scripts/list_comp_complicated.html
  • Quite complicated! Remember "The Zen of Python".

Comprehensions

  • An inefficient Example:
    slide_scripts/76comp.html
  • What's the running time of this qsort?
  • How can we improve it?

List Comprehension: Cons

  • List comp on a large sequence → large new copy in memory
  • Reduce memory footprint with Generator Expressions:
    slide_scripts/generator_expressions_1.html

Exercise 10

  • 8 queens problem in 8 lines?
  • Why does it work?
slide_scripts/77comp.html

Objects

  • Summary:
    • Names are just pointers to objects.
    • Objects do have types.
    • type(object) - tells the type of object
    • Type name is usually a callable that converts an object into that type - Example int(object) vs < type 'int' >.
    • First Class Objects: Everything is an object in Python.

References vs Copies

  • Recap: References are just pointers
slide_scripts/78ref.html

References vs Copies

  • Assignment never creates copies of the values. Instead, it simply creates a reference.
  • X[1] = "one" # Assignments are pointer copies.
  • Since the references point to the same value, a change in the value affects all the references.
slide_scripts/79ref.html

Equality and References

  • The is operator tests object identity.
  • The == operator tests value equivalence
slide_scripts/80ref.html

Equality and Types

  • How do you check if an object is of a particular type? → Prefer isinstance() to type() (inheritance vs identity)
slide_scripts/81type.html

Shallow Copies

  • Using slicing L[:]
  • Dictionary and set copy() method. list() method for lists.
slide_scripts/82copy.html

Deep Copies

  • When you need to create a clone of an object (and all it points to - recursively).
slide_scripts/83copy.html

Summary

  • You are now ready to start looking at other people's Python codes…
  • You know all the basic principles
  • You know about objects

Exercise 11

  • On types, copies and counts
  • One simple trick: Use type lists to format/copy data.
  • Implement mergesort.