These are my notes from studying the book 《Python语言及其应用》 (Introducing Python).

File input/output

  • The simplest kind of persistent data storage is a plain file, sometimes called a flat file.
  • It is just a sequence of bytes stored under a file name.

open()

fileob = open(filename, mode)

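The first letter of mode

  • r: read
  • w: write; creates the file if it does not exist and overwrites its contents if it does
  • x: write, but only if the file does not already exist
  • a: append; writes at the end of the file if it exists, and creates it otherwise

The second letter of mode

It indicates the file type:

  • t (or omitted): text
  • b: binary

Finally, the file must be closed.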
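The mode letters above can be tried out on a throwaway file. What follows is only a minimal sketch: the helper demo_modes and the file name modes_demo.txt are made up for illustration and do not come from the book.

```python
import os
import tempfile

def demo_modes():
    """Exercise the w, a, x and r modes on a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'modes_demo.txt')  # hypothetical file name

        fout = open(path, 'wt')   # w: create the file (or truncate it)
        fout.write('first\n')
        fout.close()

        fout = open(path, 'at')   # a: append after the existing content
        fout.write('second\n')
        fout.close()

        try:
            open(path, 'xt')      # x: refuses to open an existing file
            exclusive_failed = False
        except FileExistsError:
            exclusive_failed = True

        fin = open(path, 'rt')    # r: read the content back
        text = fin.read()
        fin.close()
    return text, exclusive_failed

text, exclusive_failed = demo_modes()
print(repr(text), exclusive_failed)
```

Every successful open() here is paired with a close(), as the last line above asks for.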

write()

  • write() returns the number of characters written (in binary mode, the number of bytes)
poem = '''There was a young lady named Bright,
Whose speed was far faster than light;
She started one day
In a relative way,
And returned on the previous night.'''
print('len(poem)',len(poem))
fout = open('relativity','wt')
print(fout.write(poem))
fout.close()
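In text mode the number that write() returns counts characters, which can differ from the number of bytes that end up on disk once non-ASCII text is encoded. The sketch below illustrates this; the helper chars_vs_bytes, the file name and the explicit encoding='utf-8' are assumptions made for the example.

```python
import os
import tempfile

def chars_vs_bytes(text):
    """Return (characters reported by write(), bytes on disk)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'count_demo.txt')  # hypothetical file name
        fout = open(path, 'wt', encoding='utf-8', newline='')
        written = fout.write(text)   # counts characters in text mode
        fout.close()
        size_on_disk = os.path.getsize(path)  # counts encoded bytes
    return written, size_on_disk

# 'naïve café' spelled with escapes so the source stays pure ASCII
written, size_on_disk = chars_vs_bytes('na\u00efve caf\u00e9')
print(written, size_on_disk)
```

For pure ASCII text such as the poem, the two numbers coincide.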

print()

  • print() can also be used to write to a file
fout = open('relativity','wt')
print(poem, file=fout)
fout.close()
  • print() adds a space between its arguments and a newline at the end.
  • sep (separator): defaults to a single space, ' '
  • end (terminator): defaults to a newline, '\n'
fout = open('relativity','wt')
print(poem, file=fout, sep='', end='')
fout.close()
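The effect of sep and end is easy to see without touching the disk. This small sketch uses io.StringIO as an in-memory stand-in for a file; the variable names are illustrative only.

```python
import io

buf = io.StringIO()
print('a', 'b', 'c', file=buf)                  # default sep=' ' and end='\n'
print('a', 'b', 'c', file=buf, sep='', end='')  # nothing is added
result = buf.getvalue()
print(repr(result))
```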

Writing in chunks

fout = open('relativity','wt')
size = len(poem)
offset = 0
chunk = 100
while True:
    # >= avoids a final empty write when size is a multiple of chunk
    if offset >= size:
        break

    print(fout.write(poem[offset:offset+chunk]))
    offset += chunk

fout.close()
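The same chunked write can also be expressed by letting range() step through the offsets, which removes the manual offset bookkeeping. This is a sketch under assumptions: write_in_chunks and the temporary file name are hypothetical, not from the book.

```python
import os
import tempfile

def write_in_chunks(path, text, chunk=100):
    """Write text in pieces of at most `chunk` characters; return each count."""
    counts = []
    fout = open(path, 'wt')
    for offset in range(0, len(text), chunk):
        counts.append(fout.write(text[offset:offset + chunk]))
    fout.close()
    return counts

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'chunk_write_demo.txt')  # hypothetical file name
    counts = write_in_chunks(path, 'x' * 250)

print(counts)
```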

Avoiding overwrites

fout = open('relativity','xt')
fout.write('stomp stomp stomp')
fout.close()

Handling the exception

try:
    fout = open('relativity','xt')
    fout.write('stomp stomp stomp')
    fout.close()
except FileExistsError:
    print('relativity already exists! That was a close one.')
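The try/except pattern can be wrapped in a small helper that reports whether the file was actually created. This is a sketch only: create_exclusively and the file name are invented for the example, and it relies on the with statement that these notes cover further down.

```python
import os
import tempfile

def create_exclusively(path, text):
    """Return True if a new file was created, False if it already existed."""
    try:
        # with closes the file even if write() raises
        with open(path, 'xt') as fout:
            fout.write(text)
    except FileExistsError:
        return False
    return True

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'exclusive_demo.txt')  # hypothetical file name
    first = create_exclusively(path, 'stomp stomp stomp')
    second = create_exclusively(path, 'this text is never written')
    with open(path, 'rt') as fin:
        contents = fin.read()

print(first, second, contents)
```

The second call leaves the existing content untouched, which is the point of mode x.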

read()

  • read() with no arguments reads the entire file at once
fin = open('relativity','rt')
poem = fin.read()
fin.close()
print(len(poem))

Limiting how much is read at a time

  • Once the end of the file has been reached, further calls to read() return the empty string ''.
poem = ''
fin = open('relativity','rt')
chunk = 100
while True:
    fragment = fin.read(chunk)
    if not fragment:
        break

    poem += fragment
fin.close()
print(len(poem))
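Because read() signals the end of the file with an empty string, the loop above can also be written with the two-argument form of iter(), which keeps calling a function until it returns a sentinel value. A sketch, with read_in_chunks and the file name as hypothetical names:

```python
import os
import tempfile
from functools import partial

def read_in_chunks(path, chunk=100):
    """Return the file content as a list of pieces of at most `chunk` chars."""
    pieces = []
    with open(path, 'rt') as fin:
        # iter(callable, sentinel) calls fin.read(chunk) until it returns ''
        for fragment in iter(partial(fin.read, chunk), ''):
            pieces.append(fragment)
    return pieces

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'chunk_read_demo.txt')  # hypothetical file name
    with open(path, 'wt') as fout:
        fout.write('x' * 250)
    pieces = read_in_chunks(path)

sizes = [len(piece) for piece in pieces]
print(sizes)
```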

readline()

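  • In a text file even a blank line has a length of one (the newline character '\n'), so readline() returns an empty string only at the end of the file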
poem = ''
fin = open('relativity','rt')
while True:
    line = fin.readline()
    if not line:
        break

    poem += line

fin.close()
print(len(poem))

Using an iterator

  • A text file object is its own iterator and yields one line at a time
poem = ''
fin = open('relativity','rt')
for line in fin:
    poem += line

fin.close()
print(len(poem))

readlines()

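  • Reads all the lines and returns them as a list of one-line strings
  • print() appends a newline by default, so printing the lines without end='' leaves a blank line between them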
fin = open('relativity','rt')
lines = fin.readlines()
fin.close()
print(len(lines), 'lines read')
for line in lines:
    print(line, end='')

print('newline')

for line in lines:
    print(line)
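The doubled newlines in the second loop come from the fact that readlines() keeps each line's trailing '\n'. The sketch below shows this with io.StringIO standing in for an open text file; the sample text is made up.

```python
import io

fin = io.StringIO('first\n\nthird')   # in-memory stand-in for a text file
lines = fin.readlines()

# Strip the trailing newline when only the text of each line is wanted
stripped = [line.rstrip('\n') for line in lines]
print(lines)
print(stripped)
```

Note that the last line has no trailing newline because the sample text does not end with one.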

Writing a binary file with write()

bdata = bytes(range(0,256))
print(len(bdata))

fout = open('bfile','wb')
print(fout.write(bdata))
fout.close()

Writing binary data in chunks

fout = open('bfile','wb')
size = len(bdata)
offset = 0
chunk = 100
while True:
    # >= avoids a final empty write when size is a multiple of chunk
    if offset >= size:
        break

    print(fout.write(bdata[offset:offset+chunk]))
    offset += chunk

fout.close()

Reading a binary file with read()

fin = open('bfile','rb')
bdata = fin.read()
print(len(bdata))
fin.close()

Closing files automatically with with

with open('relativity','wt') as fout:
    print(fout.write(poem))
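That the file really is closed when the with block ends, even if the block exits through an exception, can be checked with the file object's closed attribute. A minimal sketch; the file name and the simulated ValueError are assumptions for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'with_demo.txt')  # hypothetical file name

    with open(path, 'wt') as fout:
        fout.write('hello')
    closed_after_block = fout.closed      # closed on normal exit

    try:
        with open(path, 'rt') as fin:
            raise ValueError('simulated failure')
    except ValueError:
        closed_after_error = fin.closed   # closed on error exit too

print(closed_after_block, closed_after_error)
```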

Changing position with seek()

  • tell() returns the current offset from the start of the file, in bytes
  • seek() jumps to another byte offset in the file
  • seek() also returns the new offset
fin = open('bfile','rb')
print(fin.tell())
print(fin.seek(255))
bdata = fin.read()
print(len(bdata))
print(bdata[0])
fin.close()

seek(offset, origin)

  • origin = 0 (the default): offset bytes from the start
  • origin = 1: offset bytes from the current position
  • origin = 2: offset bytes relative to the end

These values are also defined in the os module

import os
print('os.SEEK_SET :',os.SEEK_SET)
print('os.SEEK_CUR :',os.SEEK_CUR)
print('os.SEEK_END :',os.SEEK_END)
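The named constants can be passed to seek() in place of the bare numbers, which makes the intent easier to read. The sketch below uses io.BytesIO as an in-memory stand-in for bfile, so it runs without the file on disk.

```python
import io
import os

fin = io.BytesIO(bytes(range(256)))    # same 256 bytes as bfile

pos_end = fin.seek(-1, os.SEEK_END)    # one byte before the end
last = fin.read()

pos_start = fin.seek(254, os.SEEK_SET) # absolute offset from the start
pos_cur = fin.seek(1, os.SEEK_CUR)     # one byte past the current position
also_last = fin.read()

print(pos_end, last, pos_start, pos_cur, also_last)
```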
  • Another way to read the last byte
fin = open('bfile','rb')
print(fin.seek(-1, 2))
print(fin.tell())
bdata = fin.read()
print(len(bdata))
print(bdata[0])
fin.close()
  • Seeking from the current position
fin = open('bfile','rb')
print(fin.seek(254,0))
print(fin.tell())
print(fin.seek(1,1))
print(fin.tell())
bdata = fin.read()
print(len(bdata))
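print(bdata[0])
fin.close()
  • The most popular encodings (such as UTF-8) use a varying number of bytes per character, so a byte offset in a text file does not correspond directly to a character count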
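The varying width of UTF-8 is easy to measure by encoding single characters and counting the resulting bytes. The sample characters below are chosen only for illustration.

```python
# 'a', 'é', '中' and an emoji, written as escapes to keep the source ASCII
samples = ['a', '\u00e9', '\u4e2d', '\U0001f600']
byte_lengths = [len(ch.encode('utf-8')) for ch in samples]
print(byte_lengths)
```

This is why seek() and tell() are most predictable on files opened in binary mode.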

Structured text files

csv

import csv

villains = [
    ['Doctor','No'],
    ['Rosa','Klebb'],
    ['Mister','Big'],
    ['Auric','Goldfinger'],
    ['Ernst','Blofeld'],
]
# newline='' lets the csv module control line endings itself
with open('villains','wt', newline='') as fout:
    csvout = csv.writer(fout)
    csvout.writerows(villains)

  • Check the result:
cat villains
Doctor,No
Rosa,Klebb
Mister,Big
Auric,Goldfinger
Ernst,Blofeld
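One reason to use the csv module instead of splitting lines on commas by hand is that it quotes fields that themselves contain a comma. The sketch below shows this in memory with io.StringIO; the sample row is invented for the example.

```python
import csv
import io

buf = io.StringIO(newline='')          # in-memory stand-in for a CSV file
writer = csv.writer(buf)
writer.writerow(['Bond, James', '007'])
written = buf.getvalue()               # the field with a comma gets quoted

buf.seek(0)
rows = list(csv.reader(buf))           # the reader undoes the quoting
print(repr(written), rows)
```

The trailing '\r\n' is the writer's default line terminator.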
  • Reading it back
import csv
with open('villains','rt') as fin:
    cin = csv.reader(fin)
    villains = [row for row in cin]

print(villains)
  • The data can also be read as a list of dictionaries
import csv
with open('villains','rt') as fin:
    cin = csv.DictReader(fin,fieldnames=['first','last'])
    villains = [row for row in cin]

print(villains)    

  • Rewriting the CSV file with DictWriter()
import csv
villains = [
    {'first': 'Doctor','last':'No'},
    {'first':'Rosa','last':'Klebb'},
    {'first':'Mister','last':'Big'},
]
with open('villains','wt', newline='') as fout:
    cout = csv.DictWriter(fout,['first','last'])
    cout.writeheader()
    cout.writerows(villains)
cat villains 
  • Reading the file again
import csv
with open('villains','rt') as fin:
    cin = csv.DictReader(fin)
    villains = [row for row in cin]

print(villains)    
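The DictWriter and DictReader steps can be combined into a single in-memory round trip, which makes it easy to confirm that the header row written by writeheader() is what DictReader uses for the keys. This sketch uses io.StringIO instead of the villains file; the variable names text and restored are illustrative.

```python
import csv
import io

villains = [
    {'first': 'Doctor', 'last': 'No'},
    {'first': 'Rosa', 'last': 'Klebb'},
]

buf = io.StringIO(newline='')          # in-memory stand-in for the file
cout = csv.DictWriter(buf, fieldnames=['first', 'last'])
cout.writeheader()                     # writes the line 'first,last'
cout.writerows(villains)
text = buf.getvalue()

buf.seek(0)
# With no fieldnames given, DictReader takes the keys from the first row
restored = [dict(row) for row in csv.DictReader(buf)]
print(restored)
```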

XML

HTML

JSON

yaml

To be continued...