100 Bytes
This was a fun little brain-teaser presented to me by someone during their own application screening process. After thinking about it for a little bit, I started to realize what great challenge it is. With a few extra details and a few left out, the developer is forced to work within “real world” constraints and handle assumptions and unknowns. Having to read backward adds an interesting challenge, as does working efficiently with the creation of many files.
Requirements
As Quoted:
Incorporate:
- creativity
- Modules used
- Efficiency of the code during runtime.
Scan a 10M log file, and create chunks of files of 100 chars lines in each
chunks, from the 10M log file. 100th character can be end of a word or a
space. If it’s a broken word, add it to next file.
No additional details are provided.
Solution
#!/usr/bin/env python3
'''
This is a fun little brain teaser to find the most elegant way to
iteratively read chunks of data from an input file and then spit
them out into individual output (per-chunk) files, splitting lines
based on "word" termination.
'''
import os
import sys
import argparse
import textwrap
def main():
'''Primary script logic'''
opts = get_opts()
# Ensure output directory exists
os.makedirs(os.path.dirname(opts.output), exist_ok=True)
# Split file and write output files
with open(opts.input, 'r') as fh:
file_number = 0
for chunk in textwrap.TextWrapper(width=int(opts.max_size)).wrap(fh.read()):
write_file(f'{opts.output}.{file_number}', chunk)
file_number += 1
# Ensure all files have finished writing to disk
os.sync()
def write_file(path, s):
'''Write string to file path without blocking for sync.'''
_fh = os.open(path, os.O_CREAT | os.O_WRONLY)
os.write(_fh, bytes(s, encoding='utf8'))
os.close(_fh)
def get_opts():
'''Parse and return options'''
parser = argparse.ArgumentParser(
usage='scandir [-h] <optional arguments>',
formatter_class=lambda prog: argparse.HelpFormatter(
prog,
width=100))
parser.add_argument(
'-i', '--input',
dest='input',
action='store',
metavar='X',
help='Input file to read (default: input)',
default='input')
parser.add_argument(
'-o', '--output',
dest='output',
action='store',
metavar='X',
help='Output path/prefix (default: tmp/out)',
default='tmp/out')
parser.add_argument(
'-s', '--size',
dest='max_size',
action='store',
metavar='X',
help='Output path/prefix (default: output)',
default='100')
return parser.parse_args()
if __name__ == '__main__':
if sys.version_info[0] < 3:
sys.stderr.write('Python 3 or greater is required.\n')
sys.exit(1)
main()