Recently I had the displeasure of working with OpenCV for some video capture and image processing. You can read more about the actual project here: DIY surveillance: motion detection with OpenCV and Python
The library itself feels pretty nice, at least initially! As described in the blog post, it lets you process images and easily extract data from them to achieve a variety of interesting things. I don't think there are many viable alternatives for this sort of work, and Python itself is really convenient to write scripts in. Surprisingly, even the performance seems good and the memory usage isn't too bad.
Of course, it all quickly falls apart.
For starters, when installing the pip dependencies, you'll need to wait quite a while. It seems that the low-level code has to be compiled, with Python only providing the bindings to interface with it. This is probably why the runtime performance is pretty good, but it took me approximately 10 minutes to compile it:
One could argue that there is no other way to approach something like this, because writing everything in Python would probably result in pretty poor performance, or perhaps wouldn't even provide access to most of the OS-level functionality that is needed. Then again, even pip itself complained about how slowly things were progressing:
Now, long compilation times I could live with, but the library itself outright refusing to work? That's trickier, especially when it decides to freeze without giving you any information about what in particular is wrong!
Initially, I ran into problems when the script reached the following line:
video_out = cv2.VideoWriter(video_filename, fourcc, TARGET_FPS, (width, height))
There were many posts online about how the codec value fourcc needs to be valid and supported by the OS (a value of -1 apparently should display all of the available ones, but this never worked for me), and about how the resolution values should match the recorded video. Since I was already retrieving those values, it should have worked.
And yet, it didn't. So, I tried to debug it, at least by running the script with Python's verbose flag:
python3 -vvvv webcam-motion-detection.py
This produced the following output:
Frankly, I'm not quite sure what I'm looking at, so it took me a whole bunch of trial and error, since the library itself wasn't interested in telling me what it didn't like.
In the end, it turned out to be a problem with the fourcc value, though none of the ones I could find on the internet worked, not even X264 (H264) after manually installing the codec through apt. I finally managed to get it all working with MJPG, but since it uses JPEG images internally, it produces pretty horrible file sizes, which means that I could only record videos at approximately 5 frames per second or risk wasting a lot of disk space.
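To put the MJPG trade-off into numbers, here is a rough back-of-the-envelope sketch. It assumes every frame is stored as an independent JPEG and that an average frame weighs about 100 KB; that figure is purely illustrative and not measured from this project, and the function name is made up for the example.

```python
def mjpg_bytes_per_hour(fps: float, avg_frame_bytes: int) -> int:
    """Estimate raw MJPG video size for one hour of recording.

    Every frame is a standalone JPEG, so the size grows linearly with
    the frame rate. Container overhead is ignored.
    """
    seconds_per_hour = 3600
    return int(fps * avg_frame_bytes * seconds_per_hour)


# Assumed average JPEG frame size (hypothetical, depends on resolution and quality).
ASSUMED_FRAME_BYTES = 100_000

for fps in (5, 30):
    gigabytes = mjpg_bytes_per_hour(fps, ASSUMED_FRAME_BYTES) / 1e9
    print(f"{fps} fps: about {gigabytes:.1f} GB per hour")
```

Under that assumption, 5 frames per second comes to about 1.8 GB per hour, while 30 frames per second would be about 10.8 GB per hour.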
Even if there are so many problems and things the library doesn't like, it should at least make an effort to tell me what's wrong!
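One check that can at least surface this kind of failure is asking the writer whether it actually opened. The following is a minimal sketch, not the code from the project: it assumes that cv2.VideoWriter.isOpened() returns False when the backend rejects a codec, and the helper names (fourcc_code, pick_codec, opencv_try_open) are hypothetical. A FourCC is simply four ASCII characters packed into a 32-bit integer, which is what the first helper computes without needing OpenCV.

```python
def fourcc_code(name: str) -> int:
    """Pack a four-character codec name into a little-endian 32-bit integer."""
    if len(name) != 4:
        raise ValueError(f"FourCC must be exactly 4 characters, got {name!r}")
    return int.from_bytes(name.encode("ascii"), "little")


def pick_codec(candidates, try_open):
    """Return the first codec name for which try_open(code) reports success.

    try_open is a caller-supplied callback (hypothetical, not an OpenCV API)
    that attempts to open a writer with the given FourCC integer and
    returns True or False. Returns None if no candidate works.
    """
    for name in candidates:
        if try_open(fourcc_code(name)):
            return name
    return None


def opencv_try_open(path, fps, size):
    """Build a try_open callback backed by cv2.VideoWriter.

    Requires opencv-python. Note that probing may leave an empty file at path.
    """
    import cv2  # imported lazily so the helpers above work without OpenCV

    def try_open(code):
        writer = cv2.VideoWriter(path, code, fps, size)
        opened = writer.isOpened()  # assumed False when the codec is rejected
        writer.release()
        return opened

    return try_open
```

With OpenCV installed, something like pick_codec(["X264", "MJPG"], opencv_try_open("probe.avi", 5, (640, 480))) would report which candidate the local build accepts, where the file name, frame rate, and resolution are placeholder values.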
I found that most tutorials on the Internet seemed to use the following function after capturing image data, as a means to wait for user input and see whether the script should be stopped:
_ = cv2.waitKey(1)
Now, in my case, I don't need to stop it based on user input, so initially I figured that I'd simply skip it. However, on Windows this means that the script will simply freeze and fail to work.
On the other hand, if this function is called on Debian, I get a different error:
So, it must be called on Windows, where the script will freeze without it, and must not be called on Linux, where it will break the script. This just feels so detached from the idea of cross-platform libraries and proper abstractions that I'm not sure whether I should laugh or cry.
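A workaround for this split behaviour could look like the sketch below. It only encodes the observation described above (call cv2.waitKey on Windows, skip it elsewhere) and does not address whatever the underlying cause is. The helper names are hypothetical, and whether the same rule holds on other Linux distributions or on macOS is an untested assumption.

```python
import platform


def needs_wait_key(system_name=None):
    """Decide whether cv2.waitKey should be called after grabbing a frame.

    Based purely on the behaviour observed in this post: required on
    Windows, harmful on Debian. Other platforms are treated like Linux,
    which is an assumption.
    """
    name = system_name if system_name is not None else platform.system()
    return name == "Windows"


def pump_events_if_needed():
    """Call cv2.waitKey(1) only on platforms where it was needed."""
    if needs_wait_key():
        import cv2  # imported lazily so the check itself works without OpenCV

        cv2.waitKey(1)
```

Calling pump_events_if_needed() once per captured frame keeps the platform check in a single place instead of scattering it through the capture loop.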