-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
File::open() on directories does not return Err(), leads to breakage with BufReader #64144
Comments
Your program is going crazy because you're collecting a vector of |
Perhaps, BufRead et. al needs to internally track if the last call returned EISDIR, and then return None() afterwards? Trying to continue calling read() after EISDIR is kinda nonsensical. |
( That is, EISDIR should be considered equivalent to "Error + EOF" , because a subsequent read() cannot return any more data ) |
I'm not sure we want to try to catalog the list of all possible terminal errors on every platform. |
Are there platforms where invoking read() on a filehandle opened for a directory does something useful? There may be other function calls useful on directories opened as filehandles, but read()? |
Yes. NetBSD supports reading directory entries this way. |
On linux there are many uses. It gives you a directory file descriptor (accessible via AsRawFd) which can be used for various other syscalls (not necessarily in the standard library) such as fiemap or the *at syscall family such as renameat.
Get the metadata of the file after opening it and check its type. This is often necessary anyway if you only want to operate on regular files and not block devices for example. |
Re-categorizing as a documentation issue. |
This is because BufReader/BufRead doesn't give us the interface we actually need, because those iterators emit a stream of Some(Err()) repeatedly when given a directory as the underlying fh. Subsequently, the iterator never terminates, and collect() then becomes an efficient implementation of a memory exhauster. Bug: rust-lang/rust#64144
This is because BufReader/BufRead doesn't give us the interface we actually need, because those iterators emit a stream of Some(Err()) repeatedly when given a directory as the underlying fh. Subsequently, the iterator never terminates, and collect() then becomes an efficient implementation of a memory exhauster. Bug: rust-lang/rust#64144
At the uutils project we just ran into this issue although triggered in a different way. Basically we have a couple instances of this pattern: reader.lines().filter_map(Result::ok) This seems quite innocent, but it causes an infinite loop. The correct code would be: reader.lines().map_while(Result::ok) I think documentation is not really enough in this case, because it's an easy mistake to make if the programmer is just looking at the types;
This seems like a good solution to me. Though I think it would be more consistent if it returned Relevant issues: uutils/coreutils#4539, uutils/coreutils#4573, uutils/coreutils#4574 |
@samueltardieu maybe it could be add as a clippy check ^^ (see @tertsdiepraam comment) |
I just found that the case I presented is essentially a duplicate of #34703 and #43504, which were closed with a reason of "this is expected behavior, you're misusing the API". However, I think that the mistake is too subtle and to easy to make to put the blame entirely on the programmer. Additionally, the fact that this problem showed up in uutils shows that it can cause problems in real code, written by competent Rustaceans. Edit: For a bit less anecdotal evidence, a search on GitHub also brings up many examples of the same pattern: https://github.com/search?q=.lines()+.filter_map(Result%3A%3Aok)&type=code&ref=advsearch |
Good idea: rust-lang/rust-clippy#10534 |
Flag `bufreader.lines().filter_map(Result::ok)` as suspicious This lint detects a problem that happened recently in https://github.com/uutils/coreutils and is described in rust-lang/rust#64144. changelog: [`lines_filter_map_ok`]: new lint
@samueltardieu That looks excellent to me. Thank you! |
Hello, Sorry to dig up this issue but it is referenced and for anyone that may fall into it. If you do want to get the BufReader for instance to count lines like this on a directory : use std::io::{BufRead,Read};
fn getsizeoffile<T: AsRef<std::path::Path>>(path: T) -> std::io::Result<(usize,usize)> {
let size2 = std::fs::read_to_string(&path).unwrap_or(String::new()).lines().count();
let lines = std::io::BufReader::new(File::open(&path)?).take(1000);
let size = lines.lines().count(); //WILL CRASH
Ok((size,size2))
}
#[test]
fn testsize() {
let (size0, size1) = getsizeoffile("/").unwrap();
assert_eq!(size0,size1);
} The first use std::io::{BufRead,Read};
fn getsizeoffile<T: AsRef<std::path::Path>>(path: T) -> std::io::Result<(usize,usize)> {
let size2 = std::fs::read_to_string(&path).unwrap_or(String::new()).lines().count();
let lines = std::io::BufReader::new(File::open(&path)?).take(1000);
let size = lines.lines().map_while(Result::ok).count(); //Count only if lines is okay (else count lines till there)
Ok((size,size2))
}
#[test]
fn testsize() {
let (size0, size1) = getsizeoffile("/").unwrap();
assert_eq!(size0,size1);
} It is already stated in the above discussion. As well the take limit does nothing here because it does not read anything. |
I am here after troubleshooting a bug that assumed that opening a directory is cross-platform compatible, but in-fact works on Unix platforms but not on Windows. See here: I think this is the fundamental problem i.e. the behavior is platform specific. I would appreciate if someone corrects me and points me to documentation on this. |
Opening a directory is possible on windows, but you need to pass an extra flag. The std::fs APIs are rather thin abstractions over platform APIs, so you'll have to consult the platform's documentation for the low level details. The behavior of special files (such as directories, pipes and others) are platform-specific. |
We do sometimes do things to abstract over platforms differences. Though for this change particularly it's arguable which is the "right" default behaviour and changing it on either platform may be unexpected. I think a way forward might be to add an option to |
So https://github.com/rust-lang/rust/blob/master/library/std/src/io/mod.rs has this definition of Iterator for Lines:
Now, setting aside the backwards compatibility guarantees for a bit, and ignoring the resulting type errors, what things would be worse if instead it was:
|
@hkBst even ignoring that API break, silently treating error as EOF is not in the spirit of the Rust standard library, which goes to great lengths to provide building blocks for software that robustly considers edge cases. I think this clippy lint warning is a much better solution to this issue— |
@scottlamb a clippy lint is definitely a good idea, though not everyone runs clippy as often as they should. I'd still be interested in learning about a specific use case for |
I tripped onto some odd behaviour.
I was under the assumption calling
File::open()
on a directory would Err(), and that I could use that to handle a user specifying a path that was not a file, ( instead of falling prey to race conditions by stat-ing first and then opening second ).However, ... bad things happened instead.
The last of these lines will run forever, with strace reporting:
Over and over again ad-infinitum with no end in sight.
Somehow I had a variation of this go crazy and eat 200% of my ram, but I'm having a hard time reproducing that exact case (Though it may have been related to the target-directory in question also being massive in my case).
Digging shows related bug #43504
The profound question that remains unanswered is: "Why is calling File::open on a directory fine?"
And the residual question is "How does one invoke File::open in such a way that it refuses to work on directories".
And importantly, how do I achieve that portably?
And if none of these concerns can be mitigated, the very least that could be done is have this giant foot-gun documented somewhere in
File::open
.( As far as I can divine, there's no useful behaviour to be found by allowing File::open to work on directories, as standard read() semantics simply don't work on directory filehandles, you have to use readdir() )
The text was updated successfully, but these errors were encountered: