The datasets are not even data in the first place.
If you want to understand estimations and projects, you have to study them. This is a social science project. It's not a math game. It's not a problem that you can solve by collecting oversimplified numbers from dubious sources based on incoherent definitions of the units of measurement.
If somebody reports that they spent x days developing a component, what does that mean? What were they doing during those days? How do we know what they were doing? We don't know unless a researcher was right there in the room taking notes in real time. There are so many confounding variables involved that, even if your data collection is perfect, analysis of the data is a process fraught with opportunities for accidental overfitting or researcher bias.
The one bit of project estimation research that I actually believed was the finding that project estimates are strongly influenced by the timeline desires of major funders/"stakeholders".
Anecdata:
I once asked a VP of a major telecom why the project deadline was Sept 30 and not Oct 1 or Sept 28, and he replied, "Because my quarterly bonus depends on it."