Quantcast

Towards a better compiler

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
13 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Towards a better compiler

cchampeau
Hi team!

I would like to setup some additional goals for the next release of Groovy. As you may know, Gradle 3.5, released tomorrow, will ship with a build cache [1]. But Groovy is causing us some troubles, because the output of compilation is not reproducible. In other words, for the same inputs, we can randomly get a different output. To be clear, while the output of Groovy is semantically correct, we cannot guarantee that for the same sources, the same bytecode, byte to byte, is going to be generated. This is a big problem for Gradle, because it affects the cache key, and a cache miss has terrible consequences: since we need to rebuild everything when a build script classpath changes, a change in the output of a build script bytecode means we need to invalidate a full build...

As an illustration, I fixed 2 issues this week [2] and [3]. But it's not enough. We should revise our usage of maps and sets, and use their linked counterparts when it makes sense. It, alone, cannot guarantee that we have reproducible builds. In particular, cross platform. There are hundreds of places where we use hash sets/maps, with objects that do not implement equals/hashcode, for example (and since those structures are mutable, we're very lucky because the default hashcode uses the system one, which is immutable).

Eventually, if you have read my blog post about performance of Gradle 3.4 [4], you would understand that the Java compiler does a much better job than we do as separating things required by the compiler from things required in user space. In particular, we at Groovy offer AST transformations, which is similar, but not equivalent, to what annotation processors are for javac. The problem is that Groovy only has a single compile classpath, which includes both the "compiler plugins" (AST transformations) and implementation dependencies of the project we compile. So we're effectively mixing things that shouldn't be mixed:

- annotations for AST transformations should be on the compile classpath
- implementation of the AST transformations should be on the AST transformations path (compiler classpath)

If we don't do this, we cannot be as smart as what we do in the Java world, and compute what is relevant in terms of ABI (application binary interface). So any change to the classpath, needs to be considered a breaking change and we need to recompile everything. This makes the implementation of a Groovy incremental compiler effectively impossible. Furthermore, it prevents the AST transform implementors to use the libraries they want as dependencies, without leaking them to the user compile classpath (imagine an AST xform which uses Guava 18, while the user wants Guava 17 on his classpath). In practice, the implementation dependencies of the AST xform are only required at compile time, not runtime, so they should be separate.

In short, I vote for the adoption of a new `-compilerpath` where we would put everything that affects the compiler itself, and live `--classpath` for the user space. Doing this would let tools like Gradle be much smarter about what they can do.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Towards a better compiler

Jochen Theodorou
On 09.04.2017 11:53, Cédric Champeau wrote:
[...]
> There
> are hundreds of places where we use hash sets/maps, with objects that do
> not implement equals/hashcode, for example (and since those structures
> are mutable, we're very lucky because the default hashcode uses the
> system one, which is immutable).

if we use a set without equals and hashcode, then I guess a list would
be better. For the maps it really depends.

>[...] In particular, we at Groovy offer AST
> transformations, which is similar, but not equivalent, to what
> annotation processors are for javac. The problem is that Groovy only has
> a single compile classpath, which includes both the "compiler plugins"
> (AST transformations) and implementation dependencies of the project we
> compile. So we're effectively mixing things that shouldn't be mixed:
>
> - annotations for AST transformations should be on the compile classpath
> - implementation of the AST transformations should be on the AST
> transformations path (compiler classpath)

internally this is separated with the transform loader. Just made no
sense so far to extend that to FileSystemCompiler. Is Gradle going to
use FileSystemcompiler now? Because before it did afaik not and could
have had this already. This is also the reason

> If we don't do this, we cannot be as smart as what we do in the Java
> world, and compute what is relevant in terms of ABI (application binary
> interface). So any change to the classpath, needs to be considered a
> breaking change and we need to recompile everything. This makes the
> implementation of a Groovy incremental compiler effectively impossible.

That means in an incremental compilation scenario the classpath changes?
The source set I expect to change, yes, but why the compile classpath?
Would be nice if you could describe this a bit in more detail.

> Furthermore, it prevents the AST transform implementors to use the
> libraries they want as dependencies, without leaking them to the user
> compile classpath (imagine an AST xform which uses Guava 18, while the
> user wants Guava 17 on his classpath). In practice, the implementation
> dependencies of the AST xform are only required at compile time, not
> runtime, so they should be separate.

here I can agree again.. the main problem I think is the groovy runtime
itself. Is a Closure on the compile classpath identical with one on the
compiler classpath? Does it make sense to have a transform let´s say use
Groovy 2.0 for the implementation on a Groovy 2.5 source?

> In short, I vote for the adoption of a new `-compilerpath` where we
> would put everything that affects the compiler itself, and live
> `--classpath` for the user space. Doing this would let tools like Gradle
> be much smarter about what they can do.
>
> [1] https://docs.gradle.org/3.5-rc-3/release-notes.html
> [2] https://issues.apache.org/jira/browse/GROOVY-8142
> [3] https://issues.apache.org/jira/browse/GROOVY-8148
> [4] https://blog.gradle.org/incremental-compiler-avoidance

sure... the identity question determines the complexity of the change
though. And how Gradle would use it as well of course

bye Jochen

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Towards a better compiler

Graeme Rocher-2
In reply to this post by cchampeau
Big +1 one for making the compiler reproducible.

I think the usage of Sets and HashMap has an impact on Groovydoc too
because Groovydoc generates different output each time it is run.

For example the "extends" and "implements" output in Groovydoc changes
the order of the classes each time it is run. This must be down to
internal use of unordered sets or hash maps.

Regarding classpath (compiler vs compile classpath), in many cases AST
transforms reference classes from libraries that are on the "compile"
classpath. How would you deal with this case? Have them in both
places?

Cheers

On Sun, Apr 9, 2017 at 11:53 AM, Cédric Champeau <[hidden email]> wrote:

> Hi team!
>
> I would like to setup some additional goals for the next release of Groovy.
> As you may know, Gradle 3.5, released tomorrow, will ship with a build cache
> [1]. But Groovy is causing us some troubles, because the output of
> compilation is not reproducible. In other words, for the same inputs, we can
> randomly get a different output. To be clear, while the output of Groovy is
> semantically correct, we cannot guarantee that for the same sources, the
> same bytecode, byte to byte, is going to be generated. This is a big problem
> for Gradle, because it affects the cache key, and a cache miss has terrible
> consequences: since we need to rebuild everything when a build script
> classpath changes, a change in the output of a build script bytecode means
> we need to invalidate a full build...
>
> As an illustration, I fixed 2 issues this week [2] and [3]. But it's not
> enough. We should revise our usage of maps and sets, and use their linked
> counterparts when it makes sense. It, alone, cannot guarantee that we have
> reproducible builds. In particular, cross platform. There are hundreds of
> places where we use hash sets/maps, with objects that do not implement
> equals/hashcode, for example (and since those structures are mutable, we're
> very lucky because the default hashcode uses the system one, which is
> immutable).
>
> Eventually, if you have read my blog post about performance of Gradle 3.4
> [4], you would understand that the Java compiler does a much better job than
> we do as separating things required by the compiler from things required in
> user space. In particular, we at Groovy offer AST transformations, which is
> similar, but not equivalent, to what annotation processors are for javac.
> The problem is that Groovy only has a single compile classpath, which
> includes both the "compiler plugins" (AST transformations) and
> implementation dependencies of the project we compile. So we're effectively
> mixing things that shouldn't be mixed:
>
> - annotations for AST transformations should be on the compile classpath
> - implementation of the AST transformations should be on the AST
> transformations path (compiler classpath)
>
> If we don't do this, we cannot be as smart as what we do in the Java world,
> and compute what is relevant in terms of ABI (application binary interface).
> So any change to the classpath, needs to be considered a breaking change and
> we need to recompile everything. This makes the implementation of a Groovy
> incremental compiler effectively impossible. Furthermore, it prevents the
> AST transform implementors to use the libraries they want as dependencies,
> without leaking them to the user compile classpath (imagine an AST xform
> which uses Guava 18, while the user wants Guava 17 on his classpath). In
> practice, the implementation dependencies of the AST xform are only required
> at compile time, not runtime, so they should be separate.
>
> In short, I vote for the adoption of a new `-compilerpath` where we would
> put everything that affects the compiler itself, and live `--classpath` for
> the user space. Doing this would let tools like Gradle be much smarter about
> what they can do.
>
> [1] https://docs.gradle.org/3.5-rc-3/release-notes.html
> [2] https://issues.apache.org/jira/browse/GROOVY-8142
> [3] https://issues.apache.org/jira/browse/GROOVY-8148
> [4] https://blog.gradle.org/incremental-compiler-avoidance



--
Graeme Rocher
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Towards a better compiler

Andres Almiray
I had a brief chat with Jochen a few days ago regarding this topic.

Until now the usage of a Set or some other data structure was not really that important.
If Groovy switches to a SortedSet then results may be more predictable, but, is it in the benefit of the majority?
What about opening the door for certain strategies to be injected from the outside, thus Gradle may inject certain customizations during the compiler process that make sense to the build tool but not for a Groovy developer compiling with the default Groovy compiler settings. This would require looking for places where custom strategies may be required (such as the particular collection to keep track of names discussed earlier), perhaps relying on ServiceLoader or some other mechanism to discover custom strategies or pick the default ones during compiler bootstrap.

Regarding the 2nd query on referenced classes by AST. Yes, you would be forced to define the references classes in both classpaths.

Cheers,
Andres

-------------------------------------------
Java Champion; Groovy Enthusiast
http://andresalmiray.com
http://www.linkedin.com/in/aalmiray
--
What goes up, must come down. Ask any system administrator.
There are 10 types of people in the world: Those who understand binary, and those who don't.
To understand recursion, we must first understand recursion.

On Fri, Apr 21, 2017 at 2:24 PM, Graeme Rocher <[hidden email]> wrote:
Big +1 one for making the compiler reproducible.

I think the usage of Sets and HashMap has an impact on Groovydoc too
because Groovydoc generates different output each time it is run.

For example the "extends" and "implements" output in Groovydoc changes
the order of the classes each time it is run. This must be down to
internal use of unordered sets or hash maps.

Regarding classpath (compiler vs compile classpath), in many cases AST
transforms reference classes from libraries that are on the "compile"
classpath. How would you deal with this case? Have them in both
places?

Cheers

On Sun, Apr 9, 2017 at 11:53 AM, Cédric Champeau <[hidden email]> wrote:
> Hi team!
>
> I would like to setup some additional goals for the next release of Groovy.
> As you may know, Gradle 3.5, released tomorrow, will ship with a build cache
> [1]. But Groovy is causing us some troubles, because the output of
> compilation is not reproducible. In other words, for the same inputs, we can
> randomly get a different output. To be clear, while the output of Groovy is
> semantically correct, we cannot guarantee that for the same sources, the
> same bytecode, byte to byte, is going to be generated. This is a big problem
> for Gradle, because it affects the cache key, and a cache miss has terrible
> consequences: since we need to rebuild everything when a build script
> classpath changes, a change in the output of a build script bytecode means
> we need to invalidate a full build...
>
> As an illustration, I fixed 2 issues this week [2] and [3]. But it's not
> enough. We should revise our usage of maps and sets, and use their linked
> counterparts when it makes sense. It, alone, cannot guarantee that we have
> reproducible builds. In particular, cross platform. There are hundreds of
> places where we use hash sets/maps, with objects that do not implement
> equals/hashcode, for example (and since those structures are mutable, we're
> very lucky because the default hashcode uses the system one, which is
> immutable).
>
> Eventually, if you have read my blog post about performance of Gradle 3.4
> [4], you would understand that the Java compiler does a much better job than
> we do as separating things required by the compiler from things required in
> user space. In particular, we at Groovy offer AST transformations, which is
> similar, but not equivalent, to what annotation processors are for javac.
> The problem is that Groovy only has a single compile classpath, which
> includes both the "compiler plugins" (AST transformations) and
> implementation dependencies of the project we compile. So we're effectively
> mixing things that shouldn't be mixed:
>
> - annotations for AST transformations should be on the compile classpath
> - implementation of the AST transformations should be on the AST
> transformations path (compiler classpath)
>
> If we don't do this, we cannot be as smart as what we do in the Java world,
> and compute what is relevant in terms of ABI (application binary interface).
> So any change to the classpath, needs to be considered a breaking change and
> we need to recompile everything. This makes the implementation of a Groovy
> incremental compiler effectively impossible. Furthermore, it prevents the
> AST transform implementors to use the libraries they want as dependencies,
> without leaking them to the user compile classpath (imagine an AST xform
> which uses Guava 18, while the user wants Guava 17 on his classpath). In
> practice, the implementation dependencies of the AST xform are only required
> at compile time, not runtime, so they should be separate.
>
> In short, I vote for the adoption of a new `-compilerpath` where we would
> put everything that affects the compiler itself, and live `--classpath` for
> the user space. Doing this would let tools like Gradle be much smarter about
> what they can do.
>
> [1] https://docs.gradle.org/3.5-rc-3/release-notes.html
> [2] https://issues.apache.org/jira/browse/GROOVY-8142
> [3] https://issues.apache.org/jira/browse/GROOVY-8148
> [4] https://blog.gradle.org/incremental-compiler-avoidance



--
Graeme Rocher

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Towards a better compiler

Jesper Steen Møller
Hi list

On 21 Apr 2017, at 14.33, Andres Almiray <[hidden email]> wrote:

I had a brief chat with Jochen a few days ago regarding this topic.

Until now the usage of a Set or some other data structure was not really that important.
If Groovy switches to a SortedSet then results may be more predictable, but, is it in the benefit of the majority?

For a reproducible build, LinkedHashSet would make more sense (as it preservers order, rather than sorting), and doesn't rely on comparability. For subtle dependencies like order of parent classes, this would be preferable.

Fixing the ordering can't possibly be worse than having it be random/unstable.

What about opening the door for certain strategies to be injected from the outside, thus Gradle may inject certain customizations during the compiler process that make sense to the build tool but not for a Groovy developer compiling with the default Groovy compiler settings. This would require looking for places where custom strategies may be required (such as the particular collection to keep track of names discussed earlier), perhaps relying on ServiceLoader or some other mechanism to discover custom strategies or pick the default ones during compiler bootstrap.

That sounds like a heavyweight solution compared to using LinkedHashSet but I might not grasp the full problem.

Also: How about the timestamps embedded into classes' initialization logic, such as  12: putstatic     #216                // Field __timeStamp__239_neverHappen1490860918788:J Are those still around? Won't they also prevent repeatable builds?

-Jesper


-------------------------------------------
Java Champion; Groovy Enthusiast
http://andresalmiray.com
http://www.linkedin.com/in/aalmiray
--
What goes up, must come down. Ask any system administrator.
There are 10 types of people in the world: Those who understand binary, and those who don't.
To understand recursion, we must first understand recursion.

On Fri, Apr 21, 2017 at 2:24 PM, Graeme Rocher <[hidden email]> wrote:
Big +1 one for making the compiler reproducible.

I think the usage of Sets and HashMap has an impact on Groovydoc too
because Groovydoc generates different output each time it is run.

For example the "extends" and "implements" output in Groovydoc changes
the order of the classes each time it is run. This must be down to
internal use of unordered sets or hash maps.

Regarding classpath (compiler vs compile classpath), in many cases AST
transforms reference classes from libraries that are on the "compile"
classpath. How would you deal with this case? Have them in both
places?

Cheers

On Sun, Apr 9, 2017 at 11:53 AM, Cédric Champeau <[hidden email]> wrote:
> Hi team!
>
> I would like to setup some additional goals for the next release of Groovy.
> As you may know, Gradle 3.5, released tomorrow, will ship with a build cache
> [1]. But Groovy is causing us some troubles, because the output of
> compilation is not reproducible. In other words, for the same inputs, we can
> randomly get a different output. To be clear, while the output of Groovy is
> semantically correct, we cannot guarantee that for the same sources, the
> same bytecode, byte to byte, is going to be generated. This is a big problem
> for Gradle, because it affects the cache key, and a cache miss has terrible
> consequences: since we need to rebuild everything when a build script
> classpath changes, a change in the output of a build script bytecode means
> we need to invalidate a full build...
>
> As an illustration, I fixed 2 issues this week [2] and [3]. But it's not
> enough. We should revise our usage of maps and sets, and use their linked
> counterparts when it makes sense. It, alone, cannot guarantee that we have
> reproducible builds. In particular, cross platform. There are hundreds of
> places where we use hash sets/maps, with objects that do not implement
> equals/hashcode, for example (and since those structures are mutable, we're
> very lucky because the default hashcode uses the system one, which is
> immutable).
>
> Eventually, if you have read my blog post about performance of Gradle 3.4
> [4], you would understand that the Java compiler does a much better job than
> we do as separating things required by the compiler from things required in
> user space. In particular, we at Groovy offer AST transformations, which is
> similar, but not equivalent, to what annotation processors are for javac.
> The problem is that Groovy only has a single compile classpath, which
> includes both the "compiler plugins" (AST transformations) and
> implementation dependencies of the project we compile. So we're effectively
> mixing things that shouldn't be mixed:
>
> - annotations for AST transformations should be on the compile classpath
> - implementation of the AST transformations should be on the AST
> transformations path (compiler classpath)
>
> If we don't do this, we cannot be as smart as what we do in the Java world,
> and compute what is relevant in terms of ABI (application binary interface).
> So any change to the classpath, needs to be considered a breaking change and
> we need to recompile everything. This makes the implementation of a Groovy
> incremental compiler effectively impossible. Furthermore, it prevents the
> AST transform implementors to use the libraries they want as dependencies,
> without leaking them to the user compile classpath (imagine an AST xform
> which uses Guava 18, while the user wants Guava 17 on his classpath). In
> practice, the implementation dependencies of the AST xform are only required
> at compile time, not runtime, so they should be separate.
>
> In short, I vote for the adoption of a new `-compilerpath` where we would
> put everything that affects the compiler itself, and live `--classpath` for
> the user space. Doing this would let tools like Gradle be much smarter about
> what they can do.
>
> [1] https://docs.gradle.org/3.5-rc-3/release-notes.html
> [2] https://issues.apache.org/jira/browse/GROOVY-8142
> [3] https://issues.apache.org/jira/browse/GROOVY-8148
> [4] https://blog.gradle.org/incremental-compiler-avoidance



--
Graeme Rocher


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Towards a better compiler

cchampeau
In reply to this post by Andres Almiray


2017-04-21 14:33 GMT+02:00 Andres Almiray <[hidden email]>:
I had a brief chat with Jochen a few days ago regarding this topic.

Until now the usage of a Set or some other data structure was not really that important.
If Groovy switches to a SortedSet then results may be more predictable, but, is it in the benefit of the majority?
What about opening the door for certain strategies to be injected from the outside, thus Gradle may inject certain customizations during the compiler process that make sense to the build tool but not for a Groovy developer compiling with the default Groovy compiler settings. This would require looking for places where custom strategies may be required (such as the particular collection to keep track of names discussed earlier), perhaps relying on ServiceLoader or some other mechanism to discover custom strategies or pick the default ones during compiler bootstrap.

I don't see the point honestly. The fact is that the compiler, for a specific set of sources + compile classpath, should always produce the same output. Especially on the same machine. If it doesn't, then the compiler is not predictable. It's a bug that needs to be fixed. It has nothing to do with Gradle, and *all* users would benefit from this. Gradle is an important use case because it currently defeats our caching, but it's not the only one. IDE indexing is another example. It's a serious issue, and we need to consider this a serious bug.

BTW, we don't need to _sort_. We need to make sure that for the same input, we have the same output. Especially, order of interfaces in declaration type matter (and they are reproducible today), We *must not* reorder them, or it would change semantics (typically for traits). I have fixed the bugs we, in the Gradle team, have identified, but my email was there to mention that we should take better care of this, because we do a pretty bad job at checking that the behavior of the compiler is deterministic.

Regarding AST transformations classpath, I had actually forgotten that Gradle integrates directly with the compiler, so can use the 2 different "classpath". But AFAIK, our CLI doesn't. This should be easy to fix. 

Regarding the 2nd query on referenced classes by AST. Yes, you would be forced to define the references classes in both classpaths.

Cheers,
Andres

-------------------------------------------
Java Champion; Groovy Enthusiast
http://andresalmiray.com
http://www.linkedin.com/in/aalmiray
--
What goes up, must come down. Ask any system administrator.
There are 10 types of people in the world: Those who understand binary, and those who don't.
To understand recursion, we must first understand recursion.

On Fri, Apr 21, 2017 at 2:24 PM, Graeme Rocher <[hidden email]> wrote:
Big +1 one for making the compiler reproducible.

I think the usage of Sets and HashMap has an impact on Groovydoc too
because Groovydoc generates different output each time it is run.

For example the "extends" and "implements" output in Groovydoc changes
the order of the classes each time it is run. This must be down to
internal use of unordered sets or hash maps.

Regarding classpath (compiler vs compile classpath), in many cases AST
transforms reference classes from libraries that are on the "compile"
classpath. How would you deal with this case? Have them in both
places?

Cheers

On Sun, Apr 9, 2017 at 11:53 AM, Cédric Champeau <[hidden email]> wrote:
> Hi team!
>
> I would like to setup some additional goals for the next release of Groovy.
> As you may know, Gradle 3.5, released tomorrow, will ship with a build cache
> [1]. But Groovy is causing us some troubles, because the output of
> compilation is not reproducible. In other words, for the same inputs, we can
> randomly get a different output. To be clear, while the output of Groovy is
> semantically correct, we cannot guarantee that for the same sources, the
> same bytecode, byte to byte, is going to be generated. This is a big problem
> for Gradle, because it affects the cache key, and a cache miss has terrible
> consequences: since we need to rebuild everything when a build script
> classpath changes, a change in the output of a build script bytecode means
> we need to invalidate a full build...
>
> As an illustration, I fixed 2 issues this week [2] and [3]. But it's not
> enough. We should revise our usage of maps and sets, and use their linked
> counterparts when it makes sense. It, alone, cannot guarantee that we have
> reproducible builds. In particular, cross platform. There are hundreds of
> places where we use hash sets/maps, with objects that do not implement
> equals/hashcode, for example (and since those structures are mutable, we're
> very lucky because the default hashcode uses the system one, which is
> immutable).
>
> Eventually, if you have read my blog post about performance of Gradle 3.4
> [4], you would understand that the Java compiler does a much better job than
> we do as separating things required by the compiler from things required in
> user space. In particular, we at Groovy offer AST transformations, which is
> similar, but not equivalent, to what annotation processors are for javac.
> The problem is that Groovy only has a single compile classpath, which
> includes both the "compiler plugins" (AST transformations) and
> implementation dependencies of the project we compile. So we're effectively
> mixing things that shouldn't be mixed:
>
> - annotations for AST transformations should be on the compile classpath
> - implementation of the AST transformations should be on the AST
> transformations path (compiler classpath)
>
> If we don't do this, we cannot be as smart as what we do in the Java world,
> and compute what is relevant in terms of ABI (application binary interface).
> So any change to the classpath, needs to be considered a breaking change and
> we need to recompile everything. This makes the implementation of a Groovy
> incremental compiler effectively impossible. Furthermore, it prevents the
> AST transform implementors to use the libraries they want as dependencies,
> without leaking them to the user compile classpath (imagine an AST xform
> which uses Guava 18, while the user wants Guava 17 on his classpath). In
> practice, the implementation dependencies of the AST xform are only required
> at compile time, not runtime, so they should be separate.
>
> In short, I vote for the adoption of a new `-compilerpath` where we would
> put everything that affects the compiler itself, and live `--classpath` for
> the user space. Doing this would let tools like Gradle be much smarter about
> what they can do.
>
> [1] https://docs.gradle.org/3.5-rc-3/release-notes.html
> [2] https://issues.apache.org/jira/browse/GROOVY-8142
> [3] https://issues.apache.org/jira/browse/GROOVY-8148
> [4] https://blog.gradle.org/incremental-compiler-avoidance



--
Graeme Rocher


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Towards a better compiler

cchampeau
In reply to this post by Graeme Rocher-2


2017-04-21 14:24 GMT+02:00 Graeme Rocher <[hidden email]>:
Big +1 one for making the compiler reproducible.

I think the usage of Sets and HashMap has an impact on Groovydoc too
because Groovydoc generates different output each time it is run.

For example the "extends" and "implements" output in Groovydoc changes
the order of the classes each time it is run. This must be down to
internal use of unordered sets or hash maps.

Regarding classpath (compiler vs compile classpath), in many cases AST
transforms reference classes from libraries that are on the "compile"
classpath. How would you deal with this case? Have them in both
places?

Yes, they should. Knowing that if you do so, it implies that a potential incremental compiler would be defeated. 

Cheers

On Sun, Apr 9, 2017 at 11:53 AM, Cédric Champeau <[hidden email]> wrote:
> Hi team!
>
> I would like to setup some additional goals for the next release of Groovy.
> As you may know, Gradle 3.5, released tomorrow, will ship with a build cache
> [1]. But Groovy is causing us some troubles, because the output of
> compilation is not reproducible. In other words, for the same inputs, we can
> randomly get a different output. To be clear, while the output of Groovy is
> semantically correct, we cannot guarantee that for the same sources, the
> same bytecode, byte to byte, is going to be generated. This is a big problem
> for Gradle, because it affects the cache key, and a cache miss has terrible
> consequences: since we need to rebuild everything when a build script
> classpath changes, a change in the output of a build script bytecode means
> we need to invalidate a full build...
>
> As an illustration, I fixed 2 issues this week [2] and [3]. But it's not
> enough. We should revise our usage of maps and sets, and use their linked
> counterparts when it makes sense. It, alone, cannot guarantee that we have
> reproducible builds. In particular, cross platform. There are hundreds of
> places where we use hash sets/maps, with objects that do not implement
> equals/hashcode, for example (and since those structures are mutable, we're
> very lucky because the default hashcode uses the system one, which is
> immutable).
>
> Eventually, if you have read my blog post about performance of Gradle 3.4
> [4], you would understand that the Java compiler does a much better job than
> we do as separating things required by the compiler from things required in
> user space. In particular, we at Groovy offer AST transformations, which is
> similar, but not equivalent, to what annotation processors are for javac.
> The problem is that Groovy only has a single compile classpath, which
> includes both the "compiler plugins" (AST transformations) and
> implementation dependencies of the project we compile. So we're effectively
> mixing things that shouldn't be mixed:
>
> - annotations for AST transformations should be on the compile classpath
> - implementation of the AST transformations should be on the AST
> transformations path (compiler classpath)
>
> If we don't do this, we cannot be as smart as what we do in the Java world,
> and compute what is relevant in terms of ABI (application binary interface).
> So any change to the classpath, needs to be considered a breaking change and
> we need to recompile everything. This makes the implementation of a Groovy
> incremental compiler effectively impossible. Furthermore, it prevents the
> AST transform implementors to use the libraries they want as dependencies,
> without leaking them to the user compile classpath (imagine an AST xform
> which uses Guava 18, while the user wants Guava 17 on his classpath). In
> practice, the implementation dependencies of the AST xform are only required
> at compile time, not runtime, so they should be separate.
>
> In short, I vote for the adoption of a new `-compilerpath` where we would
> put everything that affects the compiler itself, and live `--classpath` for
> the user space. Doing this would let tools like Gradle be much smarter about
> what they can do.
>
> [1] https://docs.gradle.org/3.5-rc-3/release-notes.html
> [2] https://issues.apache.org/jira/browse/GROOVY-8142
> [3] https://issues.apache.org/jira/browse/GROOVY-8148
> [4] https://blog.gradle.org/incremental-compiler-avoidance



--
Graeme Rocher

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Towards a better compiler

Jochen Theodorou
In reply to this post by Jesper Steen Møller
On 21.04.2017 14:54, Jesper Steen Møller wrote:
[....]
> Also: How about the timestamps embedded into classes' initialization
> logic, such as  12: putstatic     #216                // Field
> __timeStamp__239_neverHappen1490860918788:J Are those still around?
> Won't they also prevent repeatable builds?

the command line compiler should no longer produce classes with those.
only if you compile at runtime and with recompilation enabled they are added

bye Jochen

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Towards a better compiler

Jochen Theodorou
In reply to this post by cchampeau
On 21.04.2017 19:46, Cédric Champeau wrote:
[...]
> The fact is that the compiler, for a
> specific set of sources + compile classpath, should always produce the
> same output. Especially on the same machine. If it doesn't, then the
> compiler is not predictable. It's a bug that needs to be fixed. It has
> nothing to do with Gradle, and *all* users would benefit from this.

I am curious. How does a user exactly benefit from it outside of a build
tool like gradle?

Also I doubt we can be 100% reproducible as long as we use reflection
somewhere, simple because reflection is not stable. Only because of the
great contribution from intellij to be able read class files directly
without reflection in the compiler we have the ability for *most*
classes to have a stable result depending on the occurrence in bytecode.
But for for example the Groovy classes directly known by the compiler
like Closure, this won´t be the case.

100% stability is not achievable simply by using linked maps and sets.

> Gradle is an important use case because it currently defeats our
> caching, but it's not the only one. IDE indexing is another example.
> It's a serious issue, and we need to consider this a serious bug.

indexing? you mean idea? It does extensive indexing all the time... for
minutes... without groovy

> BTW, we don't need to _sort_. We need to make sure that for the same
> input, we have the same output.

once you include reflection here... Those cases you could solve by
sorting though.

> Especially, order of interfaces in
> declaration type matter (and they are reproducible today), We *must not*
> reorder them, or it would change semantics (typically for traits). I
> have fixed the bugs we, in the Gradle team, have identified, but my
> email was there to mention that we should take better care of this,
> because we do a pretty bad job at checking that the behavior of the
> compiler is deterministic.

well, that was no real option for a long time and no target till now.
But if you say your fixes are enough for gradle, then your tests should
ensure this stays. And then we don´t do a bad job at it anymore, do we?

> Regarding AST transformations classpath, I had actually forgotten that
> Gradle integrates directly with the compiler, so can use the 2 different
> "classpath". But AFAIK, our CLI doesn't. This should be easy to fix.

easy fix? depends... first of all... what will you gain from that? Since
I have no idea where you get an advantage from those things outside a
build tool that goes beyond something like make and ant, I fail to see
the advantage

Granted, something like groovydoc gets an advantage, but honestly I do
not see why there it should be done without sorting to achieve a
somewhat stable output. And of course that sorting can very well happen
in the doc tool

bye Jochen
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Towards a better compiler

Cédric Champeau


2017-04-21 22:02 GMT+02:00 Jochen Theodorou <[hidden email]>:
On 21.04.2017 19:46, Cédric Champeau wrote:
[...]
The fact is that the compiler, for a
specific set of sources + compile classpath, should always produce the
same output. Especially on the same machine. If it doesn't, then the
compiler is not predictable. It's a bug that needs to be fixed. It has
nothing to do with Gradle, and *all* users would benefit from this.

I am curious. How does a user exactly benefit from it outside of a build tool like gradle?

Any tool that performs bytecode analysis, or computes signatures is affected. So if the compiler is not reproducible, since your at the bottom of the execution, you affect all tools that are built on top. This affects Gradle, but also Grails, IntelliJ (because different bytecode forces reindexing), tools which perform bytecode decoration (intercepting Foo#foo() is not the same as intercepting Bar#foo()) so it also affects correctness. 

Also I doubt we can be 100% reproducible as long as we use reflection somewhere, simple because reflection is not stable. Only because of the great contribution from intellij to be able read class files directly without reflection in the compiler we have the ability for *most* classes to have a stable result depending on the occurrence in bytecode. But for for example the Groovy classes directly known by the compiler like Closure, this won´t be the case.

100% stability is not achievable simply by using linked maps and sets.

I agree, but I don't get the point of not trying to do better. It should be pretty obvious that for the same sources and same compile classpath, we should produce the same output. I don't get why we wouldn't try to fix this, even incrementally, as we found bugs. This is better. I agree that we cannot do 100% reproducibility today, because we don't implement equals/hashcode, which means we're system dependent (and probably JDK dependent too). And implementing those is doing to be hard, because our structures are not immutable (so mutating something that is already in the set/map is problematic). Anyway, I'm not asking for 100% reproducible today. I'm asking for every dev to realize that we have a problem that leads to non deterministic behavior, and that everyone should try to take that into consideration when writing code, and futhermore, tests. 


once you include reflection here... Those cases you could solve by sorting though.

Yes, we should avoid reflection as much as possible. I wonder if there are still lots of places in the compiler where we do this: reflection means loading the classes and too often initializing, which is probably no good in any case. 


Especially, order of interfaces in
declaration type matter (and they are reproducible today), We *must not*
reorder them, or it would change semantics (typically for traits). I
have fixed the bugs we, in the Gradle team, have identified, but my
email was there to mention that we should take better care of this,
because we do a pretty bad job at checking that the behavior of the
compiler is deterministic.

well, that was no real option for a long time and no target till now. But if you say your fixes are enough for gradle, then your tests should ensure this stays. And then we don´t do a bad job at it anymore, do we?

I have added tests for the cases I fixed, yes. They are not cross-platform, though. 


Regarding AST transformations classpath, I had actually forgotten that
Gradle integrates directly with the compiler, so can use the 2 different
"classpath". But AFAIK, our CLI doesn't. This should be easy to fix.

easy fix? depends... first of all... what will you gain from that? Since I have no idea where you get an advantage from those things outside a build tool that goes beyond something like make and ant, I fail to see the advantage

There would be an advantage for the Groovy compiler itself: today, if you use `groovyc`, you put everything on the compile classpath. We don't make the difference between classes that need to be found for implementation of the compiled classes, and those which are dependencies of AST transformations. The consequence is the same as the one people have with annotation processors when on the compile classpath instead of the annotation processor path: mixing classpath which have nothing to do together. In particular, say the user code requires Guava 19, but an AST transformation was built using Guava 17. Then you have a conflict, but the user shouldn't care about this, because Guava 17 is an implementation dependency of an AST transformation: nothing that should prevent them from using the class they wish, because once compiled, Guava 17 is not required anymore. It's about better modelling of dependencies, which has direct consequences on user pain. 

Granted, something like groovydoc gets an advantage, but honestly I do not see why there it should be done without sorting to achieve a somewhat stable output. And of course that sorting can very well happen in the doc tool

bye Jochen

12
Loading...