
[Feature suggestion] Jagged tensors #367

@imh

Working with jagged tensors is a real pain, especially when you have to translate back and forth between a fixed-layout mentality and the various jagged ones. Einops notation makes this easier to reason about, and einops support would make it easier to implement too.

Example: sequences of sequences

I have data x of shape batch outer inner dim, or more precisely $[b, o_b, i_{b, o}, d]$, because the outer and inner sequence lengths vary. A nice way to work with this sort of data in einops terms is to view x as (b o) i d, keep another tensor y of shape (b o) d, and then alternate x, y = inner_network(x, y); y = outer_network(y).

This means viewing y as (b o) d to talk to the (b o) i d shaped x, then viewing y as b o d to operate across the outer list level, and switching between the two views on every round.

Juggling the offsets for this is a pain that makes x.reshape(2,-1,i,1).transpose(99, 42) look pleasant, while the einops model of b o i d -> (b o) i d and (b o) d -> b o d and b o d -> (b o) d is super clear.
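To make the bookkeeping concrete, here is a minimal NumPy sketch of the offsets-and-values layout described above. The lengths, variable names, and the mean-pooling step are made up for illustration; none of this is existing einops API.

```python
import numpy as np

d = 2
# Hypothetical jagged data: batch item 0 holds two inner sequences
# (lengths 3 and 1), batch item 1 holds one inner sequence (length 2).
inner_lengths = [[3, 1], [2]]

# Boundaries of each batch item along the flattened (b o) axis.
outer_offsets = np.cumsum([0] + [len(item) for item in inner_lengths])

# Boundaries of each inner sequence along the flattened ((b o) i) axis.
flat_lengths = [n for item in inner_lengths for n in item]
inner_offsets = np.cumsum([0] + flat_lengths)

# Values stored flat as ((b o) i) d; no padding anywhere.
x = np.arange(inner_offsets[-1] * d, dtype=np.float32).reshape(-1, d)

# "(b o) i d -> (b o) d": one vector per inner sequence (mean over i here).
y = np.stack([
    x[start:stop].mean(axis=0)
    for start, stop in zip(inner_offsets[:-1], inner_offsets[1:])
])

# "(b o) d -> b o d": regroup y per batch item using the outer offsets.
y_by_batch = [
    y[start:stop]
    for start, stop in zip(outer_offsets[:-1], outer_offsets[1:])
]
```

Every change of view needs a different offsets array, which is exactly the part that gets error-prone by hand.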

Proposal

Einops already has pack and unpack: pack returns the packed tensor together with the packed shapes (ps), and unpack uses them to split it back. Let rearrange, or a new function such as repack, take offsets in that spirit plus a recipe, and return new offsets.

Implementation-wise, repack(ps, recipe) or repack(x, ps, recipe) is tedious to write, but fairly straightforward. I'll be doing it myself for PyTorch but would love to put it in einops to share the benefit.
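As a rough idea of what one repack step might do to the offsets, here is a sketch for the single recipe b o i -> b (o i). The function names are hypothetical and this is not a proposed signature, just the integer arithmetic involved.

```python
from typing import List, Sequence


def lengths_to_offsets(lengths: Sequence[int]) -> List[int]:
    """Turn per-sequence lengths into cumulative offsets starting at 0."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets


def merge_levels(outer: Sequence[int], inner: Sequence[int]) -> List[int]:
    """Hypothetical helper: offsets for 'b o i -> b (o i)'.

    `outer` indexes into the list of inner sequences, and `inner` indexes
    into the flat values, so composing them gives, for each batch item,
    its boundaries in the flat values.
    """
    if outer[-1] != len(inner) - 1:
        raise ValueError("outer offsets must cover every inner sequence")
    return [inner[k] for k in outer]


# Batch item 0 has inner lengths (3, 1); batch item 1 has inner length (2,).
outer_offsets = lengths_to_offsets([2, 1])
inner_offsets = lengths_to_offsets([3, 1, 2])
merged_offsets = merge_levels(outer_offsets, inner_offsets)
```

The values never move for this recipe; only the offsets change, which is why a form like repack(ps, recipe) without x seems plausible.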

Bonus points

As a bonus, this model (offsets and values) is how Apache Arrow represents nested structures, so it would let einops users pass data to and from Arrow without copying or being limited to fixed-size columns.

Cheers and thanks for the great library :)
