Even though there’s no way around handling JSON in web services nowadays, the developer experience varies significantly across programming languages. I had the joy of fiddling around with Go’s particular implementation a while ago, which motivated me to do a follow-up post on embedding types in Go.
Rules of embeddings revisited
For implementing langsync, we needed to model abstraction layers for concepts like vector stores. Specifically, we wanted to support the following vector stores:
type VectorStoreType string

const (
	VectorStoreTypeWeaviate VectorStoreType = "weaviate"
	VectorStoreTypePinecone VectorStoreType = "pinecone"
)
While the system should offer specific functions to connect to and perform operations on the underlying data stores, the operations exposed by a vector store are exactly the same (similarity search, inserting/deleting vectors, you get the gist). Modeling a generic VectorStore data type using embedded (anonymous) struct fields could look something like the following:
type VectorStoreBase struct {
	StoreType VectorStoreType `json:"store_type"`
}

type PineconeVectorStore struct {
	Config PineconeConfig `json:"config"`
}

type WeaviateVectorStore struct {
	Config WeaviateConfig `json:"config"`
}

type VectorStore struct {
	// embedded (anonymous) struct fields
	VectorStoreBase
	PineconeVectorStore
	WeaviateVectorStore
}
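To see how field promotion plays out with this layout, here is a minimal sketch. The fields of PineconeConfig and WeaviateConfig are assumptions made for illustration (loosely based on the configuration keys mentioned in this post), and describe is a hypothetical helper.

```go
package main

import "fmt"

type VectorStoreType string

const (
	VectorStoreTypeWeaviate VectorStoreType = "weaviate"
	VectorStoreTypePinecone VectorStoreType = "pinecone"
)

// Assumed config shapes; the real structs may carry more fields.
type PineconeConfig struct {
	APIKey    string `json:"api_key"`
	IndexName string `json:"index_name"`
}

type WeaviateConfig struct {
	Host string `json:"host"`
}

type VectorStoreBase struct {
	StoreType VectorStoreType `json:"store_type"`
}

type PineconeVectorStore struct {
	Config PineconeConfig `json:"config"`
}

type WeaviateVectorStore struct {
	Config WeaviateConfig `json:"config"`
}

type VectorStore struct {
	VectorStoreBase
	PineconeVectorStore
	WeaviateVectorStore
}

// describe (hypothetical helper) reads the promoted StoreType field and the
// explicitly qualified Config of the matching embedded struct.
func describe(v VectorStore) string {
	// v.StoreType is promoted from VectorStoreBase.
	// v.Config would not compile: the selector is ambiguous because two
	// embedded structs declare a Config field at the same depth.
	switch v.StoreType {
	case VectorStoreTypePinecone:
		return "pinecone index " + v.PineconeVectorStore.Config.IndexName
	case VectorStoreTypeWeaviate:
		return "weaviate host " + v.WeaviateVectorStore.Config.Host
	}
	return "unknown"
}

func main() {
	v := VectorStore{
		VectorStoreBase:     VectorStoreBase{StoreType: VectorStoreTypePinecone},
		PineconeVectorStore: PineconeVectorStore{Config: PineconeConfig{IndexName: "docs"}},
	}
	fmt.Println(describe(v))
}
```

The ambiguity the compiler reports for v.Config is the same name clash that later trips up the JSON encoder, only there it happens silently.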
This way, a VectorStore always exposes the StoreType field, while the configuration can be accessed by using the specific store (e.g. VectorStore.PineconeVectorStore.Config). Conceptually, this is the same as writing
export enum VectorStoreType {
  Pinecone = "pinecone",
  Weaviate = "weaviate"
}

export interface VectorStoreBase {
  store_type: VectorStoreType;
  config: {};
}

export interface PineconeVectorStore extends VectorStoreBase {
  store_type: VectorStoreType.Pinecone;
  config: {
    api_key: string;
    environment: string;
    index_name: string;
    namespace: string;
  };
}

export interface WeaviateVectorStore extends VectorStoreBase {
  store_type: VectorStoreType.Weaviate;
  config: {
    host: string;
    api_key: string;
  };
}

export type VectorStore = PineconeVectorStore | WeaviateVectorStore;
Unfortunately, serializing and deserializing values in Go does not work as easily as running JSON.stringify. To merge the embedded values correctly, we need to implement custom (un)marshaling logic.
Custom serialization (marshaling) logic
You may wonder why we need a custom marshaler in the first place. After all, the Go documentation says the following about embedded types:
Anonymous struct fields are usually marshaled as if their inner exported fields were fields in the outer struct, subject to the usual Go visibility rules amended as described in the next paragraph.
Unfortunately, it’s not as easy as that. As you may have spotted in the code above, each vector store has a dedicated configuration field with a type specific to the vector store. If we keep on reading the Go docs, we find a surprising paragraph:
If there are multiple fields at the same level, and that level is the least nested (and would therefore be the nesting level selected by the usual Go rules), the following extra rules apply:
- Of those fields, if any are JSON-tagged, only tagged fields are considered, even if there are multiple untagged fields that would otherwise conflict.
- If there is exactly one field (tagged or not according to the first rule), that is selected.
- Otherwise, there are multiple fields, and all are ignored; no error occurs.
Handling of anonymous struct fields is new in Go 1.1. Prior to Go 1.1, anonymous struct fields were ignored.
Oh. If we tried to marshal our vector store struct, the configuration would just be missing! That’s why we need to help Go out to marshal the right types.
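The following sketch reproduces the problem in isolation. The type names are prefixed with "plain" to signal that no custom marshaler is attached, and the config fields are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Assumed, simplified config shapes.
type plainPineconeConfig struct {
	IndexName string `json:"index_name"`
}

type plainWeaviateConfig struct {
	Host string `json:"host"`
}

type plainBase struct {
	StoreType string `json:"store_type"`
}

type plainPinecone struct {
	Config plainPineconeConfig `json:"config"`
}

type plainWeaviate struct {
	Config plainWeaviateConfig `json:"config"`
}

// plainVectorStore embeds two structs that both expose a field tagged
// "config" at the same nesting level and has no MarshalJSON method.
type plainVectorStore struct {
	plainBase
	plainPinecone
	plainWeaviate
}

// marshalPlain encodes the struct using the default encoding/json rules.
func marshalPlain(v plainVectorStore) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	v := plainVectorStore{
		plainBase:     plainBase{StoreType: "pinecone"},
		plainPinecone: plainPinecone{Config: plainPineconeConfig{IndexName: "docs"}},
	}
	// Both "config" fields conflict, so encoding/json drops them silently.
	fmt.Println(marshalPlain(v)) // {"store_type":"pinecone"}
}
```

No error is returned, which is what makes this easy to miss: the output is valid JSON, just without the configuration.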
To serialize a VectorStore struct, we can implement the MarshalJSON method on the non-pointer (value) receiver. Why not on the pointer receiver? Because the application mostly passes these tiny structs around by value, so there is no need to allocate them on the heap. If we went ahead and implemented MarshalJSON on the pointer receiver, the method would not be invoked when marshaling a plain struct value, because methods with pointer receivers are not part of the value type’s method set. By implementing the method on the VectorStore value receiver, it is called both for VectorStore values and for pointers to them.
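The receiver difference is easy to verify with two throwaway types. In this sketch, ptrStore, valStore and encode are hypothetical names; both MarshalJSON methods return the same fixed JSON string so that it is obvious whether they were called.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ptrStore implements MarshalJSON on the pointer receiver.
type ptrStore struct{ Name string }

func (p *ptrStore) MarshalJSON() ([]byte, error) {
	return []byte(`"custom"`), nil
}

// valStore implements MarshalJSON on the value receiver.
type valStore struct{ Name string }

func (v valStore) MarshalJSON() ([]byte, error) {
	return []byte(`"custom"`), nil
}

// encode marshals any value and returns the JSON as a string.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// Pointer receiver: skipped for a plain value, used for a pointer.
	fmt.Println(encode(ptrStore{Name: "a"}))  // {"Name":"a"}
	fmt.Println(encode(&ptrStore{Name: "a"})) // "custom"

	// Value receiver: used in both cases.
	fmt.Println(encode(valStore{Name: "a"}))  // "custom"
	fmt.Println(encode(&valStore{Name: "a"})) // "custom"
}
```

The first call is the trap: the value inside the interface is not addressable, so encoding/json falls back to the default struct encoding without any warning.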
// MarshalJSON marshals VectorStore
func (v VectorStore) MarshalJSON() ([]byte, error) {
	switch v.VectorStoreBase.StoreType {
	case VectorStoreTypePinecone:
		type vectorStorePinecone struct {
			VectorStoreBase
			PineconeVectorStore
		}
		return json.Marshal(vectorStorePinecone{
			VectorStoreBase:     v.VectorStoreBase,
			PineconeVectorStore: v.PineconeVectorStore,
		})
	case VectorStoreTypeWeaviate:
		type vectorStoreWeaviate struct {
			VectorStoreBase
			WeaviateVectorStore
		}
		return json.Marshal(vectorStoreWeaviate{
			VectorStoreBase:     v.VectorStoreBase,
			WeaviateVectorStore: v.WeaviateVectorStore,
		})
	default:
		return nil, errors.New("unknown vector store type")
	}
}
In this solution, we create a temporary struct that only contains the specific vector store type together with the base type. Because the other, conflicting configuration field is no longer part of the struct, Go marshals the data just as we expect.
Let’s move on to unmarshaling!
Custom deserialization (unmarshaling) logic
When we receive a serialized JSON string like
{
  "store_type": "weaviate",
  "config": { "host": "...", ... }
}
Go cannot tell on its own which embedded struct to use. That’s why, during unmarshaling, we choose the final struct depending on the supplied store type value.
// UnmarshalJSON decodes the shared base fields first, then uses the store
// type to decide which embedded struct receives the rest of the payload.
func (v *VectorStore) UnmarshalJSON(data []byte) error {
	if err := json.Unmarshal(data, &v.VectorStoreBase); err != nil {
		return err
	}
	switch v.VectorStoreBase.StoreType {
	case VectorStoreTypePinecone:
		return json.Unmarshal(data, &v.PineconeVectorStore)
	case VectorStoreTypeWeaviate:
		return json.Unmarshal(data, &v.WeaviateVectorStore)
	default:
		// Mirror MarshalJSON: reject unknown store types instead of silently
		// leaving the configuration at its zero value.
		return errors.New("unknown vector store type")
	}
}
The easiest way of solving this is to unmarshal the base struct first and use the received store type in a second step to unmarshal the specific struct. This involves two unmarshal operations over the same input but doesn’t require an intermediate struct.
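A round trip is a cheap way to check that both directions agree. The sketch below is self-contained and makes a few assumptions: the config fields are illustrative, roundTrip is a hypothetical helper, the marshaler uses anonymous structs for brevity, and the unmarshaler rejects unknown store types.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type VectorStoreType string

const (
	VectorStoreTypeWeaviate VectorStoreType = "weaviate"
	VectorStoreTypePinecone VectorStoreType = "pinecone"
)

// Assumed config shapes for this sketch.
type PineconeConfig struct {
	APIKey    string `json:"api_key"`
	IndexName string `json:"index_name"`
}

type WeaviateConfig struct {
	Host   string `json:"host"`
	APIKey string `json:"api_key"`
}

type VectorStoreBase struct {
	StoreType VectorStoreType `json:"store_type"`
}

type PineconeVectorStore struct {
	Config PineconeConfig `json:"config"`
}

type WeaviateVectorStore struct {
	Config WeaviateConfig `json:"config"`
}

type VectorStore struct {
	VectorStoreBase
	PineconeVectorStore
	WeaviateVectorStore
}

// MarshalJSON keeps only the base and the embedded struct matching StoreType.
func (v VectorStore) MarshalJSON() ([]byte, error) {
	switch v.StoreType {
	case VectorStoreTypePinecone:
		return json.Marshal(struct {
			VectorStoreBase
			PineconeVectorStore
		}{VectorStoreBase: v.VectorStoreBase, PineconeVectorStore: v.PineconeVectorStore})
	case VectorStoreTypeWeaviate:
		return json.Marshal(struct {
			VectorStoreBase
			WeaviateVectorStore
		}{VectorStoreBase: v.VectorStoreBase, WeaviateVectorStore: v.WeaviateVectorStore})
	}
	return nil, errors.New("unknown vector store type")
}

// UnmarshalJSON decodes the base first, then the store-specific part.
func (v *VectorStore) UnmarshalJSON(data []byte) error {
	if err := json.Unmarshal(data, &v.VectorStoreBase); err != nil {
		return err
	}
	switch v.StoreType {
	case VectorStoreTypePinecone:
		return json.Unmarshal(data, &v.PineconeVectorStore)
	case VectorStoreTypeWeaviate:
		return json.Unmarshal(data, &v.WeaviateVectorStore)
	}
	// Assumption of this sketch: unknown types are treated as an error.
	return errors.New("unknown vector store type")
}

// roundTrip (hypothetical helper) decodes the input and encodes it again.
func roundTrip(input string) (string, error) {
	var v VectorStore
	if err := json.Unmarshal([]byte(input), &v); err != nil {
		return "", err
	}
	out, err := json.Marshal(v)
	return string(out), err
}

func main() {
	in := `{"store_type":"weaviate","config":{"host":"localhost:8080","api_key":"secret"}}`
	out, err := roundTrip(in)
	fmt.Println(out == in, err) // true <nil>
}
```

Comparing the output against the input catches both failure modes at once: a dropped config on the way out and a zero-valued config on the way in.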
While the solution looks straightforward, many pitfalls led to broken code and hours spent debugging. The biggest traps I ran into were 1) not realizing that Go would skip half of the VectorStore struct when multiple fields share the same name, 2) almost losing my sanity debugging why the pointer receiver method was never invoked (when marshaling a struct value), and 3) not testing the individual functions and believing everything was working fine because the zero values looked plausible. These issues are easy to figure out but just as easy to miss when time’s running out. I recommend stepping out for a walk or doing something else when you’re stuck; your brain will probably surprise you with a solution.