I was wondering what would be the best way to use MatMiner with my own dataset. I’d like to use the featurization and plotting capabilities, but have a separate data set. Ideally, I’m looking for something where I can start from .cif files + properties and integrate that with MatMiner. Would the best way be to load those into pymatgen, build my own MongoDB database from that and use that to interface with MatMiner? Or is there a better way to go about this?
The primary object matminer works with is the pandas dataframe. You can use matminer without a dataframe, but it is a lot easier to just use one. You don’t need a MongoDB database, just the dataframe.
Here’s an example of how to go from a bunch of cif files and properties to a dataframe:
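import os
import pandas as pd
from pymatgen.core import Structure  # older pymatgen releases: from pymatgen import Structure

cif_dir = "path/to/cif/files"
properties = []
structures = []
# sorted() gives a reproducible file order, so index i points at the same file on every run
for i, structure_file in enumerate(sorted(os.listdir(cif_dir))):
    # get_property_from_index is a placeholder: replace it with your own property lookup
    prop = get_property_from_index(i)
    # os.listdir returns bare file names, so join each one with the directory
    structure = Structure.from_file(os.path.join(cif_dir, structure_file))
    properties.append(prop)
    structures.append(structure)
df = pd.DataFrame({"some_property": properties, "structure": structures})
print(df)  # make sure the dataframe appears like you intended
df.to_pickle("/path/where/u/want/to/save/ur/dataframe.p")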
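The property lookup in the loop above is left as a placeholder. One way to fill it in, sketched here under the assumption that your properties live in a CSV file with one row per cif file, is to key the values by file name rather than by index. The column names `filename` and `some_property` and both helper functions are hypothetical, not part of matminer or pymatgen:

```python
import csv
import os


def load_property_table(csv_path, name_column="filename", value_column="some_property"):
    """Map each cif file name to its property value, read from a CSV file."""
    table = {}
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            table[row[name_column]] = float(row[value_column])
    return table


def pair_files_with_properties(cif_dir, table):
    """Return (full_path, value) pairs for every .cif file that has a table entry."""
    pairs = []
    for name in sorted(os.listdir(cif_dir)):
        # Skip non-cif files and files without a recorded property
        if name.lower().endswith(".cif") and name in table:
            pairs.append((os.path.join(cif_dir, name), table[name]))
    return pairs
```

Each returned path can be passed to Structure.from_file, and the paired value appended to the properties list. Matching on file name avoids depending on the order in which the directory is listed.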
By the way, if your dataset is open source, published in a peer-reviewed journal, and not already in matminer, please consider adding it to matminer via our dataset addition guide!
On Saturday, March 30, 2019 at 7:52:50 PM UTC-7, [email protected] wrote:
Hi all,
I was wondering what would be the best way to use MatMiner with my own dataset. I’d like to use the featurization and plotting capabilities, but have a separate data set. Ideally, I’m looking for something where I can start from .cif files + properties and integrate that with MatMiner. Would the best way be to load those into pymatgen, build my own MongoDB database from that and use that to interface with MatMiner? Or is there a better way to go about this?
Thanks,
Simon